[jira] [Commented] (AIRFLOW-2363) S3 remote logging appending tuple instead of str
[ https://issues.apache.org/jira/browse/AIRFLOW-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449156#comment-16449156 ] Kevin Yang commented on AIRFLOW-2363: - Hi [~hamlinkn], I've created a PR to fix it but I think I didn't tag you correctly on github [https://github.com/apache/incubator-airflow/pull/3259] Do you think it is possible for you to test it in your infra end to end? It might take some extra work to have S3 set up on my side and I think you might benefit from a faster merge. Thank you! > S3 remote logging appending tuple instead of str > > > Key: AIRFLOW-2363 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2363 > Project: Apache Airflow > Issue Type: Bug > Components: logging >Reporter: Kyle Hamlin >Assignee: Kevin Yang >Priority: Major > Fix For: 1.10.0 > > > A recent merge into master that added support for Elasticsearch logging seems > to have broken S3 logging by returning a tuple instead of a string. > [https://github.com/apache/incubator-airflow/commit/ec38ba9594395de04ec932481212a86fbe9ae107#diff-0442332ecbe42ebbf426911c68d8cd4aR128] > > following errors thrown: > > *Session NoneType error* > Traceback (most recent call last): > File > "/usr/local/lib/python3.6/site-packages/airflow/utils/log/s3_task_handler.py", > line 171, in s3_write > encrypt=configuration.conf.getboolean('core', 'ENCRYPT_S3_LOGS'), > File "/usr/local/lib/python3.6/site-packages/airflow/hooks/S3_hook.py", > line 274, in load_string > encrypt=encrypt) > File "/usr/local/lib/python3.6/site-packages/airflow/hooks/S3_hook.py", > line 313, in load_bytes > client = self.get_conn() > File "/usr/local/lib/python3.6/site-packages/airflow/hooks/S3_hook.py", > line 34, in get_conn > return self.get_client_type('s3') > File > "/usr/local/lib/python3.6/site-packages/airflow/contrib/hooks/aws_hook.py", > line 151, in get_client_type > session, endpoint_url = self._get_credentials(region_name) > File > 
"/usr/local/lib/python3.6/site-packages/airflow/contrib/hooks/aws_hook.py", > line 97, in _get_credentials > connection_object = self.get_connection(self.aws_conn_id) > File "/usr/local/lib/python3.6/site-packages/airflow/hooks/base_hook.py", > line 82, in get_connection > conn = random.choice(cls.get_connections(conn_id)) > File "/usr/local/lib/python3.6/site-packages/airflow/hooks/base_hook.py", > line 77, in get_connections > conns = cls._get_connections_from_db(conn_id) > File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line > 72, in wrapper > with create_session() as session: > File "/usr/local/lib/python3.6/contextlib.py", line 81, in __enter__ > return next(self.gen) > File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line > 41, in create_session > session = settings.Session() > TypeError: 'NoneType' object is not callable > > *TypeError must be str not tuple* > [2018-04-16 18:37:28,200] ERROR in app: Exception on > /admin/airflow/get_logs_with_metadata [GET] > Traceback (most recent call last): > File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1982, in > wsgi_app > response = self.full_dispatch_request() > File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1614, in > full_dispatch_request > rv = self.handle_user_exception(e) > File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1517, in > handle_user_exception > reraise(exc_type, exc_value, tb) > File "/usr/local/lib/python3.6/site-packages/flask/_compat.py", line 33, > in reraise > raise value > File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1612, in > full_dispatch_request > rv = self.dispatch_request() > File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1598, in > dispatch_request > return self.view_functions[rule.endpoint](**req.view_args) > File "/usr/local/lib/python3.6/site-packages/flask_admin/base.py", line > 69, in inner > return self._run_view(f, *args, **kwargs) > File 
"/usr/local/lib/python3.6/site-packages/flask_admin/base.py", line > 368, in _run_view > return fn(self, *args, **kwargs) > File "/usr/local/lib/python3.6/site-packages/flask_login.py", line 755, in > decorated_view > return func(*args, **kwargs) > File "/usr/local/lib/python3.6/site-packages/airflow/www/utils.py", line > 269, in wrapper > return f(*args, **kwargs) > File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line > 74, in wrapper > return func(*args, **kwargs) > File "/usr/local/lib/python3.6/site-packages/airflow/www/views.py", line > 770, in get_logs_with_metadata > logs, metadatas =
[jira] [Updated] (AIRFLOW-2368) Relative path should be used when enqueuing task instance
[ https://issues.apache.org/jira/browse/AIRFLOW-2368?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Xiao Zhu updated AIRFLOW-2368:

Description: In our Airflow setup, the scheduler and the workers (connected via Celery) run on different machines and have different root dirs. Currently, when the scheduler enqueues a task instance, it uses the full absolute path of the DAG file, but the DAG file is located at a different path on our workers (because of the different root dirs). As a result, the workers can't find the DAG file. When the scheduler enqueues a task instance, it should use the path relative to DAGS_FOLDER.

was: In our Airflow setup, scheduler and workers (connected via Celery) runs on different machine and have different root dir. Currently in airflow when scheduler enqueues a task instance it uses the full absolute path of the DAG file, while the DAG is located at a different location on our workers (because of different root dir). As a result workers can't find the DAG file. When scheduler enqueues a task instances it should use its relative path (relative to DAGS_FOLDER).

> Relative path should be used when enqueuing task instance
>
> Key: AIRFLOW-2368
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2368
> Project: Apache Airflow
> Issue Type: Improvement
> Reporter: Xiao Zhu
> Priority: Major
>
> In our Airflow setup, the scheduler and the workers (connected via Celery) run on
> different machines and have different root dirs. Currently, when the scheduler
> enqueues a task instance, it uses the full absolute path of the DAG file, but the
> DAG file is located at a different path on our workers (because of the different
> root dirs). As a result, the workers can't find the DAG file.
> When the scheduler enqueues a task instance, it should use the path relative to
> DAGS_FOLDER.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
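The proposal in AIRFLOW-2368 can be illustrated with a small sketch. It is not Airflow code: the helper names and both directory paths are hypothetical, and `posixpath` is used only so that the result does not depend on the platform running the example. The scheduler would send the path relative to its own DAGS_FOLDER, and each worker would resolve it against its local one.

```python
import posixpath


def to_relative(dag_file, dags_folder):
    """Express dag_file relative to the sender's DAGS_FOLDER."""
    return posixpath.relpath(dag_file, start=dags_folder)


def to_absolute(relative_path, dags_folder):
    """Resolve a relative DAG path against the receiver's DAGS_FOLDER."""
    return posixpath.normpath(posixpath.join(dags_folder, relative_path))


# Hypothetical layout: scheduler and worker use different root dirs.
scheduler_dags = "/opt/scheduler/airflow/dags"
worker_dags = "/srv/worker/airflow/dags"

# What the scheduler would enqueue instead of the absolute path.
rel = to_relative("/opt/scheduler/airflow/dags/etl/daily.py", scheduler_dags)

# What the worker would open after resolving against its own folder.
worker_file = to_absolute(rel, worker_dags)

print(rel)
print(worker_file)
```

With this scheme only the part of the path below DAGS_FOLDER has to match between machines.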
[jira] [Commented] (AIRFLOW-2340) SQLalchemy pessimistic connection handling not working
[ https://issues.apache.org/jira/browse/AIRFLOW-2340?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16449053#comment-16449053 ] Xiao Zhu commented on AIRFLOW-2340: --- We see that with our mysql db too > SQLalchemy pessimistic connection handling not working > -- > > Key: AIRFLOW-2340 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2340 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: 1.9.0 >Reporter: John Arnold >Priority: Critical > Attachments: airflow_traceback.txt, webserver.txt > > > Our scheduler keeps crashing, about once a day. It seems to be triggered by > a failure to connect to the postgresql database, but then it doesn't recover > and crashes the scheduler over and over. > The scheduler runs in a container in our environment, so after several > container restarts, docker gives up and the container stays down. > Airflow should be able to recover from a connection failure without blowing > up the container altogether. Perhaps some exponential backoff is needed? > > See attached log from the scheduler. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
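The exponential backoff suggested in AIRFLOW-2340 could look roughly like the sketch below. It is a generic retry helper, not the scheduler's actual code: `retry_with_backoff`, its parameters, and the `flaky_query` stand-in are all hypothetical, and a real database driver would raise its own exception types rather than the built-in `ConnectionError` assumed here.

```python
import random
import time


def retry_with_backoff(operation, max_attempts=5, base_delay=1.0,
                       max_delay=60.0, sleep=time.sleep):
    """Call operation(); on ConnectionError, wait and retry with doubling delays."""
    for attempt in range(max_attempts):
        try:
            return operation()
        except ConnectionError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller see the failure
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Random jitter so that several processes do not retry in lockstep.
            sleep(random.uniform(0, delay))


# Hypothetical stand-in for a database call that fails twice, then succeeds.
attempts = []


def flaky_query():
    attempts.append(1)
    if len(attempts) < 3:
        raise ConnectionError("database unavailable")
    return "ok"


# The sleep function is replaced here so the example finishes immediately.
result = retry_with_backoff(flaky_query, sleep=lambda seconds: None)
print(result, len(attempts))
```

Capping the delay and re-raising after the last attempt keeps a long outage visible to whatever supervises the process instead of retrying forever.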
[jira] [Created] (AIRFLOW-2368) Relative path should be used when enqueuing task instance
Xiao Zhu created AIRFLOW-2368:

Summary: Relative path should be used when enqueuing task instance
Key: AIRFLOW-2368
URL: https://issues.apache.org/jira/browse/AIRFLOW-2368
Project: Apache Airflow
Issue Type: Improvement
Reporter: Xiao Zhu

In our Airflow setup, the scheduler and the workers (connected via Celery) run on different machines and have different root dirs. Currently, when the scheduler enqueues a task instance, it uses the full absolute path of the DAG file, but the DAG file is located at a different path on our workers (because of the different root dirs). As a result, the workers can't find the DAG file. When the scheduler enqueues a task instance, it should use the path relative to DAGS_FOLDER.
[jira] [Assigned] (AIRFLOW-2363) S3 remote logging appending tuple instead of str
[ https://issues.apache.org/jira/browse/AIRFLOW-2363?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Yang reassigned AIRFLOW-2363: --- Assignee: Kevin Yang > S3 remote logging appending tuple instead of str > > > Key: AIRFLOW-2363 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2363 > Project: Apache Airflow > Issue Type: Bug > Components: logging >Reporter: Kyle Hamlin >Assignee: Kevin Yang >Priority: Major > Fix For: 1.10.0 > > > A recent merge into master that added support for Elasticsearch logging seems > to have broken S3 logging by returning a tuple instead of a string. > [https://github.com/apache/incubator-airflow/commit/ec38ba9594395de04ec932481212a86fbe9ae107#diff-0442332ecbe42ebbf426911c68d8cd4aR128] > > following errors thrown: > > *Session NoneType error* > Traceback (most recent call last): > File > "/usr/local/lib/python3.6/site-packages/airflow/utils/log/s3_task_handler.py", > line 171, in s3_write > encrypt=configuration.conf.getboolean('core', 'ENCRYPT_S3_LOGS'), > File "/usr/local/lib/python3.6/site-packages/airflow/hooks/S3_hook.py", > line 274, in load_string > encrypt=encrypt) > File "/usr/local/lib/python3.6/site-packages/airflow/hooks/S3_hook.py", > line 313, in load_bytes > client = self.get_conn() > File "/usr/local/lib/python3.6/site-packages/airflow/hooks/S3_hook.py", > line 34, in get_conn > return self.get_client_type('s3') > File > "/usr/local/lib/python3.6/site-packages/airflow/contrib/hooks/aws_hook.py", > line 151, in get_client_type > session, endpoint_url = self._get_credentials(region_name) > File > "/usr/local/lib/python3.6/site-packages/airflow/contrib/hooks/aws_hook.py", > line 97, in _get_credentials > connection_object = self.get_connection(self.aws_conn_id) > File "/usr/local/lib/python3.6/site-packages/airflow/hooks/base_hook.py", > line 82, in get_connection > conn = random.choice(cls.get_connections(conn_id)) > File "/usr/local/lib/python3.6/site-packages/airflow/hooks/base_hook.py", > 
line 77, in get_connections > conns = cls._get_connections_from_db(conn_id) > File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line > 72, in wrapper > with create_session() as session: > File "/usr/local/lib/python3.6/contextlib.py", line 81, in __enter__ > return next(self.gen) > File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line > 41, in create_session > session = settings.Session() > TypeError: 'NoneType' object is not callable > > *TypeError must be str not tuple* > [2018-04-16 18:37:28,200] ERROR in app: Exception on > /admin/airflow/get_logs_with_metadata [GET] > Traceback (most recent call last): > File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1982, in > wsgi_app > response = self.full_dispatch_request() > File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1614, in > full_dispatch_request > rv = self.handle_user_exception(e) > File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1517, in > handle_user_exception > reraise(exc_type, exc_value, tb) > File "/usr/local/lib/python3.6/site-packages/flask/_compat.py", line 33, > in reraise > raise value > File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1612, in > full_dispatch_request > rv = self.dispatch_request() > File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1598, in > dispatch_request > return self.view_functions[rule.endpoint](**req.view_args) > File "/usr/local/lib/python3.6/site-packages/flask_admin/base.py", line > 69, in inner > return self._run_view(f, *args, **kwargs) > File "/usr/local/lib/python3.6/site-packages/flask_admin/base.py", line > 368, in _run_view > return fn(self, *args, **kwargs) > File "/usr/local/lib/python3.6/site-packages/flask_login.py", line 755, in > decorated_view > return func(*args, **kwargs) > File "/usr/local/lib/python3.6/site-packages/airflow/www/utils.py", line > 269, in wrapper > return f(*args, **kwargs) > File 
"/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line > 74, in wrapper > return func(*args, **kwargs) > File "/usr/local/lib/python3.6/site-packages/airflow/www/views.py", line > 770, in get_logs_with_metadata > logs, metadatas = handler.read(ti, try_number, metadata=metadata) > File > "/usr/local/lib/python3.6/site-packages/airflow/utils/log/file_task_handler.py", > line 165, in read > logs[i] += log > TypeError: must be str, not tuple -- This message was sent by Atlassian JIRA (v7.6.3#76005)
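The final frame of the second traceback (`logs[i] += log`) fails because a string is being extended with a tuple. The sketch below reproduces that failure in isolation. The two reader functions are hypothetical stand-ins, not the real handler methods, and the repair shown at the end is only one possible approach, not necessarily what the linked PR does.

```python
def read_returning_str(try_number):
    """Hypothetical reader that returns only the log text."""
    return "log for try {}\n".format(try_number)


def read_returning_tuple(try_number):
    """Hypothetical reader that returns (log text, metadata)."""
    return "log for try {}\n".format(try_number), {"end_of_log": True}


logs = [""]
logs[0] += read_returning_str(1)          # fine: str += str

try:
    logs[0] += read_returning_tuple(1)    # str += tuple raises TypeError
except TypeError as exc:
    error = str(exc)                      # exact wording varies by Python version

# One possible repair: unpack the tuple and append only the text part.
log, metadata = read_returning_tuple(2)
logs[0] += log

print(error)
print(logs[0])
```

Either every reader returns the same shape, or the caller has to unpack before concatenating; mixing the two is what produces the error in the report.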
[jira] [Commented] (AIRFLOW-2367) High POSTGRES DB CPU utilization
[ https://issues.apache.org/jira/browse/AIRFLOW-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448980#comment-16448980 ] John Arnold commented on AIRFLOW-2367: -- Manual repro of the top query only takes 250ms... > High POSTGRES DB CPU utilization > > > Key: AIRFLOW-2367 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2367 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: Airflow 2.0, 1.9.0 >Reporter: John Arnold >Priority: Major > Attachments: cpu.png, postgres.png > > > We are seeing steady state 70-90% CPU utilization. It feels like a missing > index kind of problem, as our TPS rate is really low, I'm not seeing any long > running queries, connection counts are reasonable (low hundreds) and locks > also look reasonable (not many exclusive / write locks) > We shut down the webserver and it doesn't go away, so it doesn't seem to be > in that part of the code. My guess is either the scheduler has an inefficient > query, or the (Celery) executor code path does. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (AIRFLOW-2367) High POSTGRES DB CPU utilization
[ https://issues.apache.org/jira/browse/AIRFLOW-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448978#comment-16448978 ] John Arnold edited comment on AIRFLOW-2367 at 4/23/18 10:34 PM: I believe the "top talker" query is from this query in models.py: @provide_session def get_task_instances(self, state=None, session=None): """ Returns the task instances for this dag run """ TI = TaskInstance tis = session.query(TI).filter( TI.dag_id == self.dag_id, TI.execution_date == self.execution_date, ) was (Author: johnarnold): I believe the "top talker" query is from this query in models.py: @provide_session def get_task_instances(self, state=None, session=None): """ Returns the task instances for this dag run """ TI = TaskInstance tis = session.query(TI).filter( TI.dag_id == self.dag_id, TI.execution_date == self.execution_date, ) > High POSTGRES DB CPU utilization > > > Key: AIRFLOW-2367 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2367 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: Airflow 2.0, 1.9.0 >Reporter: John Arnold >Priority: Major > Attachments: cpu.png, postgres.png > > > We are seeing steady state 70-90% CPU utilization. It feels like a missing > index kind of problem, as our TPS rate is really low, I'm not seeing any long > running queries, connection counts are reasonable (low hundreds) and locks > also look reasonable (not many exclusive / write locks) > We shut down the webserver and it doesn't go away, so it doesn't seem to be > in that part of the code. My guess is either the scheduler has an inefficient > query, or the (Celery) executor code path does. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2367) High POSTGRES DB CPU utilization
[ https://issues.apache.org/jira/browse/AIRFLOW-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448978#comment-16448978 ]

John Arnold commented on AIRFLOW-2367:

I believe the "top talker" query comes from this method in models.py:

    @provide_session
    def get_task_instances(self, state=None, session=None):
        """
        Returns the task instances for this dag run
        """
        TI = TaskInstance
        tis = session.query(TI).filter(
            TI.dag_id == self.dag_id,
            TI.execution_date == self.execution_date,
        )

> High POSTGRES DB CPU utilization
>
> Key: AIRFLOW-2367
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2367
> Project: Apache Airflow
> Issue Type: Bug
> Components: scheduler
> Affects Versions: Airflow 2.0, 1.9.0
> Reporter: John Arnold
> Priority: Major
> Attachments: cpu.png, postgres.png
>
> We are seeing steady state 70-90% CPU utilization. It feels like a missing
> index kind of problem, as our TPS rate is really low, I'm not seeing any long
> running queries, connection counts are reasonable (low hundreds) and locks
> also look reasonable (not many exclusive / write locks)
> We shut down the webserver and it doesn't go away, so it doesn't seem to be
> in that part of the code. My guess is either the scheduler has an inefficient
> query, or the (Celery) executor code path does.
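Whether a filter on dag_id and execution_date is served by an index can be checked from the query plan. The sketch below shows the idea with the standard-library `sqlite3` module standing in for PostgreSQL, where `EXPLAIN` would be used instead. The table is a cut-down, hypothetical version of task_instance, and the index name `ti_dag_date` is made up for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE task_instance ("
    "task_id TEXT, dag_id TEXT, execution_date TEXT, state TEXT)"
)
# Hypothetical index covering the two columns used in the filter.
conn.execute(
    "CREATE INDEX ti_dag_date ON task_instance (dag_id, execution_date)"
)

plan_rows = conn.execute(
    "EXPLAIN QUERY PLAN "
    "SELECT * FROM task_instance WHERE dag_id = ? AND execution_date = ?",
    ("example_dag", "2018-04-23T00:00:00"),
).fetchall()

# The last column of each row is the human-readable plan step.
plan = " | ".join(row[-1] for row in plan_rows)
print(plan)
```

A plan that mentions the index rather than a full table scan suggests the lookup itself is cheap, in which case the total cost would come from how often the query runs rather than from each execution.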
[jira] [Comment Edited] (AIRFLOW-2367) High POSTGRES DB CPU utilization
[ https://issues.apache.org/jira/browse/AIRFLOW-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448974#comment-16448974 ] John Arnold edited comment on AIRFLOW-2367 at 4/23/18 10:31 PM: This insert is taking the second-most time (about 15% of total time): INSERT INTO log (dttm, dag_id, task_id, event, execution_date, owner, extra) VALUES (?::timestamptz, ?, ?, ?, ?::timestamptz, ?, ?) RETURNING log.id I verified there are no indexes, triggers or weird constraints that would make it slow, it just has a high volume. was (Author: johnarnold): This insert is taking the second-most time (about 1/3 of the above): INSERT INTO log (dttm, dag_id, task_id, event, execution_date, owner, extra) VALUES (?::timestamptz, ?, ?, ?, ?::timestamptz, ?, ?) RETURNING log.id I verified there are no indexes, triggers or weird constraints that would make it slow, it just has a high volume. > High POSTGRES DB CPU utilization > > > Key: AIRFLOW-2367 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2367 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: Airflow 2.0, 1.9.0 >Reporter: John Arnold >Priority: Major > Attachments: cpu.png, postgres.png > > > We are seeing steady state 70-90% CPU utilization. It feels like a missing > index kind of problem, as our TPS rate is really low, I'm not seeing any long > running queries, connection counts are reasonable (low hundreds) and locks > also look reasonable (not many exclusive / write locks) > We shut down the webserver and it doesn't go away, so it doesn't seem to be > in that part of the code. My guess is either the scheduler has an inefficient > query, or the (Celery) executor code path does. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (AIRFLOW-2367) High POSTGRES DB CPU utilization
[ https://issues.apache.org/jira/browse/AIRFLOW-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448969#comment-16448969 ] John Arnold edited comment on AIRFLOW-2367 at 4/23/18 10:30 PM: This query is taking the most time (about 50% of total time in the top 10 queries): SELECT task_instance.try_number AS task_instance_try_number, task_instance.task_id AS task_instance_task_id, task_instance.dag_id AS task_instance_dag_id, task_instance.execution_date AS task_instance_execution_date, task_instance.start_date AS task_instance_start_date, task_instance.end_date AS task_instance_end_date, task_instance.duration AS task_instance_duration, task_instance.state AS task_instance_state, task_instance.max_tries AS task_instance_max_tries, task_instance.hostname AS task_instance_hostname, task_instance.unixname AS task_instance_unixname, task_instance.job_id AS task_instance_job_id, task_instance.pool AS task_instance_pool, task_instance.queue AS task_instance_queue, task_instance.priority_weight AS task_instance_priority_weight, task_instance.operator AS task_instance_operator, task_instance.queued_dttm AS task_instance_queued_dttm, task_instance.pid AS task_instance_pid, task_instance.uuid AS task_instance_uuid FROM task_instance WHERE task_instance.dag_id = ? 
AND task_instance.execution_date = ?::timestamptz was (Author: johnarnold): This query is taking the most time: SELECT task_instance.try_number AS task_instance_try_number, task_instance.task_id AS task_instance_task_id, task_instance.dag_id AS task_instance_dag_id, task_instance.execution_date AS task_instance_execution_date, task_instance.start_date AS task_instance_start_date, task_instance.end_date AS task_instance_end_date, task_instance.duration AS task_instance_duration, task_instance.state AS task_instance_state, task_instance.max_tries AS task_instance_max_tries, task_instance.hostname AS task_instance_hostname, task_instance.unixname AS task_instance_unixname, task_instance.job_id AS task_instance_job_id, task_instance.pool AS task_instance_pool, task_instance.queue AS task_instance_queue, task_instance.priority_weight AS task_instance_priority_weight, task_instance.operator AS task_instance_operator, task_instance.queued_dttm AS task_instance_queued_dttm, task_instance.pid AS task_instance_pid, task_instance.uuid AS task_instance_uuid FROM task_instance WHERE task_instance.dag_id = ? AND task_instance.execution_date = ?::timestamptz > High POSTGRES DB CPU utilization > > > Key: AIRFLOW-2367 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2367 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: Airflow 2.0, 1.9.0 >Reporter: John Arnold >Priority: Major > Attachments: cpu.png, postgres.png > > > We are seeing steady state 70-90% CPU utilization. It feels like a missing > index kind of problem, as our TPS rate is really low, I'm not seeing any long > running queries, connection counts are reasonable (low hundreds) and locks > also look reasonable (not many exclusive / write locks) > We shut down the webserver and it doesn't go away, so it doesn't seem to be > in that part of the code. My guess is either the scheduler has an inefficient > query, or the (Celery) executor code path does. 
[jira] [Commented] (AIRFLOW-2367) High POSTGRES DB CPU utilization
[ https://issues.apache.org/jira/browse/AIRFLOW-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448974#comment-16448974 ] John Arnold commented on AIRFLOW-2367: -- This insert is taking the second-most time (about 1/3 of the above): INSERT INTO log (dttm, dag_id, task_id, event, execution_date, owner, extra) VALUES (?::timestamptz, ?, ?, ?, ?::timestamptz, ?, ?) RETURNING log.id I verified there are no indexes, triggers or weird constraints that would make it slow, it just has a high volume. > High POSTGRES DB CPU utilization > > > Key: AIRFLOW-2367 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2367 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: Airflow 2.0, 1.9.0 >Reporter: John Arnold >Priority: Major > Attachments: cpu.png, postgres.png > > > We are seeing steady state 70-90% CPU utilization. It feels like a missing > index kind of problem, as our TPS rate is really low, I'm not seeing any long > running queries, connection counts are reasonable (low hundreds) and locks > also look reasonable (not many exclusive / write locks) > We shut down the webserver and it doesn't go away, so it doesn't seem to be > in that part of the code. My guess is either the scheduler has an inefficient > query, or the (Celery) executor code path does. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2367) High POSTGRES DB CPU utilization
[ https://issues.apache.org/jira/browse/AIRFLOW-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448969#comment-16448969 ] John Arnold commented on AIRFLOW-2367: -- This query is taking the most time: SELECT task_instance.try_number AS task_instance_try_number, task_instance.task_id AS task_instance_task_id, task_instance.dag_id AS task_instance_dag_id, task_instance.execution_date AS task_instance_execution_date, task_instance.start_date AS task_instance_start_date, task_instance.end_date AS task_instance_end_date, task_instance.duration AS task_instance_duration, task_instance.state AS task_instance_state, task_instance.max_tries AS task_instance_max_tries, task_instance.hostname AS task_instance_hostname, task_instance.unixname AS task_instance_unixname, task_instance.job_id AS task_instance_job_id, task_instance.pool AS task_instance_pool, task_instance.queue AS task_instance_queue, task_instance.priority_weight AS task_instance_priority_weight, task_instance.operator AS task_instance_operator, task_instance.queued_dttm AS task_instance_queued_dttm, task_instance.pid AS task_instance_pid, task_instance.uuid AS task_instance_uuid FROM task_instance WHERE task_instance.dag_id = ? AND task_instance.execution_date = ?::timestamptz > High POSTGRES DB CPU utilization > > > Key: AIRFLOW-2367 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2367 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: Airflow 2.0, 1.9.0 >Reporter: John Arnold >Priority: Major > Attachments: cpu.png, postgres.png > > > We are seeing steady state 70-90% CPU utilization. It feels like a missing > index kind of problem, as our TPS rate is really low, I'm not seeing any long > running queries, connection counts are reasonable (low hundreds) and locks > also look reasonable (not many exclusive / write locks) > We shut down the webserver and it doesn't go away, so it doesn't seem to be > in that part of the code. 
My guess is either the scheduler has an inefficient > query, or the (Celery) executor code path does. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Comment Edited] (AIRFLOW-2367) High POSTGRES DB CPU utilization
[ https://issues.apache.org/jira/browse/AIRFLOW-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448953#comment-16448953 ] John Arnold edited comment on AIRFLOW-2367 at 4/23/18 10:18 PM: [~bolke] Any suggestions on what metrics or configuration options? We've been looking over the database (top 10 queries etc) and there are no surprises that I can see. The top query by far is for task_instance table and all the conditionals are for indexed columns. I went through basically every query in models.py looking for any that are using unindexed columns, and didn't find any. I've attached a screenshot of the top 10 queries. We played with our connection pool sizes, thinking that perhaps we were hammering the db with connections, but that didn't seem to make any difference. We have the scheduler set with a connection pool of 20, two instances of the webserver with connection pool = 5, and all the celery workers have connection pool = 1. was (Author: johnarnold): [~bolke] Any suggestions on what metrics or configuration options? We've been looking over the database (top 10 queries etc) and there are no surprises that I can see. The top query by far is for task_instance table and all the conditionals are for indexed columns. I went through basically every query in models.py looking for any that are using unindexed columns, and didn't find any. I've attached a screenshot of the top 10 queries. We played with our connection pool sizes, thinking that perhaps we were hammering the db with connections, but that didn't seem to make any difference. > High POSTGRES DB CPU utilization > > > Key: AIRFLOW-2367 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2367 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: Airflow 2.0, 1.9.0 >Reporter: John Arnold >Priority: Major > Attachments: cpu.png, postgres.png > > > We are seeing steady state 70-90% CPU utilization. 
It feels like a missing > index kind of problem, as our TPS rate is really low, I'm not seeing any long > running queries, connection counts are reasonable (low hundreds) and locks > also look reasonable (not many exclusive / write locks) > We shut down the webserver and it doesn't go away, so it doesn't seem to be > in that part of the code. My guess is either the scheduler has an inefficient > query, or the (Celery) executor code path does. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
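The pool sizes quoted in the edited comment give a rough upper bound on the connections Airflow can hold open. The arithmetic below is a sketch under stated assumptions: the number of Celery workers is not given in the thread, so `celery_workers = 100` is a made-up figure, and each component is assumed to own exactly one pool. SQLAlchemy's `QueuePool` also allows extra connections beyond `pool_size` through `max_overflow` (default 10); whether that default applies to this deployment is unknown.

```python
def max_pooled_connections(pool_sizes, max_overflow=0):
    """Upper bound on connections held by a set of independent pools."""
    return sum(size + max_overflow for size in pool_sizes)


celery_workers = 100  # hypothetical count; the thread does not state it

# Scheduler pool of 20, two webservers with pools of 5, one connection per worker.
pools = [20] + [5, 5] + [1] * celery_workers

total = max_pooled_connections(pools)
total_with_overflow = max_pooled_connections(pools, max_overflow=10)

print(total)
print(total_with_overflow)
```

Under these assumptions the configured pools alone account for 130 connections, which is in line with the "low hundreds" reported, while unrestricted overflow would raise the theoretical ceiling well above that.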
[jira] [Commented] (AIRFLOW-2367) High POSTGRES DB CPU utilization
[ https://issues.apache.org/jira/browse/AIRFLOW-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448953#comment-16448953 ] John Arnold commented on AIRFLOW-2367: -- [~bolke] Any suggestions on what metrics or configuration options? We've been looking over the database (top 10 queries etc) and there are no surprises that I can see. The top query by far is for task_instance table and all the conditionals are for indexed columns. I went through basically every query in models.py looking for any that are using unindexed columns, and didn't find any. I've attached a screenshot of the top 10 queries. We played with our connection pool sizes, thinking that perhaps we were hammering the db with connections, but that didn't seem to make any difference. > High POSTGRES DB CPU utilization > > > Key: AIRFLOW-2367 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2367 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: Airflow 2.0, 1.9.0 >Reporter: John Arnold >Priority: Major > Attachments: cpu.png, postgres.png > > > We are seeing steady state 70-90% CPU utilization. It feels like a missing > index kind of problem, as our TPS rate is really low, I'm not seeing any long > running queries, connection counts are reasonable (low hundreds) and locks > also look reasonable (not many exclusive / write locks) > We shut down the webserver and it doesn't go away, so it doesn't seem to be > in that part of the code. My guess is either the scheduler has an inefficient > query, or the (Celery) executor code path does. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-2367) High POSTGRES DB CPU utilization
[ https://issues.apache.org/jira/browse/AIRFLOW-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Arnold updated AIRFLOW-2367: - Attachment: postgres.png > High POSTGRES DB CPU utilization > > > Key: AIRFLOW-2367 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2367 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: Airflow 2.0, 1.9.0 >Reporter: John Arnold >Priority: Major > Attachments: cpu.png, postgres.png > > > We are seeing steady state 70-90% CPU utilization. It feels like a missing > index kind of problem, as our TPS rate is really low, I'm not seeing any long > running queries, connection counts are reasonable (low hundreds) and locks > also look reasonable (not many exclusive / write locks) > We shut down the webserver and it doesn't go away, so it doesn't seem to be > in that part of the code. My guess is either the scheduler has an inefficient > query, or the (Celery) executor code path does. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2367) High POSTGRES DB CPU utilization
[ https://issues.apache.org/jira/browse/AIRFLOW-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448812#comment-16448812 ]

Bolke de Bruin commented on AIRFLOW-2367:

You really need to provide more metrics and configuration options. The scheduler can be really busy when you have a lot of DAGs. Worker memory also affects performance. In other words, have a DBA look at what you are doing and report back.

> High POSTGRES DB CPU utilization
>
> Key: AIRFLOW-2367
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2367
> Project: Apache Airflow
> Issue Type: Bug
> Components: scheduler
> Affects Versions: Airflow 2.0, 1.9.0
> Reporter: John Arnold
> Priority: Major
> Attachments: cpu.png
>
> We are seeing steady state 70-90% CPU utilization. It feels like a missing
> index kind of problem, as our TPS rate is really low, I'm not seeing any long
> running queries, connection counts are reasonable (low hundreds) and locks
> also look reasonable (not many exclusive / write locks)
> We shut down the webserver and it doesn't go away, so it doesn't seem to be
> in that part of the code. My guess is either the scheduler has an inefficient
> query, or the (Celery) executor code path does.
[jira] [Resolved] (AIRFLOW-2068) Mesos Executor should allow specifying optional Docker image before running command
[ https://issues.apache.org/jira/browse/AIRFLOW-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bolke de Bruin resolved AIRFLOW-2068. - Resolution: Fixed Fix Version/s: 1.10.0 Issue resolved by pull request #3008 [https://github.com/apache/incubator-airflow/pull/3008] > Mesos Executor should allow specifying optional Docker image before running > command > --- > > Key: AIRFLOW-2068 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2068 > Project: Apache Airflow > Issue Type: Improvement > Components: executor >Affects Versions: Airflow 1.8 >Reporter: Agraj Mangal >Assignee: Agraj Mangal >Priority: Major > Fix For: 1.10.0 > > > In its current form, MesosExecutor schedules tasks on mesos slaves which just > contain airflow commands assuming that the mesos slaves already have airflow > installed and configured on them. This assumption goes against the Mesos > philosophy of having a heterogeneous cluster. > Since mesos provides an option to pull a docker image before actually running > the actual task/command so this improvement changes the mesos_executor.py to > specify an optional docker image containing airflow which can be pulled on > slaves before running the actual airflow command. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2068) Mesos Executor should allow specifying optional Docker image before running command
[ https://issues.apache.org/jira/browse/AIRFLOW-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448806#comment-16448806 ] ASF subversion and git services commented on AIRFLOW-2068: -- Commit d96534c268a13f8e6a0e1e288821a436322926c4 in incubator-airflow's branch refs/heads/v1-10-test from [~amangal] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=d96534c ] [AIRFLOW-2068] MesosExecutor allows optional Docker image In its current form, MesosExecutor schedules tasks on mesos slaves which just contain airflow commands assuming that the mesos slaves already have airflow installed and configured on them. This assumption goes against the Mesos philosophy of having a heterogeneous cluster. Since Mesos provides an option to pull a Docker image before actually running the actual task/command so this improvement changes the mesos_executor.py to specify an optional docker image containing airflow which can be pulled on slaves before running the actual airflow command. This also opens the door for an optimization of resources in a future PR, by allowing the specification of CPU and memory needed for each airflow task. Closes #3008 from agrajm/AIRFLOW-2068 (cherry picked from commit 1f86299cf998729ea93c666481390451c9724ebc) Signed-off-by: Bolke de Bruin> Mesos Executor should allow specifying optional Docker image before running > command > --- > > Key: AIRFLOW-2068 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2068 > Project: Apache Airflow > Issue Type: Improvement > Components: executor >Affects Versions: Airflow 1.8 >Reporter: Agraj Mangal >Assignee: Agraj Mangal >Priority: Major > Fix For: 1.10.0 > > > In its current form, MesosExecutor schedules tasks on mesos slaves which just > contain airflow commands assuming that the mesos slaves already have airflow > installed and configured on them. This assumption goes against the Mesos > philosophy of having a heterogeneous cluster. 
> Since mesos provides an option to pull a docker image before actually running > the actual task/command so this improvement changes the mesos_executor.py to > specify an optional docker image containing airflow which can be pulled on > slaves before running the actual airflow command. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2068) Mesos Executor should allow specifying optional Docker image before running command
[ https://issues.apache.org/jira/browse/AIRFLOW-2068?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448804#comment-16448804 ] ASF subversion and git services commented on AIRFLOW-2068: -- Commit 1f86299cf998729ea93c666481390451c9724ebc in incubator-airflow's branch refs/heads/master from [~amangal] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=1f86299 ] [AIRFLOW-2068] MesosExecutor allows optional Docker image In its current form, MesosExecutor schedules tasks on mesos slaves which just contain airflow commands assuming that the mesos slaves already have airflow installed and configured on them. This assumption goes against the Mesos philosophy of having a heterogeneous cluster. Since Mesos provides an option to pull a Docker image before actually running the actual task/command so this improvement changes the mesos_executor.py to specify an optional docker image containing airflow which can be pulled on slaves before running the actual airflow command. This also opens the door for an optimization of resources in a future PR, by allowing the specification of CPU and memory needed for each airflow task. Closes #3008 from agrajm/AIRFLOW-2068 > Mesos Executor should allow specifying optional Docker image before running > command > --- > > Key: AIRFLOW-2068 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2068 > Project: Apache Airflow > Issue Type: Improvement > Components: executor >Affects Versions: Airflow 1.8 >Reporter: Agraj Mangal >Assignee: Agraj Mangal >Priority: Major > > In its current form, MesosExecutor schedules tasks on mesos slaves which just > contain airflow commands assuming that the mesos slaves already have airflow > installed and configured on them. This assumption goes against the Mesos > philosophy of having a heterogeneous cluster. 
> Since mesos provides an option to pull a docker image before actually running > the actual task/command so this improvement changes the mesos_executor.py to > specify an optional docker image containing airflow which can be pulled on > slaves before running the actual airflow command. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
incubator-airflow git commit: [AIRFLOW-2068] MesosExecutor allows optional Docker image
Repository: incubator-airflow
Updated Branches:
  refs/heads/v1-10-test 64900e2b0 -> d96534c26

[AIRFLOW-2068] MesosExecutor allows optional Docker image

In its current form, MesosExecutor schedules tasks on mesos slaves which just contain airflow commands assuming that the mesos slaves already have airflow installed and configured on them. This assumption goes against the Mesos philosophy of having a heterogeneous cluster.

Since Mesos provides an option to pull a Docker image before actually running the actual task/command so this improvement changes the mesos_executor.py to specify an optional docker image containing airflow which can be pulled on slaves before running the actual airflow command. This also opens the door for an optimization of resources in a future PR, by allowing the specification of CPU and memory needed for each airflow task.

Closes #3008 from agrajm/AIRFLOW-2068

(cherry picked from commit 1f86299cf998729ea93c666481390451c9724ebc)
Signed-off-by: Bolke de Bruin

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/d96534c2
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/d96534c2
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/d96534c2

Branch: refs/heads/v1-10-test
Commit: d96534c268a13f8e6a0e1e288821a436322926c4
Parents: 64900e2
Author: Agraj Mangal
Authored: Mon Apr 23 22:22:29 2018 +0200
Committer: Bolke de Bruin
Committed: Mon Apr 23 22:23:06 2018 +0200

--
 airflow/config_templates/default_airflow.cfg | 4 +
 airflow/config_templates/default_test.cfg | 9 ++
 airflow/contrib/executors/__init__.py | 6 +-
 airflow/contrib/executors/mesos_executor.py | 19
 docs/configuration.rst | 39 +++-
 tests/contrib/executors/__init__.py | 23 +++--
 tests/contrib/executors/test_mesos_executor.py | 105
 7 files changed, 188 insertions(+), 17 deletions(-)
--
http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/d96534c2/airflow/config_templates/default_airflow.cfg -- diff --git a/airflow/config_templates/default_airflow.cfg b/airflow/config_templates/default_airflow.cfg index 845342c..fa5eea0 100644 --- a/airflow/config_templates/default_airflow.cfg +++ b/airflow/config_templates/default_airflow.cfg @@ -479,6 +479,10 @@ authenticate = False # default_principal = admin # default_secret = admin +# Optional Docker Image to run on slave before running the command +# This image should be accessible from mesos slave i.e mesos slave +# should be able to pull this docker image before executing the command. +# docker_image_slave = puckel/docker-airflow [kerberos] ccache = /tmp/airflow_krb5_ccache http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/d96534c2/airflow/config_templates/default_test.cfg -- diff --git a/airflow/config_templates/default_test.cfg b/airflow/config_templates/default_test.cfg index 4145dfd..7c569cd 100644 --- a/airflow/config_templates/default_test.cfg +++ b/airflow/config_templates/default_test.cfg @@ -91,6 +91,15 @@ flower_host = 0.0.0.0 flower_port = default_queue = default +[mesos] +master = localhost:5050 +framework_name = Airflow +task_cpu = 1 +task_memory = 256 +checkpoint = False +authenticate = False +docker_image_slave = test/docker-airflow + [scheduler] job_heartbeat_sec = 1 scheduler_heartbeat_sec = 5 http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/d96534c2/airflow/contrib/executors/__init__.py -- diff --git a/airflow/contrib/executors/__init__.py b/airflow/contrib/executors/__init__.py index f0f8b68..b7f8352 100644 --- a/airflow/contrib/executors/__init__.py +++ b/airflow/contrib/executors/__init__.py @@ -7,13 +7,13 @@ # to you under the Apache License, Version 2.0 (the # "License"); you may not use this file except in compliance # with the License. 
You may obtain a copy of the License at -# +# # http://www.apache.org/licenses/LICENSE-2.0 -# +# # Unless required by applicable law or agreed to in writing, # software distributed under the License is distributed on an # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY # KIND, either express or implied. See the License for the # specific language governing permissions and limitations # under the License. - +# http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/d96534c2/airflow/contrib/executors/mesos_executor.py -- diff --git
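The commit above adds an optional docker_image_slave key to the [mesos] section. As a minimal sketch of what "optional" means for such a key, the snippet below reads it with Python's standard configparser and falls back to None when it is absent. This is not the code in mesos_executor.py (Airflow reads settings through its own configuration module); the function name read_docker_image and the SAMPLE_CFG constant are assumptions made for illustration, with the values copied from the default_test.cfg hunk.

```python
import configparser

# Values copied from the [mesos] section added to default_test.cfg in this commit.
SAMPLE_CFG = """
[mesos]
master = localhost:5050
framework_name = Airflow
task_cpu = 1
task_memory = 256
checkpoint = False
authenticate = False
docker_image_slave = test/docker-airflow
"""


def read_docker_image(cfg_text):
    """Return the docker_image_slave value, or None when the key is not set.

    Hypothetical helper: it only illustrates treating the key as optional.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    # fallback=None covers both a missing key and a missing [mesos] section.
    return parser.get("mesos", "docker_image_slave", fallback=None)


image = read_docker_image(SAMPLE_CFG)
no_image = read_docker_image("[mesos]\nmaster = localhost:5050\n")
```

With the key present the helper returns the image name; without it the caller gets None and could run the plain airflow command as before.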
[jira] [Commented] (AIRFLOW-2366) New Webserver UI with Role-Based Access Control not working
[ https://issues.apache.org/jira/browse/AIRFLOW-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448696#comment-16448696 ] Ramki Subramanian commented on AIRFLOW-2366: [~johnarnold] - I am not able to view any attachment. > New Webserver UI with Role-Based Access Control not working > --- > > Key: AIRFLOW-2366 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2366 > Project: Apache Airflow > Issue Type: Bug > Components: authentication >Affects Versions: 1.9.0 >Reporter: Ramki Subramanian >Priority: Major > > I tried setting up the Role-Based Access Control as given in > [https://github.com/apache/incubator-airflow/blob/master/UPDATING.md] > I had changed my airflow.cfg file as given below > {quote}{{```}}{{To turn on this feature, in your airflow.cfg file, under > [webserver], set configuration variable }}{{rbac = True}}{{,}}{{```}} > {quote} > But am not sure which airflow command to use to generate webserver_config.py > {quote}```and then run {{airflow}} command, which will generate the > {{webserver_config.py}} file in your $AIRFLOW_HOME.``` > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-2366) New Webserver UI with Role-Based Access Control not working
[ https://issues.apache.org/jira/browse/AIRFLOW-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Ramki Subramanian updated AIRFLOW-2366: --- Description: I tried setting up the Role-Based Access Control as given in [https://github.com/apache/incubator-airflow/blob/master/UPDATING.md] I had changed my airflow.cfg file as given below {quote}{{```}}{{To turn on this feature, in your airflow.cfg file, under [webserver], set configuration variable }}{{rbac = True}}{{,}}{{```}} {quote} But am not sure which airflow command to use to generate webserver_config.py {quote}```and then run {{airflow}} command, which will generate the {{webserver_config.py}} file in your $AIRFLOW_HOME.``` {quote} was: I tried setting up the Role-Based Access Control as given in [https://github.com/apache/incubator-airflow/blob/master/UPDATING.md] I had changed my airflow.cfg file as given below ```To turn on this feature, in your airflow.cfg file, under [webserver], set configuration variable {{rbac = True}},``` But am not sure which airflow command to use to generate webserver_config.py ```and then run {{airflow}} command, which will generate the {{webserver_config.py}} file in your $AIRFLOW_HOME.``` > New Webserver UI with Role-Based Access Control not working > --- > > Key: AIRFLOW-2366 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2366 > Project: Apache Airflow > Issue Type: Bug > Components: authentication >Affects Versions: 1.9.0 >Reporter: Ramki Subramanian >Priority: Major > > I tried setting up the Role-Based Access Control as given in > [https://github.com/apache/incubator-airflow/blob/master/UPDATING.md] > I had changed my airflow.cfg file as given below > {quote}{{```}}{{To turn on this feature, in your airflow.cfg file, under > [webserver], set configuration variable }}{{rbac = True}}{{,}}{{```}} > {quote} > But am not sure which airflow command to use to generate webserver_config.py > {quote}```and then run {{airflow}} command, which will generate 
the > {{webserver_config.py}} file in your $AIRFLOW_HOME.``` > {quote} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-2367) High POSTGRES DB CPU utilization
[ https://issues.apache.org/jira/browse/AIRFLOW-2367?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Arnold updated AIRFLOW-2367: - Attachment: cpu.png > High POSTGRES DB CPU utilization > > > Key: AIRFLOW-2367 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2367 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: Airflow 2.0, 1.9.0 >Reporter: John Arnold >Priority: Major > Attachments: cpu.png > > > We are seeing steady state 70-90% CPU utilization. It feels like a missing > index kind of problem, as our TPS rate is really low, I'm not seeing any long > running queries, connection counts are reasonable (low hundreds) and locks > also look reasonable (not many exclusive / write locks) > We shut down the webserver and it doesn't go away, so it doesn't seem to be > in that part of the code. My guess is either the scheduler has an inefficient > query, or the (Celery) executor code path does. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-2366) New Webserver UI with Role-Based Access Control not working
[ https://issues.apache.org/jira/browse/AIRFLOW-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Arnold updated AIRFLOW-2366: - Attachment: (was: cpu.png) > New Webserver UI with Role-Based Access Control not working > --- > > Key: AIRFLOW-2366 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2366 > Project: Apache Airflow > Issue Type: Bug > Components: authentication >Affects Versions: 1.9.0 >Reporter: Ramki Subramanian >Priority: Major > > I tried setting up the Role-Based Access Control as given in > [https://github.com/apache/incubator-airflow/blob/master/UPDATING.md] > I had changed my airflow.cfg file as given below > ```To turn on this feature, in your airflow.cfg file, under [webserver], set > configuration variable {{rbac = True}},``` > But am not sure which airflow command to use to generate webserver_config.py > ```and then run {{airflow}} command, which will generate the > {{webserver_config.py}} file in your $AIRFLOW_HOME.``` -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-2366) New Webserver UI with Role-Based Access Control not working
[ https://issues.apache.org/jira/browse/AIRFLOW-2366?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] John Arnold updated AIRFLOW-2366: - Attachment: cpu.png > New Webserver UI with Role-Based Access Control not working > --- > > Key: AIRFLOW-2366 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2366 > Project: Apache Airflow > Issue Type: Bug > Components: authentication >Affects Versions: 1.9.0 >Reporter: Ramki Subramanian >Priority: Major > Attachments: cpu.png > > > I tried setting up the Role-Based Access Control as given in > [https://github.com/apache/incubator-airflow/blob/master/UPDATING.md] > I had changed my airflow.cfg file as given below > ```To turn on this feature, in your airflow.cfg file, under [webserver], set > configuration variable {{rbac = True}},``` > But am not sure which airflow command to use to generate webserver_config.py > ```and then run {{airflow}} command, which will generate the > {{webserver_config.py}} file in your $AIRFLOW_HOME.``` -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2367) High POSTGRES DB CPU utilization
John Arnold created AIRFLOW-2367: Summary: High POSTGRES DB CPU utilization Key: AIRFLOW-2367 URL: https://issues.apache.org/jira/browse/AIRFLOW-2367 Project: Apache Airflow Issue Type: Bug Components: scheduler Affects Versions: Airflow 2.0, 1.9.0 Reporter: John Arnold We are seeing steady-state 70-90% CPU utilization on the Postgres database. It feels like a missing-index kind of problem: our TPS rate is really low, I'm not seeing any long-running queries, connection counts are reasonable (low hundreds), and locks also look reasonable (not many exclusive / write locks). We shut down the webserver and the load doesn't go away, so it doesn't seem to be in that part of the code. My guess is either the scheduler has an inefficient query, or the (Celery) executor code path does. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
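One way to test the "missing index" hunch in the report above is to compare sequential and index scan counts per table. The sketch below holds a query against the standard PostgreSQL statistics view pg_stat_user_tables and a small helper that flags tables read mostly by sequential scans. The helper name suspect_tables, the threshold, and the example rows are all assumptions made for illustration; the numbers are invented and are not measurements from this issue.

```python
# Query text only; run it with any PostgreSQL client.
# pg_stat_user_tables and these four columns are standard PostgreSQL statistics.
SEQ_SCAN_QUERY = """
SELECT relname, seq_scan, seq_tup_read, idx_scan
FROM pg_stat_user_tables
ORDER BY seq_tup_read DESC
"""


def suspect_tables(rows, min_seq_tup_read=1_000_000):
    """Return table names that are read mostly by sequential scans.

    rows: iterable of (relname, seq_scan, seq_tup_read, idx_scan) tuples,
    in the column order of SEQ_SCAN_QUERY. The threshold is an arbitrary
    assumption, not a PostgreSQL recommendation.
    """
    suspects = []
    for relname, seq_scan, seq_tup_read, idx_scan in rows:
        idx_scan = idx_scan or 0  # NULL when the table has no indexes
        if seq_tup_read >= min_seq_tup_read and seq_scan > idx_scan:
            suspects.append(relname)
    return suspects


# Invented rows, for illustration only.
example_rows = [
    ("task_instance", 50_000, 900_000_000, 1_200),
    ("dag_run", 10, 5_000, 80_000),
    ("job", 20_000, 40_000_000, None),
]
found = suspect_tables(example_rows)
```

A table that shows up here is only a candidate; confirming a missing index still needs the actual query plans.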
[jira] [Assigned] (AIRFLOW-2365) Fix autocommit test issue with SQLite
[ https://issues.apache.org/jira/browse/AIRFLOW-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arthur Wiedmer reassigned AIRFLOW-2365: --- Assignee: Arthur Wiedmer > Fix autocommit test issue with SQLite > - > > Key: AIRFLOW-2365 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2365 > Project: Apache Airflow > Issue Type: Bug >Reporter: Arthur Wiedmer >Assignee: Arthur Wiedmer >Priority: Major > > In a previous PR, I added a check for an autocommit attribute which fails for > SQLite. Correcting the tests now. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2365) Fix autocommit test issue with SQLite
Arthur Wiedmer created AIRFLOW-2365: --- Summary: Fix autocommit test issue with SQLite Key: AIRFLOW-2365 URL: https://issues.apache.org/jira/browse/AIRFLOW-2365 Project: Apache Airflow Issue Type: Bug Reporter: Arthur Wiedmer In a previous PR, I added a check for an autocommit attribute which fails for SQLite. Correcting the tests now. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
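The issue does not show the failing check, so the following is only a sketch of the general pattern: reading an autocommit attribute defensively so that a connection object without one does not raise. The two classes are stand-ins invented for this example, not real database drivers, and get_autocommit is a hypothetical helper rather than the function touched by the PR.

```python
class ConnWithAutocommit:
    """Stand-in for a DB-API connection that exposes an autocommit attribute."""
    autocommit = True


class ConnWithoutAutocommit:
    """Stand-in for a connection object that has no autocommit attribute."""


def get_autocommit(conn):
    """Return the connection's autocommit flag, or False when it has none.

    getattr with a default avoids the AttributeError that a direct
    conn.autocommit access would raise on the second stand-in.
    """
    return bool(getattr(conn, "autocommit", False))


with_flag = get_autocommit(ConnWithAutocommit())
without_flag = get_autocommit(ConnWithoutAutocommit())
```

Whether a given SQLite connection exposes the attribute depends on the driver and Python version, which is why the sketch uses stand-ins.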
[jira] [Created] (AIRFLOW-2366) New Webserver UI with Role-Based Access Control not working
Ramki Subramanian created AIRFLOW-2366: -- Summary: New Webserver UI with Role-Based Access Control not working Key: AIRFLOW-2366 URL: https://issues.apache.org/jira/browse/AIRFLOW-2366 Project: Apache Airflow Issue Type: Bug Components: authentication Affects Versions: 1.9.0 Reporter: Ramki Subramanian I tried setting up the Role-Based Access Control as given in [https://github.com/apache/incubator-airflow/blob/master/UPDATING.md] I had changed my airflow.cfg file as given below ```To turn on this feature, in your airflow.cfg file, under [webserver], set configuration variable {{rbac = True}},``` But am not sure which airflow command to use to generate webserver_config.py ```and then run {{airflow}} command, which will generate the {{webserver_config.py}} file in your $AIRFLOW_HOME.``` -- This message was sent by Atlassian JIRA (v7.6.3#76005)
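The quoted UPDATING.md text asks for rbac = True under [webserver]. A quick way to confirm the setting sits in the right section and parses as a boolean is sketched below with Python's standard configparser. The helper name rbac_enabled and the sample text are assumptions; this does not answer which airflow command generates webserver_config.py, and it is not how Airflow itself loads its configuration.

```python
import configparser


def rbac_enabled(cfg_text):
    """Return True only when [webserver] contains rbac set to a true value.

    Hypothetical helper for checking an airflow.cfg fragment; a key placed
    under any other section is ignored and reported as False.
    """
    parser = configparser.ConfigParser()
    parser.read_string(cfg_text)
    return parser.getboolean("webserver", "rbac", fallback=False)


right_section = rbac_enabled("[webserver]\nrbac = True\n")
wrong_section = rbac_enabled("[core]\nrbac = True\n\n[webserver]\nweb_server_port = 8080\n")
```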
svn commit: r26473 - /dev/incubator/airflow/1.10.0beta1/
Author: bolke Date: Mon Apr 23 17:59:26 2018 New Revision: 26473 Log: Add 1.10.0 beta 1 Added: dev/incubator/airflow/1.10.0beta1/ dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0b1+incubating.tar.gz (with props) dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0b1+incubating.tar.gz.asc dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0b1+incubating.tar.gz.md5 dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0b1+incubating.tar.gz.sha512 dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0beta1+incubating-source.tar.gz (with props) dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0beta1+incubating-source.tar.gz.asc dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0beta1+incubating-source.tar.gz.md5 dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0beta1+incubating-source.tar.gz.sha512 Added: dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0b1+incubating.tar.gz == Binary file - no diff available. Propchange: dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0b1+incubating.tar.gz -- svn:mime-type = application/octet-stream Added: dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0b1+incubating.tar.gz.asc == --- dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0b1+incubating.tar.gz.asc (added) +++ dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0b1+incubating.tar.gz.asc Mon Apr 23 17:59:26 2018 @@ -0,0 +1,16 @@ +-BEGIN PGP SIGNATURE- + +iQIzBAABCgAdFiEEB1hcT6CMC2tdrJKNNRkLg9kFoLoFAlreHa0ACgkQNRkLg9kF +oLpfdBAAoT9YbPWnSz9PamAL+fo/no2tR/tu8NOeM/5em5LruhSCNQuHK90R3kYt +jhl0s1AKzgyy4d5yELn9DReEM/VUFbx2ROrHjM9HPud4gd30MzHsiAoMIheIaakO +dDPA62PrK9+Qa3CDWOYVseqkMdmrvo5GrmJDf7M4QU+HqD0mBvf1JKD/Dc779u7k +5yqi2FgufHRlj8yqwGaWLtiZnprHXmQcoSictG/2PEIW4HHOkU4/L1/QKt6y7yfF +DZy+/rnGNTzncFBZd7cnrysjDYY45mUddkrSnxl/4Td0KiiLAITmbhI+RANpvHq7 +5iyOAQ8VqDLgWkYlMRRhyH+Aik+jQNz2+iDRWt0K1w3rAcN9Ta5hPk0tF8wIh3I7 +Yb82+uZKExyV2XhMQvtCrDWpE7tsiXApDSyGCnIghXmYWjAB9yH1LyyjLhjyZFGU 
+lPhLpH2ycxcZxmX6n6E3BCccr+moHAZeoKOOg89qLqOWCEmZg/9vC54GgX5gmSba +RvCAP8FO12RIDQrlRXrDkVkSWslYz5UnnsRqyZuf+KKvd9WsE52ifZvGg92KO0ns +z/LhJIdT+0KtM4WdOCxo3TjOa/g3RYc2kQ7zMl6llpykB4G6dkm03YBpPAQJbDrM +4jhrQGg61snq7YtwdWMY3lzSA71MI12Kmd1ncdhYrTiHuupvsLM= +=w0bp +-END PGP SIGNATURE- Added: dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0b1+incubating.tar.gz.md5 == --- dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0b1+incubating.tar.gz.md5 (added) +++ dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0b1+incubating.tar.gz.md5 Mon Apr 23 17:59:26 2018 @@ -0,0 +1,2 @@ +apache-airflow-1.10.0b1+incubating.tar.gz: +B1 1D A8 1F D3 A8 1D 47 21 EE F9 FD 72 15 50 07 Added: dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0b1+incubating.tar.gz.sha512 == --- dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0b1+incubating.tar.gz.sha512 (added) +++ dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0b1+incubating.tar.gz.sha512 Mon Apr 23 17:59:26 2018 @@ -0,0 +1,3 @@ +apache-airflow-1.10.0b1+incubating.tar.gz: +08FE7003 8C3CEBD8 907FB279 2FEFE4C1 698FC44A 1CCF26CE 9204DD4F 35A1AABA 51819B4C + 50294324 70425C99 71881343 B39F34DC A405AB67 1A9F8527 45992C2F Added: dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0beta1+incubating-source.tar.gz == Binary file - no diff available. 
Propchange: dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0beta1+incubating-source.tar.gz -- svn:mime-type = application/octet-stream Added: dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0beta1+incubating-source.tar.gz.asc == --- dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0beta1+incubating-source.tar.gz.asc (added) +++ dev/incubator/airflow/1.10.0beta1/apache-airflow-1.10.0beta1+incubating-source.tar.gz.asc Mon Apr 23 17:59:26 2018 @@ -0,0 +1,16 @@ +-BEGIN PGP SIGNATURE- + +iQIzBAABCgAdFiEEB1hcT6CMC2tdrJKNNRkLg9kFoLoFAlreHc0ACgkQNRkLg9kF +oLpxKA//WW3OaL2pfmB62HxSQ2/FC1kauhQzm3Jlzpfx2SHAq32P2GI0hKS8fHxF +YwOoAsRgGl5MSzbyj2rUsZn1c/IS3P2PaN8tfq911VR2wLl1NzEmYNGWn3N1pbms +IqabAQSYxjIl+GXF6UpdAxiwCz5WGR9AxCnesINw3H6JFMOv4MsrBPL+f5Tbg5mG +E3daxm3XOX56BTN9ZtkzZPB07EP//ELpFzYmNI9HD8GK7YqdJxSOrri3PfSt6i0V +RADqJwaa7LxRi2g6X5GAaYsnEEwO9xvO+ZhD54wesodFBr5XfOZi23Rx7334MJKG +t2mGZ7aDn68FNb+JcBLqef4yFfrlfuytq+8WbQZfMIjVUi3vsGzvhJJMCpZjO+Ke
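The .md5 and .sha512 files in this commit publish digests as space-separated uppercase hex groups after the file name. A minimal sketch of checking a downloaded artifact against such a digest follows, using only hashlib. The helper names are assumptions, the caller is expected to pass just the digest part (without the leading file name), and the example uses the well-known MD5 of empty input rather than the real tarball.

```python
import hashlib


def normalize_digest(text):
    """Drop all whitespace and lowercase, so 'B1 1D A8' compares equal to 'b11da8'."""
    return "".join(text.split()).lower()


def digest_matches(data, published, algorithm="md5"):
    """Compare the digest of data (bytes) with a published, possibly spaced, hex digest."""
    actual = hashlib.new(algorithm, data).hexdigest()
    return actual == normalize_digest(published)


# MD5 of empty input, written in the same spaced style as the .md5 file above.
EMPTY_MD5 = "D4 1D 8C D9 8F 00 B2 04  E9 80 09 98 EC F8 42 7E"
md5_ok = digest_matches(b"", EMPTY_MD5)
md5_bad = digest_matches(b"x", EMPTY_MD5)
```

For a real check, read the tarball in binary mode and pass algorithm="sha512" with the digest from the .sha512 file; the .asc signature needs a separate PGP verification.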
incubator-airflow git commit: Add incubating
Repository: incubator-airflow
Updated Branches:
  refs/heads/v1-10-test 2a1676e29 -> 64900e2b0

Add incubating

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/64900e2b
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/64900e2b
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/64900e2b

Branch: refs/heads/v1-10-test
Commit: 64900e2b0753f03fcc9d5fcee1c671f8440109a9
Parents: 2a1676e
Author: Bolke de Bruin
Authored: Mon Apr 23 19:35:55 2018 +0200
Committer: Bolke de Bruin
Committed: Mon Apr 23 19:35:55 2018 +0200

--
 airflow/version.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/64900e2b/airflow/version.py
--
diff --git a/airflow/version.py b/airflow/version.py
index 6ead469..3da4fc3 100644
--- a/airflow/version.py
+++ b/airflow/version.py
@@ -18,4 +18,4 @@
 # under the License.
 #
-version = '1.10.0beta1'
+version = '1.10.0beta1+incubating'
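The version string set here is '1.10.0beta1+incubating', while the sdist uploaded in the svn commit is named apache-airflow-1.10.0b1+incubating.tar.gz. That difference is consistent with PEP 440 normalization, which accepts "beta" as an alternate spelling of "b"; presumably the packaging tooling applied it. The sketch below implements only that one rule (alternate pre-release spellings) with a regular expression; normalize_prerelease is a hypothetical helper, not a replacement for a full PEP 440 parser.

```python
import re

# Alternate pre-release spellings and their normalized forms, per PEP 440.
_SPELLINGS = {"alpha": "a", "beta": "b", "c": "rc", "pre": "rc", "preview": "rc"}

# A spelling counts only when it directly follows a digit and precedes a number.
_PRERELEASE = re.compile(r"(?<=\d)(alpha|beta|preview|pre|c)(\d+)")


def normalize_prerelease(version):
    """Rewrite alternate pre-release spellings, e.g. 'beta1' to 'b1'.

    Covers one normalization rule only; separators, case folding and the
    other PEP 440 rules are left untouched.
    """
    return _PRERELEASE.sub(lambda m: _SPELLINGS[m.group(1)] + m.group(2), version)


beta = normalize_prerelease("1.10.0beta1+incubating")
dev = normalize_prerelease("2.0.0dev0+incubating")
```

The development version on master contains no alternate spelling, so the helper returns it unchanged.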
incubator-airflow git commit: Add incubating
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 305a787e3 -> a30acafc8

Add incubating

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/a30acafc
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/a30acafc
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/a30acafc

Branch: refs/heads/master
Commit: a30acafc866b24880225495dd51ed71344758d54
Parents: 305a787
Author: Bolke de Bruin
Authored: Mon Apr 23 19:34:07 2018 +0200
Committer: Bolke de Bruin
Committed: Mon Apr 23 19:34:07 2018 +0200

--
 airflow/version.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/a30acafc/airflow/version.py
--
diff --git a/airflow/version.py b/airflow/version.py
index 42f1dea..750da36 100644
--- a/airflow/version.py
+++ b/airflow/version.py
@@ -18,4 +18,4 @@
 # under the License.
 #
-version = '2.0.0dev0'
+version = '2.0.0dev0+incubating'
incubator-airflow git commit: Bump version
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 2b030699d -> 305a787e3

Bump version

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/305a787e
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/305a787e
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/305a787e

Branch: refs/heads/master
Commit: 305a787e33b086fad605379853e5428397d2ba26
Parents: 2b03069
Author: Bolke de Bruin
Authored: Mon Apr 23 19:26:40 2018 +0200
Committer: Bolke de Bruin
Committed: Mon Apr 23 19:26:40 2018 +0200

--
 airflow/version.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/305a787e/airflow/version.py
--
diff --git a/airflow/version.py b/airflow/version.py
index 08abc30..42f1dea 100644
--- a/airflow/version.py
+++ b/airflow/version.py
@@ -18,4 +18,4 @@
 # under the License.
 #
-version = '1.10.0dev0+incubating'
+version = '2.0.0dev0'
incubator-airflow git commit: Bump version
Repository: incubator-airflow
Updated Branches:
  refs/heads/v1-10-test 2b030699d -> 2a1676e29

Bump version

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/2a1676e2
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/2a1676e2
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/2a1676e2

Branch: refs/heads/v1-10-test
Commit: 2a1676e2908514308221e146525a16565e5ea40d
Parents: 2b03069
Author: Bolke de Bruin
Authored: Mon Apr 23 19:25:49 2018 +0200
Committer: Bolke de Bruin
Committed: Mon Apr 23 19:25:49 2018 +0200

----------------------------------------------------------------------
 airflow/version.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/2a1676e2/airflow/version.py
----------------------------------------------------------------------
diff --git a/airflow/version.py b/airflow/version.py
index 08abc30..6ead469 100644
--- a/airflow/version.py
+++ b/airflow/version.py
@@ -18,4 +18,4 @@
 # under the License.
 #
 
-version = '1.10.0dev0+incubating'
+version = '1.10.0beta1'
[jira] [Commented] (AIRFLOW-1652) Push DatabricksRunSubmitOperator metadata into XCOM
[ https://issues.apache.org/jira/browse/AIRFLOW-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448471#comment-16448471 ] ASF subversion and git services commented on AIRFLOW-1652: -- Commit 2b030699de351c557a646e8e620c1b96a0397070 in incubator-airflow's branch refs/heads/master from [~andrewachen] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=2b03069 ] [AIRFLOW-1652] Push DatabricksRunSubmitOperator metadata into XCOM [AIRFLOW-1652] Push DatabricksRunSubmitOperator metadata into XCOM Push run_id and run_page_url into xcom so callbacks and other tasks can reference this information address comments Closes #2641 from andrewmchen/databricks-xcom > Push DatabricksRunSubmitOperator metadata into XCOM > --- > > Key: AIRFLOW-1652 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1652 > Project: Apache Airflow > Issue Type: Bug >Reporter: Andrew Chen >Priority: Major > Fix For: 1.10.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-1652) Push DatabricksRunSubmitOperator metadata into XCOM
[ https://issues.apache.org/jira/browse/AIRFLOW-1652?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bolke de Bruin resolved AIRFLOW-1652. - Resolution: Fixed Fix Version/s: 1.10.0 Issue resolved by pull request #2641 [https://github.com/apache/incubator-airflow/pull/2641] > Push DatabricksRunSubmitOperator metadata into XCOM > --- > > Key: AIRFLOW-1652 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1652 > Project: Apache Airflow > Issue Type: Bug >Reporter: Andrew Chen >Priority: Major > Fix For: 1.10.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
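The commit message for AIRFLOW-1652 says that run_id and run_page_url are pushed into XCom so that callbacks and other tasks can read them. A minimal sketch of that idea follows. FakeTaskInstance and publish_run_metadata are hypothetical stand-ins written for this illustration, and the key names simply reuse the two names from the commit message; this is not the operator's actual code.

```python
class FakeTaskInstance:
    """Hypothetical stand-in for a task instance; it only records XCom values."""

    def __init__(self):
        self.xcoms = {}

    def xcom_push(self, key, value):
        # Store the value under the given key, overwriting any earlier push.
        self.xcoms[key] = value

    def xcom_pull(self, key):
        # Return the stored value, or None when nothing was pushed.
        return self.xcoms.get(key)


def publish_run_metadata(ti, run_id, run_page_url):
    """Push both pieces of run metadata so later tasks or callbacks can read them."""
    ti.xcom_push(key="run_id", value=run_id)
    ti.xcom_push(key="run_page_url", value=run_page_url)


ti = FakeTaskInstance()
publish_run_metadata(ti, 42, "https://example.invalid/#job/1/run/42")
print(ti.xcom_pull("run_id"), ti.xcom_pull("run_page_url"))
```

A downstream task or an on-failure callback would read the same keys back to build a link to the run page.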
[jira] [Resolved] (AIRFLOW-775) AutoCommit in jdbc hook seems not to turn off if set to false
[ https://issues.apache.org/jira/browse/AIRFLOW-775?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bolke de Bruin resolved AIRFLOW-775. Resolution: Fixed Fix Version/s: 1.10.0 Issue resolved by pull request #3257 [https://github.com/apache/incubator-airflow/pull/3257] > AutoCommit in jdbc hook seems not to turn off if set to false > - > > Key: AIRFLOW-775 > URL: https://issues.apache.org/jira/browse/AIRFLOW-775 > Project: Apache Airflow > Issue Type: Bug > Components: db, hooks >Reporter: Daniel Lamblin >Priority: Major > Fix For: 1.10.0 > > > If I use JdbcHook and run with autocommit=false I still get exceptions when > the commit is made because autocommit mode is on by default and apparently > was not set to off. > This can be worked around by setting the connection host with > ;autocommit=false > however it doesn't seem like the intended behavior when passing > autocommit=False with the hook's methods. > The JdbcHook does not seem to have a constructor that could take the jdbc > driver, location, host, schema, port, username, and password and work without > a set connection id, so working around this in code isn't too straightforward > either. 
> [2017-01-19 19:03:22,728] {models.py:1286} ERROR - > org.netezza.error.NzSQLException: The connection object is in auto-commit mode > Traceback (most recent call last): > File > "/Users/daniellamblin/VEnvs/airflow-dags/lib/python2.7/site-packages/airflow/models.py", > line 1242, in run > result = task_copy.execute(context=context) > File > "/Users/daniellamblin/VEnvs/airflow-dags/lib/python2.7/site-packages/airflow/operators/python_operator.py", > line 66, in execute > return_value = self.python_callable(*self.op_args, **self.op_kwargs) > File > "/Users/daniellamblin/airflow/dags/dpds/dpds_go_pda_dwd_sku_and_dwd_hist_up_sku_grade.py", > line 356, in stage_to_update_tables > hook.run(sql=sql, autocommit=False) > File > "/Users/daniellamblin/VEnvs/airflow-dags/lib/python2.7/site-packages/airflow/hooks/dbapi_hook.py", > line 134, in run > conn.commit() > File > "/Users/daniellamblin/VEnvs/airflow-dags/lib/python2.7/site-packages/jaydebeapi/__init__.py", > line 391, in commit > _handle_sql_exception() > File > "/Users/daniellamblin/VEnvs/airflow-dags/lib/python2.7/site-packages/jaydebeapi/__init__.py", > line 148, in _handle_sql_exception_jpype > reraise(exc_type, exc_info[1], exc_info[2]) > File > "/Users/daniellamblin/VEnvs/airflow-dags/lib/python2.7/site-packages/jaydebeapi/__init__.py", > line 389, in commit > self.jconn.commit() > DatabaseError: org.netezza.error.NzSQLException: The connection object is in > auto-commit mode > [2017-01-19 19:03:22,730] {models.py:1306} INFO - Marking task as FAILED. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-766) Skip conn.commit() when in Auto-commit
[ https://issues.apache.org/jira/browse/AIRFLOW-766?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bolke de Bruin resolved AIRFLOW-766. Resolution: Fixed Fix Version/s: (was: Airflow 2.0) 1.10.0 Issue resolved by pull request #3257 [https://github.com/apache/incubator-airflow/pull/3257] > Skip conn.commit() when in Auto-commit > -- > > Key: AIRFLOW-766 > URL: https://issues.apache.org/jira/browse/AIRFLOW-766 > Project: Apache Airflow > Issue Type: Bug > Components: db >Affects Versions: Airflow 2.0 > Environment: Airflow 2.0, IBM Netezza >Reporter: Pfubar.k >Assignee: Pfubar.k >Priority: Major > Labels: easyfix > Fix For: 1.10.0 > > Original Estimate: 1h > Remaining Estimate: 1h > > Some JDBC Drivers fails when using DbApiHook.run(), DbApiHook.insert_rows(). > I'm using IBM Netezza. When auto-commit mode is on I get this error message. > {code} > NzSQLException: The connection object is in auto-commit mode > {code} > conn.commit() needs to be called only when auto-commit mode is off. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2364) The autocommit flag can be set on a connection which does not support it.
[ https://issues.apache.org/jira/browse/AIRFLOW-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448459#comment-16448459 ] ASF subversion and git services commented on AIRFLOW-2364: -- Commit a33b29c8512c599b97e53808ebf796039a89c15c in incubator-airflow's branch refs/heads/master from [~artwr] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=a33b29c ] [AIRFLOW-2364] Warn when setting autocommit on a connection which does not support it > The autocommit flag can be set on a connection which does not support it. > - > > Key: AIRFLOW-2364 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2364 > Project: Apache Airflow > Issue Type: Bug >Reporter: Arthur Wiedmer >Assignee: Arthur Wiedmer >Priority: Minor > Fix For: 1.10.0 > > > We could just add a logging warning when the method is invoked. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-775) AutoCommit in jdbc hook seems not to turn off if set to false
[ https://issues.apache.org/jira/browse/AIRFLOW-775?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448460#comment-16448460 ] ASF subversion and git services commented on AIRFLOW-775: - Commit 97954e21229982f580f18732c5018714f7e51087 in incubator-airflow's branch refs/heads/master from [~artwr] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=97954e2 ] [AIRFLOW-775] Fix autocommit settings with Jdbc hook > AutoCommit in jdbc hook seems not to turn off if set to false > - > > Key: AIRFLOW-775 > URL: https://issues.apache.org/jira/browse/AIRFLOW-775 > Project: Apache Airflow > Issue Type: Bug > Components: db, hooks >Reporter: Daniel Lamblin >Priority: Major > > If I use JdbcHook and run with autocommit=false I still get exceptions when > the commit is made because autocommit mode is on by default and apparently > was not set to off. > This can be worked around by setting the connection host with > ;autocommit=false > however it doesn't seem like the intended behavior when passing > autocommit=False with the hook's methods. > The JdbcHook does not seem to have a constructor that could take the jdbc > driver, location, host, schema, port, username, and password and work without > a set connection id, so working around this in code isn't too straightforward > either. 
> [2017-01-19 19:03:22,728] {models.py:1286} ERROR - > org.netezza.error.NzSQLException: The connection object is in auto-commit mode > Traceback (most recent call last): > File > "/Users/daniellamblin/VEnvs/airflow-dags/lib/python2.7/site-packages/airflow/models.py", > line 1242, in run > result = task_copy.execute(context=context) > File > "/Users/daniellamblin/VEnvs/airflow-dags/lib/python2.7/site-packages/airflow/operators/python_operator.py", > line 66, in execute > return_value = self.python_callable(*self.op_args, **self.op_kwargs) > File > "/Users/daniellamblin/airflow/dags/dpds/dpds_go_pda_dwd_sku_and_dwd_hist_up_sku_grade.py", > line 356, in stage_to_update_tables > hook.run(sql=sql, autocommit=False) > File > "/Users/daniellamblin/VEnvs/airflow-dags/lib/python2.7/site-packages/airflow/hooks/dbapi_hook.py", > line 134, in run > conn.commit() > File > "/Users/daniellamblin/VEnvs/airflow-dags/lib/python2.7/site-packages/jaydebeapi/__init__.py", > line 391, in commit > _handle_sql_exception() > File > "/Users/daniellamblin/VEnvs/airflow-dags/lib/python2.7/site-packages/jaydebeapi/__init__.py", > line 148, in _handle_sql_exception_jpype > reraise(exc_type, exc_info[1], exc_info[2]) > File > "/Users/daniellamblin/VEnvs/airflow-dags/lib/python2.7/site-packages/jaydebeapi/__init__.py", > line 389, in commit > self.jconn.commit() > DatabaseError: org.netezza.error.NzSQLException: The connection object is in > auto-commit mode > [2017-01-19 19:03:22,730] {models.py:1306} INFO - Marking task as FAILED. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-766) Skip conn.commit() when in Auto-commit
[ https://issues.apache.org/jira/browse/AIRFLOW-766?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448458#comment-16448458 ] ASF subversion and git services commented on AIRFLOW-766: - Commit 3450f526cec894cc7beceeae18f9d54cb2ae7520 in incubator-airflow's branch refs/heads/master from [~artwr] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=3450f52 ] [AIRFLOW-766] Skip conn.commit() when in Auto-commit > Skip conn.commit() when in Auto-commit > -- > > Key: AIRFLOW-766 > URL: https://issues.apache.org/jira/browse/AIRFLOW-766 > Project: Apache Airflow > Issue Type: Bug > Components: db >Affects Versions: Airflow 2.0 > Environment: Airflow 2.0, IBM Netezza >Reporter: Pfubar.k >Assignee: Pfubar.k >Priority: Major > Labels: easyfix > Fix For: Airflow 2.0 > > Original Estimate: 1h > Remaining Estimate: 1h > > Some JDBC Drivers fails when using DbApiHook.run(), DbApiHook.insert_rows(). > I'm using IBM Netezza. When auto-commit mode is on I get this error message. > {code} > NzSQLException: The connection object is in auto-commit mode > {code} > conn.commit() needs to be called only when auto-commit mode is off. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-2364) The autocommit flag can be set on a connection which does not support it.
[ https://issues.apache.org/jira/browse/AIRFLOW-2364?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bolke de Bruin resolved AIRFLOW-2364. - Resolution: Fixed Fix Version/s: 1.10.0 Issue resolved by pull request #3257 [https://github.com/apache/incubator-airflow/pull/3257] > The autocommit flag can be set on a connection which does not support it. > - > > Key: AIRFLOW-2364 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2364 > Project: Apache Airflow > Issue Type: Bug >Reporter: Arthur Wiedmer >Assignee: Arthur Wiedmer >Priority: Minor > Fix For: 1.10.0 > > > We could just add a logging warning when the method is invoked. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[2/4] incubator-airflow git commit: [AIRFLOW-2364] Warn when setting autocommit on a connection which does not support it
[AIRFLOW-2364] Warn when setting autocommit on a connection which does not support it

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/a33b29c8
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/a33b29c8
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/a33b29c8

Branch: refs/heads/master
Commit: a33b29c8512c599b97e53808ebf796039a89c15c
Parents: 3450f52
Author: Arthur Wiedmer
Authored: Mon Apr 23 09:42:13 2018 -0700
Committer: Arthur Wiedmer
Committed: Mon Apr 23 09:52:08 2018 -0700

----------------------------------------------------------------------
 airflow/hooks/dbapi_hook.py | 8 ++++++++
 1 file changed, 8 insertions(+)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/a33b29c8/airflow/hooks/dbapi_hook.py
----------------------------------------------------------------------
diff --git a/airflow/hooks/dbapi_hook.py b/airflow/hooks/dbapi_hook.py
index b6cdff6..25b1588 100644
--- a/airflow/hooks/dbapi_hook.py
+++ b/airflow/hooks/dbapi_hook.py
@@ -173,6 +173,14 @@ class DbApiHook(BaseHook):
                 conn.commit()
 
     def set_autocommit(self, conn, autocommit):
+        """
+        Sets the autocommit flag on the connection
+        """
+        if not self.supports_autocommit and autocommit:
+            self.log.warn(
+                ("%s connection doesn't support "
+                 "autocommit but autocommit activated."),
+                getattr(self, self.conn_name_attr))
         conn.autocommit = autocommit
 
     def get_cursor(self):
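The behaviour added for AIRFLOW-2364 can be tried outside Airflow with a small sketch. SketchHook, its sketch_conn_id attribute and the logger name are hypothetical names chosen for this illustration; the condition and the message mirror the diff, and the standard logging.warning call is used in place of the hook's own logger.

```python
import logging
from types import SimpleNamespace

log = logging.getLogger("autocommit_sketch")


class SketchHook:
    """Hypothetical hook: declares no autocommit support, like the base default."""

    supports_autocommit = False
    conn_name_attr = "sketch_conn_id"
    sketch_conn_id = "my_db"

    def set_autocommit(self, conn, autocommit):
        # Warn only when autocommit is being switched on for a hook
        # that has not declared support for it.
        if not self.supports_autocommit and autocommit:
            log.warning(
                "%s connection doesn't support "
                "autocommit but autocommit activated.",
                getattr(self, self.conn_name_attr))
        # The flag is still set, exactly as before the change.
        conn.autocommit = autocommit


conn = SimpleNamespace()
SketchHook().set_autocommit(conn, True)
```

The warning is informational only: the attribute is assigned either way, so existing callers keep their behaviour.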
[4/4] incubator-airflow git commit: Merge pull request #3257 from artwr/awiedmer-fix-issue-with-jdbc-autocommit
Merge pull request #3257 from artwr/awiedmer-fix-issue-with-jdbc-autocommit

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/1e82e11a
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/1e82e11a
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/1e82e11a

Branch: refs/heads/master
Commit: 1e82e11aed7b28c1c0622e8f830624346ee7d55d
Parents: 65b6cea 97954e2
Author: Bolke de Bruin
Authored: Mon Apr 23 19:08:24 2018 +0200
Committer: Bolke de Bruin
Committed: Mon Apr 23 19:08:24 2018 +0200

----------------------------------------------------------------------
 airflow/hooks/dbapi_hook.py | 15 ++++++++++++---
 airflow/hooks/jdbc_hook.py  |  2 +-
 2 files changed, 13 insertions(+), 4 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/1e82e11a/airflow/hooks/dbapi_hook.py
----------------------------------------------------------------------
[1/4] incubator-airflow git commit: [AIRFLOW-766] Skip conn.commit() when in Auto-commit
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 65b6ceae7 -> 1e82e11ae

[AIRFLOW-766] Skip conn.commit() when in Auto-commit

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/3450f526
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/3450f526
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/3450f526

Branch: refs/heads/master
Commit: 3450f526cec894cc7beceeae18f9d54cb2ae7520
Parents: 1d3bb54
Author: Arthur Wiedmer
Authored: Mon Apr 23 09:41:56 2018 -0700
Committer: Arthur Wiedmer
Committed: Mon Apr 23 09:42:10 2018 -0700

----------------------------------------------------------------------
 airflow/hooks/dbapi_hook.py | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/3450f526/airflow/hooks/dbapi_hook.py
----------------------------------------------------------------------
diff --git a/airflow/hooks/dbapi_hook.py b/airflow/hooks/dbapi_hook.py
index a9f4e43..b6cdff6 100644
--- a/airflow/hooks/dbapi_hook.py
+++ b/airflow/hooks/dbapi_hook.py
@@ -7,9 +7,9 @@
 # to you under the Apache License, Version 2.0 (the
 # "License"); you may not use this file except in compliance
 # with the License. You may obtain a copy of the License at
-# 
+#
 # http://www.apache.org/licenses/LICENSE-2.0
-# 
+#
 # Unless required by applicable law or agreed to in writing,
 # software distributed under the License is distributed on an
 # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
@@ -169,7 +169,8 @@ class DbApiHook(BaseHook):
                     else:
                         cur.execute(s)
 
-            conn.commit()
+            if not conn.autocommit:
+                conn.commit()
 
     def set_autocommit(self, conn, autocommit):
         conn.autocommit = autocommit
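The guard introduced for AIRFLOW-766 can be illustrated without a database. In the sketch below, FakeConnection and finish_transaction are hypothetical names; the fake connection imitates the Netezza driver behaviour reported in the issue by raising when commit() is called in auto-commit mode.

```python
class FakeConnection:
    """Hypothetical connection that rejects commit() while in auto-commit mode."""

    def __init__(self, autocommit):
        self.autocommit = autocommit
        self.commits = 0

    def commit(self):
        if self.autocommit:
            raise RuntimeError("The connection object is in auto-commit mode")
        self.commits += 1


def finish_transaction(conn):
    # Same condition as the diff: commit only when auto-commit is off.
    if not conn.autocommit:
        conn.commit()


manual = FakeConnection(autocommit=False)
finish_transaction(manual)   # commits once

auto = FakeConnection(autocommit=True)
finish_transaction(auto)     # skips commit(), so no exception is raised

print(manual.commits, auto.commits)
```

Note that the guard reads conn.autocommit directly, so it assumes the connection object exposes that attribute.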
[3/4] incubator-airflow git commit: [AIRFLOW-775] Fix autocommit settings with Jdbc hook
[AIRFLOW-775] Fix autocommit settings with Jdbc hook

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/97954e21
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/97954e21
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/97954e21

Branch: refs/heads/master
Commit: 97954e21229982f580f18732c5018714f7e51087
Parents: a33b29c
Author: Arthur Wiedmer
Authored: Mon Apr 23 09:42:21 2018 -0700
Committer: Arthur Wiedmer
Committed: Mon Apr 23 09:52:08 2018 -0700

----------------------------------------------------------------------
 airflow/hooks/jdbc_hook.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/97954e21/airflow/hooks/jdbc_hook.py
----------------------------------------------------------------------
diff --git a/airflow/hooks/jdbc_hook.py b/airflow/hooks/jdbc_hook.py
index b76e80d..8f0cd67 100644
--- a/airflow/hooks/jdbc_hook.py
+++ b/airflow/hooks/jdbc_hook.py
@@ -58,4 +58,4 @@ class JdbcHook(DbApiHook):
         :param conn: The connection
         :return:
         """
-        conn.jconn.autocommit = autocommit
+        conn.jconn.setAutoCommit(autocommit)
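One way to see why the AIRFLOW-775 change matters is the sketch below. FakeJavaConnection is a hypothetical stand-in for the Java connection object reached through conn.jconn, and it assumes the proxy accepts arbitrary attribute assignment the way a plain Python object does. Under that assumption, assigning to an autocommit attribute never reaches the driver, while calling setAutoCommit() does.

```python
class FakeJavaConnection:
    """Hypothetical stand-in for the java.sql.Connection object behind conn.jconn."""

    def __init__(self):
        # JDBC connections start out in auto-commit mode.
        self._auto_commit = True

    def setAutoCommit(self, value):
        self._auto_commit = bool(value)

    def getAutoCommit(self):
        return self._auto_commit


jconn = FakeJavaConnection()

# Old code path: this only creates a new Python attribute on the object.
jconn.autocommit = False
before = jconn.getAutoCommit()

# New code path: the driver-level setting is actually changed.
jconn.setAutoCommit(False)
after = jconn.getAutoCommit()

print(before, after)
```

This matches the symptom in the report: autocommit=False was passed, yet the driver still saw the connection as being in auto-commit mode.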
[jira] [Commented] (AIRFLOW-2234) Enable insert_rows for PrestoHook
[ https://issues.apache.org/jira/browse/AIRFLOW-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448446#comment-16448446 ] ASF subversion and git services commented on AIRFLOW-2234: -- Commit 65b6ceae74c166efe95113ad5aa55004e2ad25c5 in incubator-airflow's branch refs/heads/master from [~sekikn] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=65b6cea ] [AIRFLOW-2234] Enable insert_rows for PrestoHook PrestoHook.insert_rows() raises NotImplementedError for now. But Presto 0.126+ allows specifying column names in INSERT queries, so we can leverage DbApiHook.insert_rows() almost as is. This PR enables this function. Closes #3146 from sekikn/AIRFLOW-2234 > Enable insert_rows for PrestoHook > - > > Key: AIRFLOW-2234 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2234 > Project: Apache Airflow > Issue Type: Improvement > Components: hooks >Reporter: Kengo Seki >Assignee: Kengo Seki >Priority: Major > Fix For: 1.10.0 > > > PrestoHook.insert_rows() raises NotImplementedError for now. > But [Presto 0.126+ allows specifying column names in INSERT > queries|https://prestodb.io/docs/current/release/release-0.126.html], so we > can leverage DbApiHook.insert_rows() almost as is. > I think there is no reason to keep it disabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
incubator-airflow git commit: [AIRFLOW-2234] Enable insert_rows for PrestoHook
Repository: incubator-airflow
Updated Branches:
  refs/heads/master ed9329017 -> 65b6ceae7

[AIRFLOW-2234] Enable insert_rows for PrestoHook

PrestoHook.insert_rows() raises NotImplementedError for now.
But Presto 0.126+ allows specifying column names in INSERT queries,
so we can leverage DbApiHook.insert_rows() almost as is.
This PR enables this function.

Closes #3146 from sekikn/AIRFLOW-2234

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/65b6ceae
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/65b6ceae
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/65b6ceae

Branch: refs/heads/master
Commit: 65b6ceae74c166efe95113ad5aa55004e2ad25c5
Parents: ed93290
Author: Kengo Seki
Authored: Mon Apr 23 19:01:38 2018 +0200
Committer: Bolke de Bruin
Committed: Mon Apr 23 19:01:38 2018 +0200

----------------------------------------------------------------------
 airflow/hooks/dbapi_hook.py     |  4 +--
 airflow/hooks/presto_hook.py    | 17 --
 tests/hooks/test_dbapi_hook.py  | 64 +---
 tests/hooks/test_presto_hook.py | 49 +++
 4 files changed, 125 insertions(+), 9 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/65b6ceae/airflow/hooks/dbapi_hook.py
----------------------------------------------------------------------
diff --git a/airflow/hooks/dbapi_hook.py b/airflow/hooks/dbapi_hook.py
index a9f4e43..de0a3a3 100644
--- a/airflow/hooks/dbapi_hook.py
+++ b/airflow/hooks/dbapi_hook.py
@@ -213,8 +213,8 @@ class DbApiHook(BaseHook):
                     for cell in row:
                         l.append(self._serialize_cell(cell, conn))
                     values = tuple(l)
-                    placeholders = ["%s",]*len(values)
-                    sql = "INSERT INTO {0} {1} VALUES ({2});".format(
+                    placeholders = ["%s", ] * len(values)
+                    sql = "INSERT INTO {0} {1} VALUES ({2})".format(
                         table,
                         target_fields,
                         ",".join(placeholders))

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/65b6ceae/airflow/hooks/presto_hook.py
----------------------------------------------------------------------
diff --git a/airflow/hooks/presto_hook.py b/airflow/hooks/presto_hook.py
index 935a9a5..8920448 100644
--- a/airflow/hooks/presto_hook.py
+++ b/airflow/hooks/presto_hook.py
@@ -114,5 +114,18 @@ class PrestoHook(DbApiHook):
         """
         return super(PrestoHook, self).run(self._strip_sql(hql), parameters)
 
-    def insert_rows(self):
-        raise NotImplementedError()
+    # TODO Enable commit_every once PyHive supports transaction.
+    # Unfortunately, PyHive 0.5.1 doesn't support transaction for now,
+    # whereas Presto 0.132+ does.
+    def insert_rows(self, table, rows, target_fields=None):
+        """
+        A generic way to insert a set of tuples into a table.
+
+        :param table: Name of the target table
+        :type table: str
+        :param rows: The rows to insert into the table
+        :type rows: iterable of tuples
+        :param target_fields: The names of the columns to fill in the table
+        :type target_fields: iterable of strings
+        """
+        super(PrestoHook, self).insert_rows(table, rows, target_fields, 0)

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/65b6ceae/tests/hooks/test_dbapi_hook.py
----------------------------------------------------------------------
diff --git a/tests/hooks/test_dbapi_hook.py b/tests/hooks/test_dbapi_hook.py
index 9fcc970..c3ae187 100644
--- a/tests/hooks/test_dbapi_hook.py
+++ b/tests/hooks/test_dbapi_hook.py
@@ -28,17 +28,18 @@ class TestDbApiHook(unittest.TestCase):
 
     def setUp(self):
         super(TestDbApiHook, self).setUp()
-        
+
         self.cur = mock.MagicMock()
-        self.conn = conn = mock.MagicMock()
+        self.conn = mock.MagicMock()
         self.conn.cursor.return_value = self.cur
-        
+        conn = self.conn
+
         class TestDBApiHook(DbApiHook):
             conn_name_attr = 'test_conn_id'
-            
+
             def get_conn(self):
                 return conn
-        
+
         self.db_hook = TestDBApiHook()
 
     def test_get_records(self):
@@ -78,3 +79,56 @@ class TestDbApiHook(unittest.TestCase):
         self.conn.close.assert_called_once()
         self.cur.close.assert_called_once()
         self.cur.execute.assert_called_once_with(statement)
+
+    def test_insert_rows(self):
+        table = "table"
+        rows = [("hello",),
+                ("world",)]
+
+        self.db_hook.insert_rows(table, rows)
+
+        self.conn.close.assert_called_once()
+        self.cur.close.assert_called_once()
+
+
[jira] [Resolved] (AIRFLOW-2234) Enable insert_rows for PrestoHook
[ https://issues.apache.org/jira/browse/AIRFLOW-2234?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bolke de Bruin resolved AIRFLOW-2234. - Resolution: Fixed Fix Version/s: 1.10.0 Issue resolved by pull request #3146 [https://github.com/apache/incubator-airflow/pull/3146] > Enable insert_rows for PrestoHook > - > > Key: AIRFLOW-2234 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2234 > Project: Apache Airflow > Issue Type: Improvement > Components: hooks >Reporter: Kengo Seki >Assignee: Kengo Seki >Priority: Major > Fix For: 1.10.0 > > > PrestoHook.insert_rows() raises NotImplementedError for now. > But [Presto 0.126+ allows specifying column names in INSERT > queries|https://prestodb.io/docs/current/release/release-0.126.html], so we > can leverage DbApiHook.insert_rows() almost as is. > I think there is no reason to keep it disabled. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
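The AIRFLOW-2234 change relies on the generic INSERT statement that DbApiHook.insert_rows() assembles, now without the trailing semicolon. The sketch below rebuilds only the string-formatting step. build_insert_sql is a hypothetical helper written for this illustration, and the way the column list is wrapped in parentheses is an assumption, since the commit's diff shows only the format call itself.

```python
def build_insert_sql(table, values, target_fields=None):
    """Return a parameterised INSERT statement with one %s placeholder per value."""
    if target_fields:
        # Assumed rendering of the column list: "(col_a, col_b)".
        fields = "({})".format(", ".join(target_fields))
    else:
        fields = ""
    placeholders = ["%s", ] * len(values)
    # Same template as the commit: no trailing semicolon.
    return "INSERT INTO {0} {1} VALUES ({2})".format(
        table, fields, ",".join(placeholders))


with_columns = build_insert_sql("t", ("a", 1), ["x", "y"])
without_columns = build_insert_sql("t", ("hello",))
print(with_columns)
print(without_columns)
```

The values themselves are never interpolated into the string; they are handed to the cursor separately, which is why only placeholders appear here.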
[jira] [Commented] (AIRFLOW-2208) Keep execution_date context when switching from task_instance_details
[ https://issues.apache.org/jira/browse/AIRFLOW-2208?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448444#comment-16448444 ] ASF subversion and git services commented on AIRFLOW-2208: -- Commit ed932901757a9e38049ef77e8b02c8bb8a9452c7 in incubator-airflow's branch refs/heads/master from [~iansuvak] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=ed93290 ] [AIRFLOW-2208][Airflow-22208] Link to same DagRun graph from TaskInstance view Allow graph view to accept blank execution_date and pass it through when it's available. Closes #3132 from iansuvak/persistent_graph > Keep execution_date context when switching from task_instance_details > - > > Key: AIRFLOW-2208 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2208 > Project: Apache Airflow > Issue Type: Improvement > Components: ui >Reporter: Ian Suvak >Priority: Minor > Fix For: 1.10.0 > > > With alerts that give direct links to Task Instance logs it's difficult to go > back to a Graph view of that Dag Run, since clicking Graph link goes to the > most recent Dag Run. If within a view that's specific to a DagRun such as > viewing task_instance log click graph link should go to DagRun that the Task > Instance belongs to. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
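The airflow/www/utils.py part of this commit replaces a key-presence test with a truthiness test, so that a request carrying execution_date= with an empty value is treated like one that omits the parameter. A minimal sketch of the difference, using a plain dict as a stand-in for the request arguments (the two helper names are hypothetical):

```python
def has_execution_date_old(args):
    # Old check: true whenever the key is present, even with an empty value.
    return 'execution_date' in args


def has_execution_date_new(args):
    # New check: true only when the key is present with a non-empty value.
    return bool(args.get('execution_date'))


blank = {'dag_id': 'example_bash_operator', 'execution_date': ''}
filled = {'dag_id': 'example_bash_operator', 'execution_date': '2018-04-23'}
missing = {'dag_id': 'example_bash_operator'}

for args in (blank, filled, missing):
    print(has_execution_date_old(args), has_execution_date_new(args))
```

With the old check, the blank case would go on to parse an empty string as a date; with the new check that branch is skipped.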
incubator-airflow git commit: [AIRFLOW-2208][Airflow-22208] Link to same DagRun graph from TaskInstance view
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 09bbe2477 -> ed9329017

[AIRFLOW-2208][Airflow-22208] Link to same DagRun graph from TaskInstance view

Allow graph view to accept blank execution_date and pass it through when it's available.

Closes #3132 from iansuvak/persistent_graph

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/ed932901
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/ed932901
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/ed932901

Branch: refs/heads/master
Commit: ed932901757a9e38049ef77e8b02c8bb8a9452c7
Parents: 09bbe24
Author: Ian Suvak
Authored: Mon Apr 23 18:59:58 2018 +0200
Committer: Bolke de Bruin
Committed: Mon Apr 23 18:59:58 2018 +0200

----------------------------------------------------------------------
 airflow/www/templates/airflow/dag.html |  2 +-
 airflow/www/utils.py                   |  2 +-
 tests/core.py                          | 40 +
 3 files changed, 25 insertions(+), 19 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/ed932901/airflow/www/templates/airflow/dag.html
----------------------------------------------------------------------
diff --git a/airflow/www/templates/airflow/dag.html b/airflow/www/templates/airflow/dag.html
index d5a145d..ed84f27 100644
--- a/airflow/www/templates/airflow/dag.html
+++ b/airflow/www/templates/airflow/dag.html
@@ -54,7 +54,7 @@
 Back to {{ dag.parent_dag.dag_id }}
 {% endif %}
-
+
 Graph View

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/ed932901/airflow/www/utils.py
----------------------------------------------------------------------
diff --git a/airflow/www/utils.py b/airflow/www/utils.py
index ada8ce2..4a3ac2e 100644
--- a/airflow/www/utils.py
+++ b/airflow/www/utils.py
@@ -259,7 +259,7 @@ def action_logging(f):
             task_id=request.args.get('task_id'),
             dag_id=request.args.get('dag_id'))
 
-        if 'execution_date' in request.args:
+        if request.args.get('execution_date'):
             log.execution_date = timezone.parse(request.args.get('execution_date'))
 
         with create_session() as session:

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/ed932901/tests/core.py
----------------------------------------------------------------------
diff --git a/tests/core.py b/tests/core.py
index 308804d..9cbae9d 100644
--- a/tests/core.py
+++ b/tests/core.py
@@ -1596,12 +1596,12 @@ class WebUiTests(unittest.TestCase):
 
         self.dagbag = models.DagBag(include_examples=True)
         self.dag_bash = self.dagbag.dags['example_bash_operator']
-        self.dag_bash2 = self.dagbag.dags['test_example_bash_operator']
+        self.dag_python = self.dagbag.dags['example_python_operator']
         self.sub_dag = self.dagbag.dags['example_subdag_operator']
         self.runme_0 = self.dag_bash.get_task('runme_0')
         self.example_xcom = self.dagbag.dags['example_xcom']
 
-        self.dagrun_bash2 = self.dag_bash2.create_dagrun(
+        self.dagrun_python = self.dag_python.create_dagrun(
             run_id="test_{}".format(models.DagRun.id_for_date(timezone.utcnow())),
             execution_date=DEFAULT_DATE,
             start_date=timezone.utcnow(),
@@ -1628,14 +1628,16 @@ class WebUiTests(unittest.TestCase):
         self.assertIn("DAGs", resp_html)
         self.assertIn("example_bash_operator", resp_html)
 
-        # The HTML should contain data for the last-run. A link to the specific run, and the text of
-        # the date.
+        # The HTML should contain data for the last-run. A link to the specific run,
+        # and the text of the date.
         url = "/admin/airflow/graph?" + urlencode({
-            "dag_id": self.dag_bash2.dag_id,
-            "execution_date": self.dagrun_bash2.execution_date,
+            "dag_id": self.dag_python.dag_id,
+            "execution_date": self.dagrun_python.execution_date,
         }).replace("&", "")
         self.assertIn(url, resp_html)
-        self.assertIn(self.dagrun_bash2.execution_date.strftime("%Y-%m-%d %H:%M"), resp_html)
+        self.assertIn(
+            self.dagrun_python.execution_date.strftime("%Y-%m-%d %H:%M"),
+            resp_html)
 
     def test_query(self):
         response = self.app.get('/admin/queryview/')
@@ -1662,6 +1664,10 @@ class WebUiTests(unittest.TestCase):
         response = self.app.get(
             '/admin/airflow/graph?dag_id=example_bash_operator')
         self.assertIn("runme_0", response.data.decode('utf-8'))
+        # confirm that the graph page loads when execution_date is blank
+        response = self.app.get(
+
[jira] [Resolved] (AIRFLOW-1153) params in HiveOperator constructor can't be passed into Hive execution context
[ https://issues.apache.org/jira/browse/AIRFLOW-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fokko Driesprong resolved AIRFLOW-1153.
---------------------------------------
Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request #3136
[https://github.com/apache/incubator-airflow/pull/3136]

> params in HiveOperator constructor can't be passed into Hive execution context
> ------------------------------------------------------------------------------
>
> Key: AIRFLOW-1153
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1153
> Project: Apache Airflow
> Issue Type: Bug
> Components: hive_hooks, operators
> Affects Versions: Airflow 2.0, Airflow 1.8
> Reporter: Xianping lin
> Assignee: Fokko Driesprong
> Priority: Critical
> Labels: easyfix, newbie
> Fix For: 2.0.0
>
> The params parameter in HiveOperator can't be imported into the Hive execution
> context, so the following statement won't work, because 'mynumber' is not
> available to the SQL statement.
> test_hiveoperator = HiveOperator(
>     task_id='hive_test',
>     hiveconf_jinja_translate=True,
>     hql = ''' use myDB;
>     INSERT OVERWRITE TABLE t2
>     select * from t1 where t1.x > ' ${hiveconf:mynumber}'
>     ''',
>     params={'mynumber': 2},
>     dag=dag
> )
> This modification passes the 'params' in the HiveOperator constructor to the
> Hive SQL execution context, so that the variable definition can be passed to
> Hive SQL.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-1153) params in HiveOperator constructor can't be passed into Hive execution context
[ https://issues.apache.org/jira/browse/AIRFLOW-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fokko Driesprong reassigned AIRFLOW-1153:
-----------------------------------------
Assignee: Fokko Driesprong

> params in HiveOperator constructor can't be passed into Hive execution context
> ------------------------------------------------------------------------------
>
> Key: AIRFLOW-1153
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1153
> Project: Apache Airflow
> Issue Type: Bug
> Components: hive_hooks, operators
> Affects Versions: Airflow 2.0, Airflow 1.8
> Reporter: Xianping lin
> Assignee: Fokko Driesprong
> Priority: Critical
> Labels: easyfix, newbie
> Fix For: 2.0.0
>
> The params parameter in HiveOperator can't be imported into the Hive execution
> context, so the following statement won't work, because 'mynumber' is not
> available to the SQL statement.
> test_hiveoperator = HiveOperator(
>     task_id='hive_test',
>     hiveconf_jinja_translate=True,
>     hql = ''' use myDB;
>     INSERT OVERWRITE TABLE t2
>     select * from t1 where t1.x > ' ${hiveconf:mynumber}'
>     ''',
>     params={'mynumber': 2},
>     dag=dag
> )
> This modification passes the 'params' in the HiveOperator constructor to the
> Hive SQL execution context, so that the variable definition can be passed to
> Hive SQL.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (AIRFLOW-1153) params in HiveOperator constructor can't be passed into Hive execution context
[ https://issues.apache.org/jira/browse/AIRFLOW-1153?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448437#comment-16448437 ]

ASF subversion and git services commented on AIRFLOW-1153:
----------------------------------------------------------

Commit 09bbe247728993867c716635951219cc49f65dd1 in incubator-airflow's branch refs/heads/master from Alan Ma
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=09bbe24 ]

[AIRFLOW-1153] Allow HiveOperators to take hiveconfs

HiveOperator can only replace variables via jinja and the replacements are
global to the dag through the context and user_defined_macros. It would be
much more flexible to open up hive_conf to the HiveOperator level so hive
scripts can be recycled at the task level, leveraging HiveHook already
existing hive_conf param and _prepare_hiveconf function.

Closes #3136 from wolfier/AIRFLOW-1153

> params in HiveOperator constructor can't be passed into Hive execution context
> ------------------------------------------------------------------------------
>
> Key: AIRFLOW-1153
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1153
> Project: Apache Airflow
> Issue Type: Bug
> Components: hive_hooks, operators
> Affects Versions: Airflow 2.0, Airflow 1.8
> Reporter: Xianping lin
> Priority: Critical
> Labels: easyfix, newbie
>
> The params parameter in HiveOperator can't be imported into the Hive execution
> context, so the following statement won't work, because 'mynumber' is not
> available to the SQL statement.
> test_hiveoperator = HiveOperator(
>     task_id='hive_test',
>     hiveconf_jinja_translate=True,
>     hql = ''' use myDB;
>     INSERT OVERWRITE TABLE t2
>     select * from t1 where t1.x > ' ${hiveconf:mynumber}'
>     ''',
>     params={'mynumber': 2},
>     dag=dag
> )
> This modification passes the 'params' in the HiveOperator constructor to the
> Hive SQL execution context, so that the variable definition can be passed to
> Hive SQL.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
incubator-airflow git commit: [AIRFLOW-1153] Allow HiveOperators to take hiveconfs
Repository: incubator-airflow
Updated Branches:
  refs/heads/master e30a1f451 -> 09bbe2477

[AIRFLOW-1153] Allow HiveOperators to take hiveconfs

HiveOperator can only replace variables via jinja and the replacements are
global to the dag through the context and user_defined_macros. It would be
much more flexible to open up hive_conf to the HiveOperator level so hive
scripts can be recycled at the task level, leveraging HiveHook already
existing hive_conf param and _prepare_hiveconf function.

Closes #3136 from wolfier/AIRFLOW-1153

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/09bbe247
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/09bbe247
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/09bbe247

Branch: refs/heads/master
Commit: 09bbe247728993867c716635951219cc49f65dd1
Parents: e30a1f4
Author: Alan Ma
Authored: Mon Apr 23 18:56:29 2018 +0200
Committer: Fokko Driesprong
Committed: Mon Apr 23 18:56:29 2018 +0200

----------------------------------------------------------------------
 airflow/operators/hive_operator.py | 21 +++--
 tests/operators/hive_operator.py   | 10 ++
 2 files changed, 25 insertions(+), 6 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/09bbe247/airflow/operators/hive_operator.py
----------------------------------------------------------------------
diff --git a/airflow/operators/hive_operator.py b/airflow/operators/hive_operator.py
index 64ea61d..b62744b 100644
--- a/airflow/operators/hive_operator.py
+++ b/airflow/operators/hive_operator.py
@@ -35,6 +35,9 @@ class HiveOperator(BaseOperator):
     :type hql: string
     :param hive_cli_conn_id: reference to the Hive database
     :type hive_cli_conn_id: string
+    :param hiveconfs: if defined, these key value pairs will be passed
+        to hive as ``-hiveconf "key"="value"``
+    :type hiveconfs: dict
     :param hiveconf_jinja_translate: when True, hiveconf-type templating
         ${var} gets translated into jinja-type templating {{ var }} and
         ${hiveconf:var} gets translated into jinja-type templating {{ var }}.
@@ -56,7 +59,7 @@ class HiveOperator(BaseOperator):
     """

     template_fields = ('hql', 'schema', 'hive_cli_conn_id', 'mapred_queue',
-                       'mapred_job_name', 'mapred_queue_priority')
+                       'hiveconfs', 'mapred_job_name', 'mapred_queue_priority')
     template_ext = ('.hql', '.sql',)
     ui_color = '#f0e4ec'

@@ -65,6 +68,7 @@ class HiveOperator(BaseOperator):
             self, hql,
             hive_cli_conn_id='hive_cli_default',
             schema='default',
+            hiveconfs=None,
             hiveconf_jinja_translate=False,
             script_begin_tag=None,
             run_as_owner=False,
@@ -74,15 +78,15 @@ class HiveOperator(BaseOperator):
             *args, **kwargs):

         super(HiveOperator, self).__init__(*args, **kwargs)
-        self.hiveconf_jinja_translate = hiveconf_jinja_translate
         self.hql = hql
-        self.schema = schema
         self.hive_cli_conn_id = hive_cli_conn_id
+        self.schema = schema
+        self.hiveconfs = hiveconfs or {}
+        self.hiveconf_jinja_translate = hiveconf_jinja_translate
         self.script_begin_tag = script_begin_tag
         self.run_as = None
         if run_as_owner:
             self.run_as = self.dag.owner
-
         self.mapred_queue = mapred_queue
         self.mapred_queue_priority = mapred_queue_priority
         self.mapred_job_name = mapred_job_name
@@ -119,8 +123,13 @@ class HiveOperator(BaseOperator):
                 .format(ti.hostname.split('.')[0], ti.dag_id, ti.task_id,
                         ti.execution_date.isoformat())

-        self.hook.run_cli(hql=self.hql, schema=self.schema,
-                          hive_conf=context_to_airflow_vars(context))
+        if self.hiveconf_jinja_translate:
+            self.hiveconfs = context_to_airflow_vars(context)
+        else:
+            self.hiveconfs.update(context_to_airflow_vars(context))
+
+        self.log.info('Passing HiveConf: %s', self.hiveconfs)
+        self.hook.run_cli(hql=self.hql, schema=self.schema, hive_conf=self.hiveconfs)

     def dry_run(self):
         self.hook = self.get_hook()

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/09bbe247/tests/operators/hive_operator.py
----------------------------------------------------------------------
diff --git a/tests/operators/hive_operator.py b/tests/operators/hive_operator.py
index 9cc0f1a..0914ba9 100644
--- a/tests/operators/hive_operator.py
+++ b/tests/operators/hive_operator.py
@@ -98,6 +98,16 @@ class HiveOperatorTest(HiveEnvironmentTest):
         t.prepare_template()
         self.assertEqual(t.hql, "SELECT {{ num_col
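The merge behaviour described in the AIRFLOW-1153 commit above can be sketched outside Airflow roughly as follows. This is a minimal illustration, not the operator's code: `build_hiveconfs` and `to_cli_args` are hypothetical helper names, the `airflow.ctx.dag_id` key is only an example of what the context-derived variables might look like, and unlike the operator (which updates `self.hiveconfs` in place) the sketch returns a new dict.

```python
def build_hiveconfs(task_hiveconfs, context_vars, jinja_translate=False):
    """Return the dict that would be handed to the hook as hive_conf.

    Hypothetical helper that follows the branch shown in the diff.
    """
    if jinja_translate:
        # With hiveconf_jinja_translate set, only the context-derived
        # variables are passed on; the task-level dict is replaced.
        return dict(context_vars)
    merged = dict(task_hiveconfs or {})
    # Context-derived variables win if a key appears in both dicts.
    merged.update(context_vars)
    return merged


def to_cli_args(hiveconfs):
    """Flatten a dict into -hiveconf key=value pairs (assumed CLI shape)."""
    args = []
    for key, value in sorted(hiveconfs.items()):
        args.extend(['-hiveconf', '{}={}'.format(key, value)])
    return args


task_level = {'mynumber': 2}
from_context = {'airflow.ctx.dag_id': 'demo'}  # example key only
print(to_cli_args(build_hiveconfs(task_level, from_context)))
```

Keeping the merge in a pure function makes the precedence rule (context values override task-level values) easy to test in isolation.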
[jira] [Commented] (AIRFLOW-2357) Use a persisten volume for the Kubernetes logs
[ https://issues.apache.org/jira/browse/AIRFLOW-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448411#comment-16448411 ]

ASF subversion and git services commented on AIRFLOW-2357:
----------------------------------------------------------

Commit e30a1f451aa5ec5aca4c886067ba8946a3d33395 in incubator-airflow's branch refs/heads/master from [~Fokko]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=e30a1f4 ]

[AIRFLOW-2357] Add persistent volume for the logs

The logs are kept inside of the worker pod. By attaching a persistent
disk we keep the logs and make them available for the webserver.

- Remove the requirements.txt since we dont want to maintain another
  dependency file
- Fix some small casing stuff
- Removed some unused code
- Add missing shebang lines
- Started on some docs
- Fixed the logging

Closes #3252 from Fokko/airflow-2357-pd-for-logs

> Use a persisten volume for the Kubernetes logs
> ----------------------------------------------
>
> Key: AIRFLOW-2357
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2357
> Project: Apache Airflow
> Issue Type: Improvement
> Reporter: Fokko Driesprong
> Priority: Major
> Fix For: 1.10.0
>
> Right now, when a pod exits, the log is lost forever because it is still in
> the container. By mounting a persistent volume we can easily fix this and
> this allows us to have logs on our local machines (minikube). In production
> you might want to write your logs to GCS/S3/etc.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
incubator-airflow git commit: [AIRFLOW-2357] Add persistent volume for the logs
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 1d3bb5470 -> e30a1f451

[AIRFLOW-2357] Add persistent volume for the logs

The logs are kept inside of the worker pod. By attaching a persistent
disk we keep the logs and make them available for the webserver.

- Remove the requirements.txt since we dont want to maintain another
  dependency file
- Fix some small casing stuff
- Removed some unused code
- Add missing shebang lines
- Started on some docs
- Fixed the logging

Closes #3252 from Fokko/airflow-2357-pd-for-logs

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/e30a1f45
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/e30a1f45
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/e30a1f45

Branch: refs/heads/master
Commit: e30a1f451aa5ec5aca4c886067ba8946a3d33395
Parents: 1d3bb54
Author: Fokko Driesprong
Authored: Mon Apr 23 18:43:24 2018 +0200
Committer: Bolke de Bruin
Committed: Mon Apr 23 18:43:24 2018 +0200

----------------------------------------------------------------------
 airflow/config_templates/default_airflow.cfg    |   3 +
 .../contrib/executors/kubernetes_executor.py    | 216 +
 airflow/contrib/kubernetes/pod.py               |   8 +-
 airflow/contrib/kubernetes/pod_generator.py     | 150 +---
 airflow/contrib/kubernetes/pod_launcher.py      |  22 +-
 .../contrib/kubernetes/worker_configuration.py  | 116 +
 .../operators/kubernetes_pod_operator.py        |  21 +-
 airflow/utils/cli.py                            |   6 +-
 docs/kubernetes.rst                             |  12 +-
 scripts/ci/kubernetes/docker/Dockerfile         |  15 +-
 scripts/ci/kubernetes/docker/Dockerfile_zip     |  20 --
 scripts/ci/kubernetes/docker/bootstrap.sh       |   3 +-
 scripts/ci/kubernetes/docker/build.sh           |   8 +-
 scripts/ci/kubernetes/docker/requirements.txt   | 103
 scripts/ci/kubernetes/kube/airflow.yaml         | 221 +
 .../ci/kubernetes/kube/airflow.yaml.template    | 240 ---
 scripts/ci/kubernetes/kube/deploy.sh            |   9 +-
 scripts/ci/kubernetes/kube/postgres.yaml        |  28 +--
 scripts/ci/kubernetes/kube/volumes.yaml         |  64 +
 19 files changed, 531 insertions(+), 734 deletions(-)
----------------------------------------------------------------------

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/e30a1f45/airflow/config_templates/default_airflow.cfg
----------------------------------------------------------------------
diff --git a/airflow/config_templates/default_airflow.cfg b/airflow/config_templates/default_airflow.cfg
index 6a966cb..845342c 100644
--- a/airflow/config_templates/default_airflow.cfg
+++ b/airflow/config_templates/default_airflow.cfg
@@ -518,6 +518,9 @@ dags_volume_subpath =
 # For DAGs mounted via a volume claim (mutually exclusive with volume claim)
 dags_volume_claim =

+# A shared volume claim for the logs
+logs_volume_claim =
+
 # Git credentials and repository for DAGs mounted via Git (mutually exclusive with volume claim)
 git_repo =
 git_branch =

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/e30a1f45/airflow/contrib/executors/kubernetes_executor.py
----------------------------------------------------------------------
diff --git a/airflow/contrib/executors/kubernetes_executor.py b/airflow/contrib/executors/kubernetes_executor.py
index 1a50d85..cdce95f 100644
--- a/airflow/contrib/executors/kubernetes_executor.py
+++ b/airflow/contrib/executors/kubernetes_executor.py
@@ -47,16 +47,10 @@ class KubernetesExecutorConfig:

     def __repr__(self):
         return "{}(image={}, request_memory={} ,request_cpu={}, limit_memory={}, " \
-            "limit_cpu={}, gcp_service_account_key={})"\
-            .format(
-                KubernetesExecutorConfig.__name__,
-                self.image,
-                self.request_memory,
-                self.request_cpu,
-                self.limit_memory,
-                self.limit_cpu,
-                self.gcp_service_account_key
-            )
+               "limit_cpu={}, gcp_service_account_key={})" \
+            .format(KubernetesExecutorConfig.__name__, self.image, self.request_memory,
+                    self.request_cpu, self.limit_memory, self.limit_cpu,
+                    self.gcp_service_account_key)

     @staticmethod
     def from_dict(obj):
@@ -65,33 +59,33 @@ class KubernetesExecutorConfig:

         if not isinstance(obj, dict):
             raise TypeError(
-                "Cannot convert a non-dictionary object into a KubernetesExecutorConfig")
+                'Cannot convert a non-dictionary object into a KubernetesExecutorConfig')

         namespaced = obj.get(Executors.KubernetesExecutor, {})

         return KubernetesExecutorConfig(
-
[jira] [Resolved] (AIRFLOW-2357) Use a persisten volume for the Kubernetes logs
[ https://issues.apache.org/jira/browse/AIRFLOW-2357?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Bolke de Bruin resolved AIRFLOW-2357.
-------------------------------------
Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3252
[https://github.com/apache/incubator-airflow/pull/3252]

> Use a persisten volume for the Kubernetes logs
> ----------------------------------------------
>
> Key: AIRFLOW-2357
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2357
> Project: Apache Airflow
> Issue Type: Improvement
> Reporter: Fokko Driesprong
> Priority: Major
> Fix For: 1.10.0
>
> Right now, when a pod exits, the log is lost forever because it is still in
> the container. By mounting a persistent volume we can easily fix this and
> this allows us to have logs on our local machines (minikube). In production
> you might want to write your logs to GCS/S3/etc.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Created] (AIRFLOW-2364) The autocommit flag can be set on a connection which does not support it.
Arthur Wiedmer created AIRFLOW-2364:
------------------------------------

Summary: The autocommit flag can be set on a connection which does not support it.
Key: AIRFLOW-2364
URL: https://issues.apache.org/jira/browse/AIRFLOW-2364
Project: Apache Airflow
Issue Type: Bug
Reporter: Arthur Wiedmer
Assignee: Arthur Wiedmer

We could just add a logging warning when the method is invoked.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
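The warning proposed in AIRFLOW-2364 could look roughly like the sketch below. It is an assumption-laden illustration: `DemoHook`, `AutocommitHook` and the `supports_autocommit` flag are hypothetical stand-ins defined here rather than Airflow classes, and the connection is a plain namespace object instead of a real database connection.

```python
import logging
from types import SimpleNamespace

log = logging.getLogger(__name__)


class DemoHook(object):
    """Hypothetical hook; it only models the autocommit decision."""

    # Assumed class-level flag describing the backend's capability.
    supports_autocommit = False

    def set_autocommit(self, conn, autocommit):
        if autocommit and not self.supports_autocommit:
            # Warn and leave the connection untouched instead of
            # silently setting an attribute the backend ignores.
            log.warning(
                "%s does not support autocommit, but autocommit was requested.",
                type(self).__name__)
            return
        conn.autocommit = autocommit


class AutocommitHook(DemoHook):
    """Hypothetical hook for a backend that does support autocommit."""
    supports_autocommit = True


plain_conn = SimpleNamespace(autocommit=False)
DemoHook().set_autocommit(plain_conn, True)      # logs a warning, no change

auto_conn = SimpleNamespace(autocommit=False)
AutocommitHook().set_autocommit(auto_conn, True)  # sets the flag
```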
[jira] [Created] (AIRFLOW-2363) S3 remote logging appending tuple instead of str
Kyle Hamlin created AIRFLOW-2363:
---------------------------------

Summary: S3 remote logging appending tuple instead of str
Key: AIRFLOW-2363
URL: https://issues.apache.org/jira/browse/AIRFLOW-2363
Project: Apache Airflow
Issue Type: Bug
Components: logging
Reporter: Kyle Hamlin
Fix For: 1.10.0

A recent merge into master that added support for Elasticsearch logging seems
to have broken S3 logging by returning a tuple instead of a string.

[https://github.com/apache/incubator-airflow/commit/ec38ba9594395de04ec932481212a86fbe9ae107#diff-0442332ecbe42ebbf426911c68d8cd4aR128]

The following errors are thrown:

*Session NoneType error*
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/airflow/utils/log/s3_task_handler.py", line 171, in s3_write
    encrypt=configuration.conf.getboolean('core', 'ENCRYPT_S3_LOGS'),
  File "/usr/local/lib/python3.6/site-packages/airflow/hooks/S3_hook.py", line 274, in load_string
    encrypt=encrypt)
  File "/usr/local/lib/python3.6/site-packages/airflow/hooks/S3_hook.py", line 313, in load_bytes
    client = self.get_conn()
  File "/usr/local/lib/python3.6/site-packages/airflow/hooks/S3_hook.py", line 34, in get_conn
    return self.get_client_type('s3')
  File "/usr/local/lib/python3.6/site-packages/airflow/contrib/hooks/aws_hook.py", line 151, in get_client_type
    session, endpoint_url = self._get_credentials(region_name)
  File "/usr/local/lib/python3.6/site-packages/airflow/contrib/hooks/aws_hook.py", line 97, in _get_credentials
    connection_object = self.get_connection(self.aws_conn_id)
  File "/usr/local/lib/python3.6/site-packages/airflow/hooks/base_hook.py", line 82, in get_connection
    conn = random.choice(cls.get_connections(conn_id))
  File "/usr/local/lib/python3.6/site-packages/airflow/hooks/base_hook.py", line 77, in get_connections
    conns = cls._get_connections_from_db(conn_id)
  File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 72, in wrapper
    with create_session() as session:
  File "/usr/local/lib/python3.6/contextlib.py", line 81, in __enter__
    return next(self.gen)
  File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 41, in create_session
    session = settings.Session()
TypeError: 'NoneType' object is not callable

*TypeError must be str not tuple*
[2018-04-16 18:37:28,200] ERROR in app: Exception on /admin/airflow/get_logs_with_metadata [GET]
Traceback (most recent call last):
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1982, in wsgi_app
    response = self.full_dispatch_request()
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1614, in full_dispatch_request
    rv = self.handle_user_exception(e)
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1517, in handle_user_exception
    reraise(exc_type, exc_value, tb)
  File "/usr/local/lib/python3.6/site-packages/flask/_compat.py", line 33, in reraise
    raise value
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1612, in full_dispatch_request
    rv = self.dispatch_request()
  File "/usr/local/lib/python3.6/site-packages/flask/app.py", line 1598, in dispatch_request
    return self.view_functions[rule.endpoint](**req.view_args)
  File "/usr/local/lib/python3.6/site-packages/flask_admin/base.py", line 69, in inner
    return self._run_view(f, *args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/flask_admin/base.py", line 368, in _run_view
    return fn(self, *args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/flask_login.py", line 755, in decorated_view
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/airflow/www/utils.py", line 269, in wrapper
    return f(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/airflow/utils/db.py", line 74, in wrapper
    return func(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/airflow/www/views.py", line 770, in get_logs_with_metadata
    logs, metadatas = handler.read(ti, try_number, metadata=metadata)
  File "/usr/local/lib/python3.6/site-packages/airflow/utils/log/file_task_handler.py", line 165, in read
    logs[i] += log
TypeError: must be str, not tuple

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
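The second traceback in AIRFLOW-2363 comes down to concatenating a `(log, metadata)` tuple onto a string. The sketch below reproduces that situation outside Airflow and shows one defensive way to normalise the value before appending. It is not the fix from the pull request mentioned in the thread; `normalize_log` and `read_all` are hypothetical names, and the `end_of_log` metadata key is used only as an example.

```python
def normalize_log(result):
    """Accept either a str or a (str, metadata) tuple.

    Returns a (text, metadata) pair so callers can always append text.
    """
    if isinstance(result, tuple):
        text, metadata = result
        return text, metadata
    return result, {}


def read_all(chunks):
    """Concatenate log chunks that may arrive in either shape."""
    logs = ''
    for chunk in chunks:
        text, _metadata = normalize_log(chunk)
        logs += text  # always a str here, so += cannot raise TypeError
    return logs


mixed = ['first line\n', ('second line\n', {'end_of_log': True})]
print(read_all(mixed))
```

Without the normalisation step, `'' + ('second line\n', {...})` raises a `TypeError`, the same class of error reported at the end of the second traceback.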
[jira] [Created] (AIRFLOW-2362) DockerOperator imports Client should import APIClient
Kyle Hamlin created AIRFLOW-2362:
---------------------------------

Summary: DockerOperator imports Client should import APIClient
Key: AIRFLOW-2362
URL: https://issues.apache.org/jira/browse/AIRFLOW-2362
Project: Apache Airflow
Issue Type: Bug
Components: operators
Reporter: Kyle Hamlin
Fix For: 1.10.0

The docker-py package changed its API over a year ago to rename the Client to APIClient:
https://github.com/docker/docker-py/commit/f5ac10c469fca252e69ae780749f4ec6fe369789

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
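One way to tolerate the rename described in AIRFLOW-2362 is to look the class up by its new name and fall back to the old one. The sketch below is illustrative only: `resolve_client_class` is a hypothetical helper, and it is exercised against stand-in namespace objects rather than the real docker package so that it runs without docker-py installed.

```python
from types import SimpleNamespace


def resolve_client_class(docker_module):
    """Return APIClient if the module has it, else the legacy Client."""
    client_cls = getattr(docker_module, 'APIClient', None)
    if client_cls is None:
        # Older releases exposed the low-level client as Client.
        client_cls = getattr(docker_module, 'Client')
    return client_cls


class NewStyleClient(object):
    """Stand-in for the renamed class."""


class OldStyleClient(object):
    """Stand-in for the legacy class."""


new_docker = SimpleNamespace(APIClient=NewStyleClient)
old_docker = SimpleNamespace(Client=OldStyleClient)

print(resolve_client_class(new_docker).__name__)
print(resolve_client_class(old_docker).__name__)
```

In real code the same idea is usually written as a `try`/`except ImportError` around the import statement; the attribute lookup form is used here only so the sketch has no external dependency.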
[jira] [Commented] (AIRFLOW-2358) make kubernetes examples installed optionally
[ https://issues.apache.org/jira/browse/AIRFLOW-2358?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448241#comment-16448241 ]

Taylor Edmiston commented on AIRFLOW-2358:
------------------------------------------

Would a good solution here be for the k8s DAGs to simply not load, perhaps
with an info log saying that this is due to the kubernetes install option not
being set?

> make kubernetes examples installed optionally
> ---------------------------------------------
>
> Key: AIRFLOW-2358
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2358
> Project: Apache Airflow
> Issue Type: Improvement
> Components: examples
> Affects Versions: Airflow 2.0, 1.10.0
> Reporter: Ruslan Dautkhanov
> Priority: Major
> Labels: kubernetes
>
> Is it possible to make kubernetes examples installed optionally?
>
> We don't use Kubernetes and a bare Airflow install fills logs with the following:
>
> {quote}2018-04-22 19:49:04,718 ERROR - Failed to import:
> /opt/airflow/airflow-20180420/src/apache-airflow/airflow/example_dags/example_kubernetes_operator.py
> Traceback (most recent call last):
>   File "/opt/airflow/airflow-20180420/src/apache-airflow/airflow/models.py", line 300, in process_file
>     m = imp.load_source(mod_name, filepath)
>   File "/opt/airflow/airflow-20180420/src/apache-airflow/airflow/example_dags/example_kubernetes_operator.py", line 19, in
>     from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator
>   File "/opt/airflow/airflow-20180420/src/apache-airflow/airflow/contrib/operators/kubernetes_pod_operator.py", line 21, in
>     from airflow.contrib.kubernetes import kube_client, pod_generator, pod_launcher
>   File "/opt/airflow/airflow-20180420/src/apache-airflow/airflow/contrib/kubernetes/pod_launcher.py", line 25, in
>     from kubernetes import watch
> {color:#f6c342}ImportError: No module named kubernetes{color}{quote}
>
> It would be great to make examples with external dependencies load only when
> the required modules are installed.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
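The idea raised in AIRFLOW-2358, loading an example only when its external dependency is present, can be sketched with the standard library as below. This assumes Python 3 (`importlib.util`), whereas the traceback in the report looks like Python 2; `module_available` and `loadable_examples` are hypothetical helpers and the example-to-module mapping is made up for illustration.

```python
import importlib.util
import logging

log = logging.getLogger(__name__)


def module_available(name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


def loadable_examples(requirements):
    """Filter example names by whether their required module is installed.

    `requirements` maps an example name to the top-level module it needs,
    or to None when it has no external dependency.
    """
    selected = []
    for example, module in sorted(requirements.items()):
        if module is None or module_available(module):
            selected.append(example)
        else:
            # Info-level message instead of an ERROR traceback.
            log.info("Skipping %s: module %r is not installed", example, module)
    return selected


demo_requirements = {
    'example_stdlib_only': None,
    'example_needs_json': 'json',
    'example_needs_missing': 'no_such_module_for_this_sketch',
}
print(loadable_examples(demo_requirements))
```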
[jira] [Resolved] (AIRFLOW-2351) timezone fix for @once DAGs
[ https://issues.apache.org/jira/browse/AIRFLOW-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fokko Driesprong resolved AIRFLOW-2351.
---------------------------------------
Resolution: Fixed
Fix Version/s: 2.0.0

Issue resolved by pull request #3256
[https://github.com/apache/incubator-airflow/pull/3256]

> timezone fix for @once DAGs
> ---------------------------
>
> Key: AIRFLOW-2351
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2351
> Project: Apache Airflow
> Issue Type: Bug
> Components: core, scheduler
> Affects Versions: Airflow 2.0, 1.10.0, 1.9.1, 2.0.0
> Reporter: Ruslan Dautkhanov
> Assignee: Ruslan Dautkhanov
> Priority: Major
> Fix For: 2.0.0
>
> As discussed on the dev list:
>
> {quote}
> Upgraded Airflow .. getting following error [1] when processing a DAG
> We have 'start_date': None set in default_args.. but this used to work in
> previous airflow versions.
> This is an '@once' DAG.. so we don't need a start_date (no back fill).
> {quote}
>
> {quote}[2018-01-16 16:05:25,283] \{models.py:293} ERROR - Failed to import:
> /home/rdautkha/airflow/dags/discover/discover-ora-load-2.py
> Traceback (most recent call last):
>   File "/opt/airflow/airflow-20180116/src/apache-airflow/airflow/models.py", line 290, in process_file
>     m = imp.load_source(mod_name, filepath)
>   File "/home/rdautkha/airflow/dags/discover/discover-ora-load-2.py", line 66, in
>     orientation = 'TB', # default graph view
>   File "/opt/airflow/airflow-20180116/src/apache-airflow/airflow/models.py", line 2951, in __init__
>     self.timezone = self.default_args['start_date'].tzinfo
> AttributeError: 'NoneType' object has no attribute 'tzinfo'{quote}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2351) timezone fix for @once DAGs
[ https://issues.apache.org/jira/browse/AIRFLOW-2351?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448235#comment-16448235 ]

ASF subversion and git services commented on AIRFLOW-2351:
----------------------------------------------------------

Commit 1d3bb5470711a935b36b6a0ab4c7ec414d460d75 in incubator-airflow's branch refs/heads/master from [~bolke]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=1d3bb54 ]

[AIRFLOW-2351] Check for valid default_args start_date

A bug existed when default_args did contain start_date but it was set to
None, failing to instantiate the DAG.

Closes #3256 from bolkedebruin/AIRFLOW-2351

> timezone fix for @once DAGs
> ---------------------------
>
> Key: AIRFLOW-2351
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2351
> Project: Apache Airflow
> Issue Type: Bug
> Components: core, scheduler
> Affects Versions: Airflow 2.0, 1.10.0, 1.9.1, 2.0.0
> Reporter: Ruslan Dautkhanov
> Assignee: Ruslan Dautkhanov
> Priority: Major
> Fix For: 2.0.0
>
> As discussed on the dev list:
>
> {quote}
> Upgraded Airflow .. getting following error [1] when processing a DAG
> We have 'start_date': None set in default_args.. but this used to work in
> previous airflow versions.
> This is an '@once' DAG.. so we don't need a start_date (no back fill).
> {quote}
>
> {quote}[2018-01-16 16:05:25,283] \{models.py:293} ERROR - Failed to import:
> /home/rdautkha/airflow/dags/discover/discover-ora-load-2.py
> Traceback (most recent call last):
>   File "/opt/airflow/airflow-20180116/src/apache-airflow/airflow/models.py", line 290, in process_file
>     m = imp.load_source(mod_name, filepath)
>   File "/home/rdautkha/airflow/dags/discover/discover-ora-load-2.py", line 66, in
>     orientation = 'TB', # default graph view
>   File "/opt/airflow/airflow-20180116/src/apache-airflow/airflow/models.py", line 2951, in __init__
>     self.timezone = self.default_args['start_date'].tzinfo
> AttributeError: 'NoneType' object has no attribute 'tzinfo'{quote}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
incubator-airflow git commit: [AIRFLOW-2351] Check for valid default_args start_date
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 6da88bb42 -> 1d3bb5470

[AIRFLOW-2351] Check for valid default_args start_date

A bug existed when default_args did contain start_date
but it was set to None, failing to instantiate the DAG.

Closes #3256 from bolkedebruin/AIRFLOW-2351

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/1d3bb547
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/1d3bb547
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/1d3bb547

Branch: refs/heads/master
Commit: 1d3bb5470711a935b36b6a0ab4c7ec414d460d75
Parents: 6da88bb
Author: Bolke de Bruin
Authored: Mon Apr 23 16:37:55 2018 +0200
Committer: Fokko Driesprong
Committed: Mon Apr 23 16:37:55 2018 +0200

--
 airflow/models.py |  6 +++---
 tests/models.py   | 12 ++--
 2 files changed, 13 insertions(+), 5 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/1d3bb547/airflow/models.py
--
diff --git a/airflow/models.py b/airflow/models.py
index 18e9e26..c8a737c 100755
--- a/airflow/models.py
+++ b/airflow/models.py
@@ -7,9 +7,9 @@
 # to you under the Apache License, Version 2.0 (the
 # "License"); you may not use this file except in compliance
 # with the License.  You may obtain a copy of the License at
-#
+#
 #   http://www.apache.org/licenses/LICENSE-2.0
-#
+#
 # Unless required by applicable law or agreed to in writing,
 # software distributed under the License is distributed on an
 # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
@@ -3094,7 +3094,7 @@ class DAG(BaseDag, LoggingMixin):
         # set timezone
         if start_date and start_date.tzinfo:
             self.timezone = start_date.tzinfo
-        elif 'start_date' in self.default_args:
+        elif 'start_date' in self.default_args and self.default_args['start_date']:
             if isinstance(self.default_args['start_date'], six.string_types):
                 self.default_args['start_date'] = (
                     timezone.parse(self.default_args['start_date'])

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/1d3bb547/tests/models.py
--
diff --git a/tests/models.py b/tests/models.py
index 2dbc143..c2e54e5 100644
--- a/tests/models.py
+++ b/tests/models.py
@@ -7,9 +7,9 @@
 # to you under the Apache License, Version 2.0 (the
 # "License"); you may not use this file except in compliance
 # with the License.  You may obtain a copy of the License at
-#
+#
 #   http://www.apache.org/licenses/LICENSE-2.0
-#
+#
 # Unless required by applicable law or agreed to in writing,
 # software distributed under the License is distributed on an
 # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
@@ -212,6 +212,14 @@ class DagTest(unittest.TestCase):
         self.assertEquals(tuple(), dag.topological_sort())
 
+    def test_dag_none_default_args_start_date(self):
+        """
+        Tests if a start_date of None in default_args
+        works.
+        """
+        dag = DAG('DAG', default_args={'start_date': None})
+        self.assertEqual(dag.timezone, settings.TIMEZONE)
+
     def test_dag_task_priority_weight_total(self):
         width = 5
         depth = 5
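The guard added in this commit can be illustrated outside Airflow. The following is a minimal sketch, not the code from models.py: resolve_timezone and DEFAULT_TIMEZONE are hypothetical names standing in for the DAG constructor logic and settings.TIMEZONE, and datetime.fromisoformat stands in for Airflow's timezone.parse.

```python
from datetime import datetime, timezone

# Hypothetical stand-in for airflow.settings.TIMEZONE.
DEFAULT_TIMEZONE = timezone.utc


def resolve_timezone(start_date=None, default_args=None):
    """Pick a timezone the way the patched DAG constructor roughly does."""
    default_args = default_args or {}
    if start_date and start_date.tzinfo:
        return start_date.tzinfo
    # The truthiness check covers both a missing key and an explicit None,
    # which is the case that used to raise AttributeError.
    candidate = default_args.get('start_date')
    if candidate:
        if isinstance(candidate, str):
            # Assumption: ISO 8601 strings only; Airflow uses its own parser.
            candidate = datetime.fromisoformat(candidate)
        if candidate.tzinfo:
            return candidate.tzinfo
    return DEFAULT_TIMEZONE


tz_for_none = resolve_timezone(default_args={'start_date': None})
tz_for_aware = resolve_timezone(
    default_args={'start_date': '2018-04-23T16:37:55+02:00'})
```

With 'start_date': None the sketch falls through to the default timezone instead of dereferencing None, which mirrors what the new unit test asserts.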
[jira] [Commented] (AIRFLOW-1433) Convert Airflow to Use FAB Framework
[ https://issues.apache.org/jira/browse/AIRFLOW-1433?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16448062#comment-16448062 ] ASF subversion and git services commented on AIRFLOW-1433: -- Commit 6da88bb4206fbb52b2bcb592a1826faf8860738d in incubator-airflow's branch refs/heads/master from [~wrp] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=6da88bb ] [AIRFLOW-1433] Set default rbac to initdb 05e1861e24de42f9a2c649cd93041c5c744504e1 breaks the api for any program directly importing airflow.utils. This sets a reasonable default. Closes #3240 from wrp/initdb > Convert Airflow to Use FAB Framework > > > Key: AIRFLOW-1433 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1433 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Joy Gao >Assignee: Joy Gao >Priority: Major > > The authentication capabilities in the RBAC design proposal introduces a > significant amount of work that is otherwise already built-in in existing > frameworks. > Per community discussion, Flask-AppBuilder (FAB) is the best fit for Airflow > as a foundation to implementing RBAC. This will support integration with > different authentication backends out-of-the-box, and generate permissions > for views and ORM models that will simplify view-level and dag-level access > control. > This implies modifying the current flask views, and deprecating the current > Flask-Admin in favor of FAB's crud. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
incubator-airflow git commit: [AIRFLOW-1433] Set default rbac to initdb
Repository: incubator-airflow
Updated Branches:
  refs/heads/master a704b541f -> 6da88bb42

[AIRFLOW-1433] Set default rbac to initdb

05e1861e24de42f9a2c649cd93041c5c744504e1 breaks the
api for any program directly importing
airflow.utils. This sets a reasonable default.

Closes #3240 from wrp/initdb

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/6da88bb4
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/6da88bb4
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/6da88bb4

Branch: refs/heads/master
Commit: 6da88bb4206fbb52b2bcb592a1826faf8860738d
Parents: a704b54
Author: William Pursell
Authored: Mon Apr 23 14:35:43 2018 +0200
Committer: Bolke de Bruin
Committed: Mon Apr 23 14:35:43 2018 +0200

--
 airflow/utils/db.py | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/6da88bb4/airflow/utils/db.py
--
diff --git a/airflow/utils/db.py b/airflow/utils/db.py
index c2c3d17..3ec6069 100644
--- a/airflow/utils/db.py
+++ b/airflow/utils/db.py
@@ -85,7 +85,7 @@ def merge_conn(conn, session=None):
         session.commit()
 
 
-def initdb(rbac):
+def initdb(rbac=False):
     session = settings.Session()
 
     from airflow import models
[jira] [Created] (AIRFLOW-2361) Allow GoogleCloudStorageToGoogleCloudStorageOperator to store list of copied files to XCom
Berislav Lopac created AIRFLOW-2361:
---

             Summary: Allow GoogleCloudStorageToGoogleCloudStorageOperator to store list of copied files to XCom
                 Key: AIRFLOW-2361
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2361
             Project: Apache Airflow
          Issue Type: Improvement
            Reporter: Berislav Lopac
            Assignee: Berislav Lopac


When {{GoogleCloudStorageToGoogleCloudStorageOperator}} is used with a wildcard, it can copy more than one file. It would be useful to have a mechanism that stores the list of copied files as XCom so it can be used by other tasks downstream.

Proposed solution: Add an {{xcom_push}} flag argument to the constructor; if {{True}}, the {{execute}} method returns a report on copied files. The report is a dict with the following structure:
{code:java}
{
    "source_bucket": "source-bucket-name",
    "destination_bucket": "destination-bucket-name",
    "copied_files": [
        ["original/file/path.ext", "target/prefix/original/file/path.ext"],
        ...
    ]
}{code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
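The proposed report can be sketched as a plain function. This is only an illustration of the structure described in the issue, not the operator's implementation: build_copy_report and its copied_pairs argument are hypothetical names, and the bucket and path values are the placeholders from the issue text.

```python
import json


def build_copy_report(source_bucket, destination_bucket, copied_pairs):
    """Assemble the report dict proposed in the issue.

    copied_pairs is an iterable of (source_path, destination_path) tuples,
    a hypothetical shape for whatever the copy loop collects.
    """
    return {
        "source_bucket": source_bucket,
        "destination_bucket": destination_bucket,
        # Lists rather than tuples, so the value round-trips through JSON.
        "copied_files": [[src, dst] for src, dst in copied_pairs],
    }


report = build_copy_report(
    "source-bucket-name",
    "destination-bucket-name",
    [("original/file/path.ext", "target/prefix/original/file/path.ext")],
)
serialized = json.dumps(report, sort_keys=True)
```

Returning a value built only from strings, lists, and dicts keeps the report serializable by whatever mechanism stores it for downstream tasks.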
[jira] [Commented] (AIRFLOW-843) Store task exceptions in context
[ https://issues.apache.org/jira/browse/AIRFLOW-843?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447904#comment-16447904 ] Andrew Jones commented on AIRFLOW-843: -- It would be great to see this merged and released > Store task exceptions in context > > > Key: AIRFLOW-843 > URL: https://issues.apache.org/jira/browse/AIRFLOW-843 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Scott Kruger >Priority: Minor > > If a task encounters an exception during execution, it should store the > exception on the execution context so that other methods (namely > `on_failure_callback` can access it. This would help with custom error > integrations, e.g. Sentry. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
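The behaviour requested in this issue can be sketched without Airflow. In the sketch below, run_with_failure_context is a hypothetical runner and the context key 'exception' is an assumption made for illustration; the issue itself does not name the key.

```python
def run_with_failure_context(execute, context, on_failure_callback=None):
    """Run a task callable; on failure, expose the exception to the callback."""
    try:
        return execute(context)
    except Exception as exc:
        # Assumption: the exception is stored under the key 'exception'.
        context['exception'] = exc
        if on_failure_callback is not None:
            on_failure_callback(context)
        raise


reported = []


def report_failure(context):
    # A custom integration (e.g. an error tracker) would send this somewhere.
    reported.append(str(context['exception']))


def failing_task(context):
    raise ValueError("boom")


try:
    run_with_failure_context(failing_task, {}, report_failure)
except ValueError:
    pass
```

The exception is re-raised after the callback runs, so the task still fails as it would have without the callback.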
[jira] [Resolved] (AIRFLOW-2270) Subdag backfill spins on removed tasks
[ https://issues.apache.org/jira/browse/AIRFLOW-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Bolke de Bruin resolved AIRFLOW-2270. - Resolution: Fixed Fix Version/s: 1.10.0 > Subdag backfill spins on removed tasks > -- > > Key: AIRFLOW-2270 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2270 > Project: Apache Airflow > Issue Type: Bug >Reporter: Winston Huang >Priority: Major > Fix For: 1.10.0 > > > My understanding is that subdag operators execute via a backfill job which > runs in a loop, maintaining the state of the associated tasks and breaking > only once all pending tasks have been exhausted: > [https://github.com/apache/incubator-airflow/blob/64206615a790c90893d5836da8d2f7159bda23ac/airflow/jobs.py#L2159] > > The issue is that this task instance status is initialized by this method > [https://github.com/apache/incubator-airflow/blob/64206615a790c90893d5836da8d2f7159bda23ac/airflow/jobs.py#L2075,] > which may include tasks with {{state = State.REMOVED}}, i.e. tasks that were > previously instantiated in the database but removed from the dag definition. > Hence, the task will be missing from this list > [https://github.com/apache/incubator-airflow/blob/64206615a790c90893d5836da8d2f7159bda23ac/airflow/jobs.py#L2168] > but will exist in {{ti_status.to_run}}. This causes the backfill job to loop > indefinitely, since it considers those removed tasks to be pending but > doesn't attempt to run them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2270) Subdag backfill spins on removed tasks
[ https://issues.apache.org/jira/browse/AIRFLOW-2270?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447899#comment-16447899 ] ASF subversion and git services commented on AIRFLOW-2270: -- Commit a704b541fea6343e5d0a17828c5746287a8dd316 in incubator-airflow's branch refs/heads/master from [~ji-han] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=a704b54 ] [AIRFLOW-2270] Handle removed tasks in backfill Fix issue with backfill jobs of dags, where tasks in the removed state are not run but still considered to be pending, causing an indefinite loop. Closes #3176 from ji-han/AIRFLOW- 2270_dag_backfill_removed_tasks > Subdag backfill spins on removed tasks > -- > > Key: AIRFLOW-2270 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2270 > Project: Apache Airflow > Issue Type: Bug >Reporter: Winston Huang >Priority: Major > > My understanding is that subdag operators execute via a backfill job which > runs in a loop, maintaining the state of the associated tasks and breaking > only once all pending tasks have been exhausted: > [https://github.com/apache/incubator-airflow/blob/64206615a790c90893d5836da8d2f7159bda23ac/airflow/jobs.py#L2159] > > The issue is that this task instance status is initialized by this method > [https://github.com/apache/incubator-airflow/blob/64206615a790c90893d5836da8d2f7159bda23ac/airflow/jobs.py#L2075,] > which may include tasks with {{state = State.REMOVED}}, i.e. tasks that were > previously instantiated in the database but removed from the dag definition. > Hence, the task will be missing from this list > [https://github.com/apache/incubator-airflow/blob/64206615a790c90893d5836da8d2f7159bda23ac/airflow/jobs.py#L2168] > but will exist in {{ti_status.to_run}}. This causes the backfill job to loop > indefinitely, since it considers those removed tasks to be pending but > doesn't attempt to run them. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
incubator-airflow git commit: [AIRFLOW-2270] Handle removed tasks in backfill
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 0d199e5f3 -> a704b541f

[AIRFLOW-2270] Handle removed tasks in backfill

Fix issue with backfill jobs of dags, where tasks
in the removed state are not run but still
considered to be pending, causing an indefinite
loop.

Closes #3176 from ji-han/AIRFLOW-2270_dag_backfill_removed_tasks

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/a704b541
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/a704b541
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/a704b541

Branch: refs/heads/master
Commit: a704b541fea6343e5d0a17828c5746287a8dd316
Parents: 0d199e5
Author: Winston Huang
Authored: Mon Apr 23 12:14:52 2018 +0200
Committer: Bolke de Bruin
Committed: Mon Apr 23 12:14:52 2018 +0200

--
 airflow/jobs.py |  3 ++-
 tests/jobs.py   | 44
 2 files changed, 46 insertions(+), 1 deletion(-)
--

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/a704b541/airflow/jobs.py
--
diff --git a/airflow/jobs.py b/airflow/jobs.py
index fe76fbd..ecbfef8 100644
--- a/airflow/jobs.py
+++ b/airflow/jobs.py
@@ -2121,7 +2121,8 @@ class BackfillJob(BaseJob):
                 # all tasks part of the backfill are scheduled to run
                 if ti.state == State.NONE:
                     ti.set_state(State.SCHEDULED, session=session)
-                tasks_to_run[ti.key] = ti
+                if ti.state != State.REMOVED:
+                    tasks_to_run[ti.key] = ti
 
         return tasks_to_run
 

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/a704b541/tests/jobs.py
--
diff --git a/tests/jobs.py b/tests/jobs.py
index 9eb166b..9e07645 100644
--- a/tests/jobs.py
+++ b/tests/jobs.py
@@ -714,6 +714,50 @@ class BackfillJobTest(unittest.TestCase):
         subdag.clear()
         dag.clear()
 
+
+    def test_backfill_execute_subdag_with_removed_task(self):
+        """
+        Ensure that subdag operators execute properly in the case where
+        an associated task of the subdag has been removed from the dag
+        definition, but has instances in the database from previous runs.
+        """
+        dag = self.dagbag.get_dag('example_subdag_operator')
+        subdag = dag.get_task('section-1').subdag
+
+        executor = TestExecutor(do_update=True)
+        job = BackfillJob(dag=subdag,
+                          start_date=DEFAULT_DATE,
+                          end_date=DEFAULT_DATE,
+                          executor=executor,
+                          donot_pickle=True)
+
+        removed_task_ti = TI(
+            task=DummyOperator(task_id='removed_task'),
+            execution_date=DEFAULT_DATE,
+            state=State.REMOVED)
+        removed_task_ti.dag_id = subdag.dag_id
+
+        session = settings.Session()
+        session.merge(removed_task_ti)
+
+        with timeout(seconds=30):
+            job.run()
+
+        for task in subdag.tasks:
+            instance = session.query(TI).filter(
+                TI.dag_id == subdag.dag_id,
+                TI.task_id == task.task_id,
+                TI.execution_date == DEFAULT_DATE).first()
+
+            self.assertIsNotNone(instance)
+            self.assertEqual(instance.state, State.SUCCESS)
+
+        removed_task_ti.refresh_from_db()
+        self.assertEqual(removed_task_ti.state, State.REMOVED)
+
+        subdag.clear()
+        dag.clear()
+
     def test_update_counters(self):
         dag = DAG(
             dag_id='test_manage_executor_state',
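The effect of the two-line change in airflow/jobs.py can be shown with stand-in objects. This is a simplified sketch, assuming a FakeTaskInstance class and string state constants that are hypothetical replacements for Airflow's TaskInstance and State; it only demonstrates why skipping removed instances lets the pending set drain.

```python
# Hypothetical stand-ins for airflow.utils.state.State values.
STATE_NONE = None
STATE_SCHEDULED = "scheduled"
STATE_REMOVED = "removed"


class FakeTaskInstance(object):
    """Minimal stand-in for a task instance: just a key and a state."""

    def __init__(self, key, state):
        self.key = key
        self.state = state


def collect_tasks_to_run(task_instances):
    """Mirror the patched loop: schedule fresh instances, skip removed ones."""
    tasks_to_run = {}
    for ti in task_instances:
        if ti.state == STATE_NONE:
            ti.state = STATE_SCHEDULED
        # Removed instances are never executed, so tracking them as pending
        # would keep the backfill loop from ever finishing.
        if ti.state != STATE_REMOVED:
            tasks_to_run[ti.key] = ti
    return tasks_to_run


instances = [
    FakeTaskInstance("section-1-task-1", STATE_NONE),
    FakeTaskInstance("removed_task", STATE_REMOVED),
]
pending = collect_tasks_to_run(instances)
```

Only the first instance ends up in the pending mapping; the removed one keeps its state and is left out.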
[jira] [Commented] (AIRFLOW-2344) Fix `airflow connections -l` to work with pipe and redirect
[ https://issues.apache.org/jira/browse/AIRFLOW-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447808#comment-16447808 ]

ASF subversion and git services commented on AIRFLOW-2344:
--

Commit 0d199e5f37f076e5ff0a8e20140e35a125eedcfc in incubator-airflow's branch refs/heads/master from [~sekikn]
[ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=0d199e5 ]

[AIRFLOW-2344] Fix `connections -l` to work with pipe/redirect

`airflow connections -l` uses the 'tabulate' package with
the fancy_grid format, which outputs box drawing
characters. This can raise a UnicodeEncodeError when the
output is piped or redirected, since the default encoding
for Python 2.x is ascii. This PR fixes it and
contains some flake8 related fixes.

Closes #3244 from sekikn/AIRFLOW-2344

> Fix `airflow connections -l` to work with pipe and redirect
> ---
>
>                 Key: AIRFLOW-2344
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2344
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: cli
>    Affects Versions: 1.9.0
>            Reporter: Kengo Seki
>            Assignee: Kengo Seki
>            Priority: Minor
>
> {{airflow connections -l}} fails with pipe or redirect e.g.:
> {code}
> $ airflow connections -l > foo
> Traceback (most recent call last):
>   File "/home/sekikn/.virtualenvs/a/bin/airflow", line 6, in <module>
>     exec(compile(open(__file__).read(), __file__, 'exec'))
>   File "/home/sekikn/dev/incubator-airflow/airflow/bin/airflow", line 32, in <module>
>     args.func(args)
>   File "/home/sekikn/dev/incubator-airflow/airflow/utils/cli.py", line 77, in wrapper
>     raise e
> UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-141: ordinal not in range(128)
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Closed] (AIRFLOW-2344) Fix `airflow connections -l` to work with pipe and redirect
[ https://issues.apache.org/jira/browse/AIRFLOW-2344?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Fokko Driesprong closed AIRFLOW-2344.
-
    Resolution: Fixed

> Fix `airflow connections -l` to work with pipe and redirect
> ---
>
>                 Key: AIRFLOW-2344
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2344
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: cli
>    Affects Versions: 1.9.0
>            Reporter: Kengo Seki
>            Assignee: Kengo Seki
>            Priority: Minor
>
> {{airflow connections -l}} fails with pipe or redirect e.g.:
> {code}
> $ airflow connections -l > foo
> Traceback (most recent call last):
>   File "/home/sekikn/.virtualenvs/a/bin/airflow", line 6, in <module>
>     exec(compile(open(__file__).read(), __file__, 'exec'))
>   File "/home/sekikn/dev/incubator-airflow/airflow/bin/airflow", line 32, in <module>
>     args.func(args)
>   File "/home/sekikn/dev/incubator-airflow/airflow/utils/cli.py", line 77, in wrapper
>     raise e
> UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-141: ordinal not in range(128)
> {code}

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
incubator-airflow git commit: [AIRFLOW-2344] Fix `connections -l` to work with pipe/redirect
Repository: incubator-airflow
Updated Branches:
  refs/heads/master 49826af10 -> 0d199e5f3

[AIRFLOW-2344] Fix `connections -l` to work with pipe/redirect

`airflow connections -l` uses the 'tabulate' package with
the fancy_grid format, which outputs box drawing
characters. This can raise a UnicodeEncodeError when the
output is piped or redirected, since the default encoding
for Python 2.x is ascii. This PR fixes it and
contains some flake8 related fixes.

Closes #3244 from sekikn/AIRFLOW-2344

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/0d199e5f
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/0d199e5f
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/0d199e5f

Branch: refs/heads/master
Commit: 0d199e5f37f076e5ff0a8e20140e35a125eedcfc
Parents: 49826af
Author: Kengo Seki
Authored: Mon Apr 23 11:12:52 2018 +0200
Committer: Fokko Driesprong
Committed: Mon Apr 23 11:12:52 2018 +0200

--
 airflow/bin/cli.py |  7 +--
 tests/core.py      | 17 +++--
 2 files changed, 16 insertions(+), 8 deletions(-)
--

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/0d199e5f/airflow/bin/cli.py
--
diff --git a/airflow/bin/cli.py b/airflow/bin/cli.py
index a5936a9..975f481 100644
--- a/airflow/bin/cli.py
+++ b/airflow/bin/cli.py
@@ -1025,9 +1025,12 @@ def connections(args):
                                Connection.is_extra_encrypted,
                                Connection.extra).all()
         conns = [map(reprlib.repr, conn) for conn in conns]
-        print(tabulate(conns, ['Conn Id', 'Conn Type', 'Host', 'Port',
+        msg = tabulate(conns, ['Conn Id', 'Conn Type', 'Host', 'Port',
                                'Is Encrypted', 'Is Extra Encrypted', 'Extra'],
-                       tablefmt="fancy_grid"))
+                       tablefmt="fancy_grid")
+        if sys.version_info[0] < 3:
+            msg = msg.encode('utf-8')
+        print(msg)
         return
 
     if args.delete:

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/0d199e5f/tests/core.py
--
diff --git a/tests/core.py b/tests/core.py
index 24c24fe..308804d 100644
--- a/tests/core.py
+++ b/tests/core.py
@@ -30,6 +30,7 @@ import os
 import re
 import signal
 import sqlalchemy
+import subprocess
 import tempfile
 import warnings
 from datetime import timedelta
@@ -1037,9 +1038,18 @@ class CliTests(unittest.TestCase):
         lines = [l for l in stdout.split('\n') if len(l) > 0]
         self.assertListEqual(lines, [
             ("\tThe following args are not compatible with the " +
-             "--list flag: ['conn_id', 'conn_uri', 'conn_extra', 'conn_type', 'conn_host', 'conn_login', 'conn_password', 'conn_schema', 'conn_port']"),
+             "--list flag: ['conn_id', 'conn_uri', 'conn_extra', " +
+             "'conn_type', 'conn_host', 'conn_login', " +
+             "'conn_password', 'conn_schema', 'conn_port']"),
         ])
 
+    def test_cli_connections_list_redirect(self):
+        cmd = ['airflow', 'connections', '--list']
+        with tempfile.TemporaryFile() as fp:
+            p = subprocess.Popen(cmd, stdout=fp)
+            p.wait()
+            self.assertEqual(0, p.returncode)
+
     def test_cli_connections_add_delete(self):
         # Add connections:
         uri = 'postgresql://airflow:airflow@host:5432/airflow'
@@ -1415,8 +1425,6 @@ class CliTests(unittest.TestCase):
             sleep(1)
 
     def test_cli_webserver_foreground(self):
-        import subprocess
-
         # Confirm that webserver hasn't been launched.
         # pgrep returns exit status 1 if no process matched.
         self.assertEqual(1, subprocess.Popen(["pgrep", "-c", "airflow"]).wait())
@@ -1434,8 +1442,6 @@ class CliTests(unittest.TestCase):
     @unittest.skipIf("TRAVIS" in os.environ and bool(os.environ["TRAVIS"]),
                      "Skipping test due to lack of required file permission")
     def test_cli_webserver_foreground_with_pid(self):
-        import subprocess
-
         # Run webserver in foreground with --pid option
         pidfile = tempfile.mkstemp()[1]
         p = subprocess.Popen(["airflow", "webserver", "--pid", pidfile])
@@ -1450,7 +1456,6 @@ class CliTests(unittest.TestCase):
     @unittest.skipIf("TRAVIS" in os.environ and bool(os.environ["TRAVIS"]),
                      "Skipping test due to lack of required file permission")
     def test_cli_webserver_background(self):
-        import subprocess
         import psutil
 
         # Confirm that webserver hasn't been launched.
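The encoding step in the cli.py change can be isolated as a small helper. This is a sketch of the same idea rather than the committed code: render_for_stdout is a hypothetical name, and the sample string is just two box drawing characters of the kind a fancy_grid table contains.

```python
import sys


def render_for_stdout(text):
    """Return text in a form that print() can write to a pipe or file.

    On Python 2, stdout falls back to ascii when it is not a terminal,
    so unicode text is encoded to UTF-8 bytes first. On Python 3 the
    text is returned unchanged.
    """
    if sys.version_info[0] < 3:
        return text.encode('utf-8')
    return text


# Box drawing characters, written as escapes to keep this file ascii-only.
sample = u"\u2552\u2550"
rendered = render_for_stdout(sample)
print(rendered)
```

Branching on sys.version_info keeps the Python 3 path untouched, where printing a bytes object would show its repr instead of the table.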
[jira] [Created] (AIRFLOW-2360) Create user in UI broken with latest SQL Alchemy
James Gregory created AIRFLOW-2360:
--

             Summary: Create user in UI broken with latest SQL Alchemy
                 Key: AIRFLOW-2360
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2360
             Project: Apache Airflow
          Issue Type: Bug
          Components: webserver
    Affects Versions: 1.9.0
            Reporter: James Gregory


Installing Airflow via pip and then clicking "Create user" in the web server UI displays a Python error traceback.

This is because the pip install of Airflow 1.9.0 pins flask-admin to 1.4.1, but sqlalchemy 1.1.17 and 1.1.18 are incompatible with that version of flask-admin. This is fixed in flask-admin 1.5.1; alternatively, it can be fixed by limiting the version of SQLAlchemy. I.e. in Airflow's setup.py either sqlalchemy could become:

    'sqlalchemy>=1.1.15, <=1.1.16',

or flask-admin could become:

    'flask-admin==1.5.1'

Perhaps flask-admin has been pinned to 1.4.1 to prevent new versions breaking things, in which case the change to the sqlalchemy requirement would be safer.

The flask-admin GitHub issue is: https://github.com/flask-admin/flask-admin/issues/1588

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-1819) Fix slack operator unittest bug
[ https://issues.apache.org/jira/browse/AIRFLOW-1819?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Yang resolved AIRFLOW-1819. - Resolution: Fixed > Fix slack operator unittest bug > --- > > Key: AIRFLOW-1819 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1819 > Project: Apache Airflow > Issue Type: Bug >Reporter: Kevin Yang >Assignee: Kevin Yang >Priority: Major > > slack_operator.py unittest is failing and is not covering code paths for > passing in api_params. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-1805) Allow to supply Slack token through connection
[ https://issues.apache.org/jira/browse/AIRFLOW-1805?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Yang resolved AIRFLOW-1805. - Resolution: Fixed > Allow to supply Slack token through connection > -- > > Key: AIRFLOW-1805 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1805 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Kevin Yang >Assignee: Kevin Yang >Priority: Major > > To prevent passing in Slack token directly in plain text, it is safer to pass > in the token as 'password' through connection. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-1787) Fix batch clear RUNNING task instance and inconsistent timestamp format bugs
[ https://issues.apache.org/jira/browse/AIRFLOW-1787?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Kevin Yang resolved AIRFLOW-1787. - Resolution: Fixed > Fix batch clear RUNNING task instance and inconsistent timestamp format bugs > > > Key: AIRFLOW-1787 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1787 > Project: Apache Airflow > Issue Type: Bug > Components: webserver >Reporter: Kevin Yang >Assignee: Kevin Yang >Priority: Major > Fix For: 1.10.0 > > > * Batch clear in CRUD is not working for task instances in RUNNING state, > need to be fixed > * Batch clear and set status are not working for manually triggered task > instances because manually triggered task instances have different execution > date format. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2359) Add set failed for DagRun and TaskInstance in tree view
Kevin Yang created AIRFLOW-2359:
---

             Summary: Add set failed for DagRun and TaskInstance in tree view
                 Key: AIRFLOW-2359
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2359
             Project: Apache Airflow
          Issue Type: Improvement
            Reporter: Kevin Yang
            Assignee: Kevin Yang


Users have been requesting a "set failed" action for DagRun and TaskInstance in the tree view.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-2300) Add S3 Select functionarity to S3ToHiveTransfer
[ https://issues.apache.org/jira/browse/AIRFLOW-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong resolved AIRFLOW-2300. --- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request #3243 [https://github.com/apache/incubator-airflow/pull/3243] > Add S3 Select functionarity to S3ToHiveTransfer > --- > > Key: AIRFLOW-2300 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2300 > Project: Apache Airflow > Issue Type: Improvement > Components: aws, operators >Reporter: Kengo Seki >Assignee: Kengo Seki >Priority: Major > Fix For: 2.0.0 > > > For the same reason as AIRFLOW-2299, S3ToHiveTransfer should leverage S3 > Select. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2300) Add S3 Select functionarity to S3ToHiveTransfer
[ https://issues.apache.org/jira/browse/AIRFLOW-2300?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16447644#comment-16447644 ] ASF subversion and git services commented on AIRFLOW-2300: -- Commit 49826af108d2e245ca921944296f24cc73120461 in incubator-airflow's branch refs/heads/master from [~sekikn] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=49826af ] [AIRFLOW-2300] Add S3 Select functionarity to S3ToHiveTransfer To improve efficiency and usability, this PR adds S3 Select functionarity to S3ToHiveTransfer. It also contains some minor fixes for documents and comments. Closes #3243 from sekikn/AIRFLOW-2300 > Add S3 Select functionarity to S3ToHiveTransfer > --- > > Key: AIRFLOW-2300 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2300 > Project: Apache Airflow > Issue Type: Improvement > Components: aws, operators >Reporter: Kengo Seki >Assignee: Kengo Seki >Priority: Major > Fix For: 2.0.0 > > > For the same reason as AIRFLOW-2299, S3ToHiveTransfer should leverage S3 > Select. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2358) make kubernetes examples installed optionally
Ruslan Dautkhanov created AIRFLOW-2358:
--

             Summary: make kubernetes examples installed optionally
                 Key: AIRFLOW-2358
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2358
             Project: Apache Airflow
          Issue Type: Improvement
          Components: examples
    Affects Versions: Airflow 2.0, 1.10.0
            Reporter: Ruslan Dautkhanov


Is it possible to make the kubernetes examples optional at install time?

We don't use Kubernetes, and a bare Airflow install fills the logs with the following:

{quote}2018-04-22 19:49:04,718 ERROR - Failed to import: /opt/airflow/airflow-20180420/src/apache-airflow/airflow/example_dags/example_kubernetes_operator.py
Traceback (most recent call last):
  File "/opt/airflow/airflow-20180420/src/apache-airflow/airflow/models.py", line 300, in process_file
    m = imp.load_source(mod_name, filepath)
  File "/opt/airflow/airflow-20180420/src/apache-airflow/airflow/example_dags/example_kubernetes_operator.py", line 19, in <module>
    from airflow.contrib.operators.kubernetes_pod_operator import KubernetesPodOperator
  File "/opt/airflow/airflow-20180420/src/apache-airflow/airflow/contrib/operators/kubernetes_pod_operator.py", line 21, in <module>
    from airflow.contrib.kubernetes import kube_client, pod_generator, pod_launcher
  File "/opt/airflow/airflow-20180420/src/apache-airflow/airflow/contrib/kubernetes/pod_launcher.py", line 25, in <module>
    from kubernetes import watch
ImportError: No module named kubernetes{quote}

It would be great if examples with external dependencies were loaded only when the modules they need are installed.

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
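One way to make examples depend on what is installed is to check for their required modules before loading them. The sketch below is only an illustration of that idea, not an Airflow feature: select_examples, has_module, and the mapping from example file to required modules are all hypothetical, and the check only supports top-level module names.

```python
import importlib.util


def has_module(name):
    """Return True if a top-level module can be found without importing it."""
    return importlib.util.find_spec(name) is not None


def select_examples(requirements):
    """Keep only the examples whose required modules are all installed.

    requirements maps an example file name to the top-level modules it
    needs; this mapping is an assumption made for the sketch.
    """
    return [
        example
        for example, modules in requirements.items()
        if all(has_module(module) for module in modules)
    ]


loadable = select_examples({
    "example_with_stdlib_only.py": ["json"],
    "example_needing_missing_dep.py": ["surely_not_installed_module_xyz"],
})
```

Checking with find_spec avoids importing the dependency just to discover that it is absent, so no traceback is logged for examples that are skipped.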
incubator-airflow git commit: [AIRFLOW-2300] Add S3 Select functionarity to S3ToHiveTransfer
Repository: incubator-airflow
Updated Branches:
  refs/heads/master a15b7c5b7 -> 49826af10


[AIRFLOW-2300] Add S3 Select functionarity to S3ToHiveTransfer

To improve efficiency and usability, this PR adds
S3 Select functionarity to S3ToHiveTransfer.
It also contains some minor fixes for documents and comments.

Closes #3243 from sekikn/AIRFLOW-2300


Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/49826af1
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/49826af1
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/49826af1

Branch: refs/heads/master
Commit: 49826af108d2e245ca921944296f24cc73120461
Parents: a15b7c5
Author: Kengo Seki
Authored: Mon Apr 23 08:57:23 2018 +0200
Committer: Fokko Driesprong
Committed: Mon Apr 23 08:57:23 2018 +0200

----------------------------------------------------------------------
 airflow/hooks/S3_hook.py                 |  6 ++-
 airflow/operators/s3_to_hive_operator.py | 34 ++-
 tests/operators/s3_to_hive_operator.py   | 61 ---
 3 files changed, 90 insertions(+), 11 deletions(-)
----------------------------------------------------------------------


http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/49826af1/airflow/hooks/S3_hook.py
----------------------------------------------------------------------
diff --git a/airflow/hooks/S3_hook.py b/airflow/hooks/S3_hook.py
index 7a4b8b0..edde6ea 100644
--- a/airflow/hooks/S3_hook.py
+++ b/airflow/hooks/S3_hook.py
@@ -194,9 +194,11 @@ class S3Hook(AwsHook):
         :param expression_type: S3 Select expression type
         :type expression_type: str
         :param input_serialization: S3 Select input data serialization format
-        :type input_serialization: str
+        :type input_serialization: dict
         :param output_serialization: S3 Select output data serialization format
-        :type output_serialization: str
+        :type output_serialization: dict
+        :return: retrieved subset of original data by S3 Select
+        :rtype: str
 
         .. seealso::
             For more details about S3 Select parameters:

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/49826af1/airflow/operators/s3_to_hive_operator.py
----------------------------------------------------------------------
diff --git a/airflow/operators/s3_to_hive_operator.py b/airflow/operators/s3_to_hive_operator.py
index 129dd92..e9a979d 100644
--- a/airflow/operators/s3_to_hive_operator.py
+++ b/airflow/operators/s3_to_hive_operator.py
@@ -85,6 +85,8 @@ class S3ToHiveTransfer(BaseOperator):
     :type input_compressed: bool
     :param tblproperties: TBLPROPERTIES of the hive table being created
     :type tblproperties: dict
+    :param select_expression: S3 Select expression
+    :type select_expression: str
     """
 
     template_fields = ('s3_key', 'partition', 'hive_table')
@@ -108,6 +110,7 @@ class S3ToHiveTransfer(BaseOperator):
             hive_cli_conn_id='hive_cli_default',
             input_compressed=False,
             tblproperties=None,
+            select_expression=None,
             *args, **kwargs):
         super(S3ToHiveTransfer, self).__init__(*args, **kwargs)
         self.s3_key = s3_key
@@ -124,6 +127,7 @@ class S3ToHiveTransfer(BaseOperator):
         self.aws_conn_id = aws_conn_id
         self.input_compressed = input_compressed
         self.tblproperties = tblproperties
+        self.select_expression = select_expression
 
         if (self.check_headers and
                 not (self.field_dict is not None and self.headers)):
@@ -146,16 +150,42 @@ class S3ToHiveTransfer(BaseOperator):
             raise AirflowException(
                 "The key {0} does not exists".format(self.s3_key))
         s3_key_object = self.s3.get_key(self.s3_key)
+
         root, file_ext = os.path.splitext(s3_key_object.key)
+        if (self.select_expression and self.input_compressed and
+                file_ext != '.gz'):
+            raise AirflowException("GZIP is the only compression " +
+                                   "format Amazon S3 Select supports")
+
         with TemporaryDirectory(prefix='tmps32hive_') as tmp_dir,\
                 NamedTemporaryFile(mode="wb",
                                    dir=tmp_dir,
                                    suffix=file_ext) as f:
             self.log.info("Dumping S3 key {0} contents to local file {1}"
                           .format(s3_key_object.key, f.name))
-            s3_key_object.download_fileobj(f)
+            if self.select_expression:
+                option = {}
+                if self.headers:
+                    option['FileHeaderInfo'] = 'USE'
+                if self.delimiter:
+                    option['FieldDelimiter'] = self.delimiter
+
+                input_serialization = {'CSV': option}
+