[jira] [Resolved] (AIRFLOW-2563) Pig Hook Doesn't work for Python 3
[ https://issues.apache.org/jira/browse/AIRFLOW-2563?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arthur Wiedmer resolved AIRFLOW-2563. - Resolution: Fixed Fix Version/s: 2.0.0 Fixed by PR #3594 > Pig Hook Doesn't work for Python 3 > -- > > Key: AIRFLOW-2563 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2563 > Project: Apache Airflow > Issue Type: Bug >Reporter: Murium Iqbal >Assignee: Jasper Kahn >Priority: Major > Fix For: 2.0.0 > > > Pig Hook doesn't work in Python 3 due to differences in how strings and > bytes are handled, as described in this Stack Overflow post: > https://stackoverflow.com/questions/50652034/pig-hook-in-airflow-doesnt-work-for-python3 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
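The issue text does not quote the failing code, so the following is only a generic illustration of the string/bytes difference it mentions, not the change made in PR #3594. In Python 3 a subprocess pipe yields bytes, so the read loop needs a bytes sentinel and an explicit decode. The helper name run_and_collect and the UTF-8 encoding are assumptions made for this sketch.

```python
import subprocess
import sys


def run_and_collect(cmd):
    """Run cmd and return (returncode, list of decoded output lines).

    Hypothetical helper for illustration; not part of Airflow.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    lines = []
    # The pipe yields bytes in Python 3, so the sentinel must be b"" (not "").
    for raw in iter(proc.stdout.readline, b""):
        # Decode explicitly before treating the line as text.
        lines.append(raw.decode("utf-8", errors="replace").rstrip("\r\n"))
    proc.wait()
    return proc.returncode, lines


returncode, lines = run_and_collect(
    [sys.executable, "-c", "print('hello from child')"]
)
```

With a str sentinel the loop would never see end-of-file in Python 3, and joining or logging the raw lines as text would fail or print b'...' literals.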
[jira] [Commented] (AIRFLOW-2510) Introduce new macros: prev_ds and next_ds
[ https://issues.apache.org/jira/browse/AIRFLOW-2510?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16484511#comment-16484511 ] Arthur Wiedmer commented on AIRFLOW-2510: - Have you tried using yesterday_ds and tomorrow_ds? https://github.com/apache/incubator-airflow/blob/1f0a717b65e0ea7e0127708b084baff0697f0946/airflow/models.py#L1755 > Introduce new macros: prev_ds and next_ds > - > > Key: AIRFLOW-2510 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2510 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Chao-Han Tsai >Assignee: Chao-Han Tsai >Priority: Major > > Introduce new macros {{ prev_ds }} and {{ next_ds }}. > {{ prev_ds }}: the previous execution date as {{ YYYY-MM-DD }} > {{ next_ds }}: the next execution date as {{ YYYY-MM-DD }} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
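To make the difference between the proposed macros and yesterday_ds / tomorrow_ds concrete, here is a minimal sketch. It assumes the schedule can be expressed as a fixed timedelta (cron schedules would need a different calculation), and the function name ds_macros is hypothetical rather than anything in Airflow.

```python
from datetime import datetime, timedelta


def ds_macros(execution_date, interval):
    """Return date strings in YYYY-MM-DD form for one DAG run.

    Hypothetical helper; assumes a fixed-length schedule interval.
    """
    fmt = "%Y-%m-%d"
    one_day = timedelta(days=1)
    return {
        "ds": execution_date.strftime(fmt),
        # Always one calendar day away, whatever the schedule is.
        "yesterday_ds": (execution_date - one_day).strftime(fmt),
        "tomorrow_ds": (execution_date + one_day).strftime(fmt),
        # One schedule interval away.
        "prev_ds": (execution_date - interval).strftime(fmt),
        "next_ds": (execution_date + interval).strftime(fmt),
    }


# A weekly DAG: the previous run is seven days back, not one.
macros = ds_macros(datetime(2018, 5, 22), timedelta(days=7))
```

For a daily schedule the two pairs coincide, which is presumably why the comment suggests the existing macros first.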
[jira] [Resolved] (AIRFLOW-2393) UI tree view struggles with large dags (60 tasks)x25 dag histories
[ https://issues.apache.org/jira/browse/AIRFLOW-2393?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arthur Wiedmer resolved AIRFLOW-2393. - Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request #3279 [https://github.com/apache/incubator-airflow/pull/3279] > UI tree view struggles with large dags (60 tasks)x25 dag histories > -- > > Key: AIRFLOW-2393 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2393 > Project: Apache Airflow > Issue Type: Improvement > Components: ui >Affects Versions: 1.9.0 >Reporter: Badger >Priority: Major > Fix For: 2.0.0 > > > Hi, > We are noticing that the tree view takes a long time to render as our DAG has > become more complex. We will need to start breaking our DAG apart in order to > continue to use the user interface. > The basic problem is that a reasonably complex DAG (60 operators) x the > standard 25 DAG run histories on the tree view causes a 350MB JSON response > (compressed to 8MB) to be downloaded, which the browser then needs to render. > On quick observation, this appears to be because the response > contains all metadata for each task. > Is this something others think is a problem? We occasionally have to refresh > due to memory errors and have already increased the RAM allocated to the box. > A suggestion might be to load specific instance history when a user hovers > over the task, rather than exporting all of the history on page load. I'd > look at contributing a PR but haven't had a chance to take a look at this area > of the code base. > Thanks -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-2086) The tree view page is too slow when display big dag.
[ https://issues.apache.org/jira/browse/AIRFLOW-2086?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arthur Wiedmer resolved AIRFLOW-2086. - Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request #3279 [https://github.com/apache/incubator-airflow/pull/3279] > The tree view page is too slow when display big dag. > > > Key: AIRFLOW-2086 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2086 > Project: Apache Airflow > Issue Type: Improvement > Components: webserver >Reporter: Lintao LI >Priority: Major > Fix For: 2.0.0 > > > The tree view page is too slow for a big (actually not too big) DAG. > The page size increases dramatically, to hundreds of MB. > Please refer to > [here|https://stackoverflow.com/questions/48656221/apache-airflow-webui-tree-view-is-too-slow] > for details. > I think the page contains a lot of redundant data. It's a bug or a > design flaw. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2385) Airflow task is not stopped when execution timeout gets triggered
[ https://issues.apache.org/jira/browse/AIRFLOW-2385?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16455717#comment-16455717 ]

Arthur Wiedmer commented on AIRFLOW-2385:
-----------------------------------------

Hi Yohei,

Unless I am mistaken, it looks like your operator is executing a Spark job (I seem to recognize the progress bar from the logs). execution_timeout will only raise an exception in the Python process, but it might not kill the job.

You probably want to implement the on_kill method for your operator so that it knows how to clean up your process. It has been implemented in a few operators already in the code base.

Good luck!

> Airflow task is not stopped when execution timeout gets triggered
> -----------------------------------------------------------------
>
> Key: AIRFLOW-2385
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2385
> Project: Apache Airflow
> Issue Type: Bug
> Components: DAG
> Affects Versions: 1.9.0
> Reporter: Yohei Onishi
> Priority: Major
>
> I have my own custom operator that extends BaseOperator, as follows. I tried to
> kill a task if the task runs for more than 30 minutes. According to the log, the
> timeout is triggered, but the task still continues.
> Am I missing something? I checked the official documentation but do not know what
> is wrong. [https://airflow.apache.org/code.html#baseoperator]
> My operator looks as follows.
> {code:java}
> class MyOperator(BaseOperator):
>     @apply_defaults
>     def __init__(
>             self,
>             some_parameters_here,
>             *args,
>             **kwargs):
>         super(MyOperator, self).__init__(*args, **kwargs)
>         # some initialization here
>
>     def execute(self, context):
>         # some code here
>         pass
> {code}
>
> My task looks as follows.
> {code:java}
> t = MyOperator(
>     task_id='task',
>     dag=scheduled_dag,
>     execution_timeout=timedelta(minutes=30))
> {code}
>
> I found this error, but the task continued.
> {code:java}
> [2018-04-12 03:30:28,353] {base_task_runner.py:98} INFO - Subtask: [Stage 6:==(1380 + -160) / 1224][2018-04-12 03:30:28,353] {timeout.py:36} ERROR - Process timed out
> {code}

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
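The on_kill advice in the comment above can be sketched as follows. To keep the example runnable without Airflow installed, SubprocessOperatorSketch is a hypothetical stand-in rather than a real BaseOperator subclass, and it launches a local child process instead of a Spark job. Only the method names execute and on_kill mirror the operator interface discussed in the thread.

```python
import subprocess
import sys


class SubprocessOperatorSketch(object):
    """Hypothetical stand-in for a custom operator that owns an external process."""

    def __init__(self, cmd):
        self.cmd = cmd
        self._proc = None

    def execute(self, context=None):
        # Keep a handle on the child so that on_kill can reach it later.
        self._proc = subprocess.Popen(self.cmd)
        return self._proc.wait()

    def on_kill(self):
        # Called during cleanup: stop the external work if it is still running.
        if self._proc is not None and self._proc.poll() is None:
            self._proc.terminate()
            self._proc.wait()


# Simulate a task whose external job is still running when it gets killed.
op = SubprocessOperatorSketch([sys.executable, "-c", "import time; time.sleep(60)"])
op._proc = subprocess.Popen(op.cmd)
op.on_kill()
```

For a job submitted to a remote cluster, terminating a local process would not be enough; on_kill would need to call whatever cancel mechanism that cluster provides.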
[jira] [Resolved] (AIRFLOW-2380) Add support for environment variables in Spark submit operator
[ https://issues.apache.org/jira/browse/AIRFLOW-2380?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arthur Wiedmer resolved AIRFLOW-2380. - Resolution: Fixed Fix Version/s: 1.10.0 Issue resolved by pull request #3268 [https://github.com/apache/incubator-airflow/pull/3268] > Add support for environment variables in Spark submit operator > -- > > Key: AIRFLOW-2380 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2380 > Project: Apache Airflow > Issue Type: Improvement > Components: contrib, operators >Reporter: Cristòfol Torrens >Assignee: Cristòfol Torrens >Priority: Minor > Fix For: 1.10.0 > > > Add support for environment variables in the Spark submit operator. > For example, to pass *HADOOP_CONF_DIR* when using the same Spark cluster > with multiple HDFS clusters. > The idea is to pass the variables as a dict and resolve them when using > *yarn* (_client_ or _cluster_) mode or *standalone* _client_ mode. > In *standalone* _cluster_ mode it is not possible to do this. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-74) SubdagOperators can consume all celeryd worker processes
[ https://issues.apache.org/jira/browse/AIRFLOW-74?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Arthur Wiedmer resolved AIRFLOW-74.
----------------------------------
Resolution: Fixed
Fix Version/s: 1.10.0

Issue resolved by pull request #3251
[https://github.com/apache/incubator-airflow/pull/3251]

> SubdagOperators can consume all celeryd worker processes
> --------------------------------------------------------
>
> Key: AIRFLOW-74
> URL: https://issues.apache.org/jira/browse/AIRFLOW-74
> Project: Apache Airflow
> Issue Type: Bug
> Components: celery
> Affects Versions: Airflow 1.7.1, Airflow 1.7.0, Airflow 1.6.2
> Environment: Airflow 1.7.1rc3 with CeleryExecutor
> 1 webserver
> 1 scheduler
> 2 workers
> Reporter: Steven Yvinec-Kruyk
> Assignee: zgl
> Priority: Major
> Fix For: 1.10.0
>
>
> If the number of concurrently running ```SubdagOperator``` tasks is >= the number
> of celery worker processes, tasks are unable to run. All SDOs come to a complete halt.
> Furthermore, performance of a DAG is drastically reduced even before full
> saturation of the workers, as fewer workers are gradually available for actual
> tasks. A workaround for this is to specify that ```SequentialExecutor``` be used
> by the ```SubdagOperator```
> ```
> from datetime import timedelta, datetime
> from airflow.models import DAG, Pool
> from airflow.operators import BashOperator, SubDagOperator, DummyOperator
> from airflow.executors import SequentialExecutor
> import airflow
>
> # -----------------------------------------------------------------\
> # DEFINE THE POOLS
> # -----------------------------------------------------------------/
> session = airflow.settings.Session()
> for p in ['test_pool_1', 'test_pool_2', 'test_pool_3']:
>     pool = (
>         session.query(Pool)
>         .filter(Pool.pool == p)
>         .first())
>     if not pool:
>         session.add(Pool(pool=p, slots=8))
>         session.commit()
>
> # -----------------------------------------------------------------\
> # DEFINE THE DAG
> # -----------------------------------------------------------------/
> # Define the Dag Name. This must be unique.
> dag_name = 'hanging_subdags_n16_sqe'
>
> # Default args are passed to each task
> default_args = {
>     'owner': 'Airflow',
>     'depends_on_past': False,
>     'start_date': datetime(2016, 4, 10),
>     'retries': 0,
>     'retry_interval': timedelta(minutes=5),
>     'email': ['y...@email.com'],
>     'email_on_failure': True,
>     'email_on_retry': True,
>     'wait_for_downstream': False,
> }
>
> # Create the dag object
> dag = DAG(dag_name,
>           default_args=default_args,
>           schedule_interval='0 0 * * *'
>           )
>
> # -----------------------------------------------------------------\
> # DEFINE THE TASKS
> # -----------------------------------------------------------------/
> def get_subdag(dag, sd_id, pool=None):
>     subdag = DAG(
>         dag_id='{parent_dag}.{sd_id}'.format(
>             parent_dag=dag.dag_id,
>             sd_id=sd_id),
>         params=dag.params,
>         default_args=dag.default_args,
>         template_searchpath=dag.template_searchpath,
>         user_defined_macros=dag.user_defined_macros,
>     )
>     t1 = BashOperator(
>         task_id='{sd_id}_step_1'.format(
>             sd_id=sd_id
>         ),
>         bash_command='echo "hello" && sleep 60',
>         dag=subdag,
>         pool=pool,
>         executor=SequentialExecutor
>     )
>     t2 = BashOperator(
>         task_id='{sd_id}_step_two'.format(
>             sd_id=sd_id
>         ),
>         bash_command='echo "hello" && sleep 15',
>         dag=subdag,
>         pool=pool,
>         executor=SequentialExecutor
>     )
>     t2.set_upstream(t1)
>     sdo = SubDagOperator(
>         task_id=sd_id,
>         subdag=subdag,
>         retries=0,
>         retry_delay=timedelta(seconds=5),
>         dag=dag,
>         depends_on_past=True,
>     )
>     return sdo
>
> start_task = DummyOperator(
>     task_id='start',
>     dag=dag
> )
> for n in range(1, 17):
>     sd_i = get_subdag(dag=dag, sd_id='level_1_{n}'.format(n=n),
>                       pool='test_pool_1')
>     sd_ii = get_subdag(dag=dag, sd_id='level_2_{n}'.format(n=n),
>                        pool='test_pool_2')
>     sd_iii = get_subdag(dag=dag, sd_id='level_3_{n}'.format(n=n),
>                         pool='test_pool_3')
>     sd_i.set_upstream(start_task)
>     sd_ii.set_upstream(sd_i)
>     sd_iii.set_upstream(sd_ii)
> ```

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-2365) Fix autocommit test issue with SQLite
[ https://issues.apache.org/jira/browse/AIRFLOW-2365?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arthur Wiedmer reassigned AIRFLOW-2365: --- Assignee: Arthur Wiedmer > Fix autocommit test issue with SQLite > - > > Key: AIRFLOW-2365 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2365 > Project: Apache Airflow > Issue Type: Bug >Reporter: Arthur Wiedmer >Assignee: Arthur Wiedmer >Priority: Major > > In a previous PR, I added a check for an autocommit attribute which fails for > SQLite. Correcting the tests now. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2365) Fix autocommit test issue with SQLite
Arthur Wiedmer created AIRFLOW-2365: --- Summary: Fix autocommit test issue with SQLite Key: AIRFLOW-2365 URL: https://issues.apache.org/jira/browse/AIRFLOW-2365 Project: Apache Airflow Issue Type: Bug Reporter: Arthur Wiedmer In a previous PR, I added a check for an autocommit attribute which fails for SQLite. Correcting the tests now. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-2364) The autocommit flag can be set on a connection which does not support it.
Arthur Wiedmer created AIRFLOW-2364: --- Summary: The autocommit flag can be set on a connection which does not support it. Key: AIRFLOW-2364 URL: https://issues.apache.org/jira/browse/AIRFLOW-2364 Project: Apache Airflow Issue Type: Bug Reporter: Arthur Wiedmer Assignee: Arthur Wiedmer We could just add a logging warning when the method is invoked. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
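The "logging warning" idea could look roughly like the sketch below. It is not the code from the Airflow hooks: the function name set_autocommit, the use of hasattr as the support check, and the two dummy connection classes are all assumptions made so the example runs without a database driver.

```python
import logging

log = logging.getLogger(__name__)


def set_autocommit(conn, autocommit):
    """Set conn.autocommit if the connection exposes it; otherwise warn.

    Hypothetical helper. Returns True when the flag was applied.
    """
    if not hasattr(conn, "autocommit"):
        log.warning(
            "Connection %r does not support autocommit; flag ignored.",
            type(conn).__name__,
        )
        return False
    conn.autocommit = autocommit
    return True


class PlainConnection(object):
    """Dummy connection without an autocommit attribute."""


class AutocommitConnection(object):
    """Dummy connection that exposes an autocommit attribute."""
    autocommit = False


plain = PlainConnection()
capable = AutocommitConnection()
plain_result = set_autocommit(plain, True)
capable_result = set_autocommit(capable, True)
```

Returning a boolean as well as logging lets a caller decide whether to commit manually when the flag could not be set.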
[jira] [Resolved] (AIRFLOW-2240) Add TLS/SSL to Dask Executor
[ https://issues.apache.org/jira/browse/AIRFLOW-2240?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arthur Wiedmer resolved AIRFLOW-2240. - Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request #2683 [https://github.com/apache/incubator-airflow/pull/2683] > Add TLS/SSL to Dask Executor > > > Key: AIRFLOW-2240 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2240 > Project: Apache Airflow > Issue Type: Improvement > Components: executor >Affects Versions: Airflow 1.8 >Reporter: Marius Van Niekerk >Assignee: Marius Van Niekerk >Priority: Minor > Fix For: 2.0.0 > > > As of distributed 0.17 dask distributed supports tls / ssl for communication. > > We should allow this configuration to be used with airflow. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-2335) Issue downloading oracle jdk8 is preventing travis builds from running
[ https://issues.apache.org/jira/browse/AIRFLOW-2335?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arthur Wiedmer resolved AIRFLOW-2335. - Resolution: Fixed Fix Version/s: 1.10.0 Issue resolved by pull request #3236 [https://github.com/apache/incubator-airflow/pull/3236] > Issue downloading oracle jdk8 is preventing travis builds from running > -- > > Key: AIRFLOW-2335 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2335 > Project: Apache Airflow > Issue Type: Bug >Reporter: Daniel Imberman >Assignee: Daniel Imberman >Priority: Major > Fix For: 1.10.0 > > > Currently, all Airflow builds are dying after ~1 minute due to an issue with > how Travis pulls jdk8 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-1235) Odd behaviour when all gunicorn workers die
[ https://issues.apache.org/jira/browse/AIRFLOW-1235?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arthur Wiedmer resolved AIRFLOW-1235. - Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request #2330 [https://github.com/apache/incubator-airflow/pull/2330] > Odd behaviour when all gunicorn workers die > --- > > Key: AIRFLOW-1235 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1235 > Project: Apache Airflow > Issue Type: Bug > Components: webserver >Affects Versions: 1.8.0 >Reporter: Erik Forsberg >Assignee: Kengo Seki >Priority: Major > Fix For: 2.0.0 > > > The webserver has sometimes stopped responding to port 443, and today I found > the issue - I had a misconfigured resolv.conf that made it unable to talk to > my postgresql. This was the root cause, but the way airflow webserver behaved > was a bit odd. > It seems that when all gunicorn workers failed to start, the gunicorn master > shut down. However, the main process (the one that starts gunicorn master) > did not shut down, so there was no way of detecting the failed status of > webserver from e.g. systemd or init script. 
> Full traceback leading to stale webserver process: > {noformat} > May 21 09:51:57 airmaster01 airflow[26451]: [2017-05-21 09:51:57 +] > [23794] [ERROR] Exception in worker process: > May 21 09:51:57 airmaster01 airflow[26451]: Traceback (most recent call last): > May 21 09:51:57 airmaster01 airflow[26451]: File > "/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/pool.py", > line 1122, in _do_get > May 21 09:51:57 airmaster01 airflow[26451]: return self._pool.get(wait, > self._timeout) > May 21 09:51:57 airmaster01 airflow[26451]: File > "/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/util/queue.py", > line 145, in get > May 21 09:51:57 airmaster01 airflow[26451]: raise Empty > May 21 09:51:57 airmaster01 airflow[26451]: sqlalchemy.util.queue.Empty > May 21 09:51:57 airmaster01 airflow[26451]: During handling of the above > exception, another exception occurred: > May 21 09:51:57 airmaster01 airflow[26451]: Traceback (most recent call last): > May 21 09:51:57 airmaster01 airflow[26451]: File > "/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/engine/base.py", > line 2147, in _wrap_pool_connect > May 21 09:51:57 airmaster01 airflow[26451]: return fn() > May 21 09:51:57 airmaster01 airflow[26451]: File > "/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/pool.py", > line 387, in connect > May 21 09:51:57 airmaster01 airflow[26451]: return > _ConnectionFairy._checkout(self) > May 21 09:51:57 airmaster01 airflow[26451]: File > "/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/pool.py", > line 766, in _checkout > May 21 09:51:57 airmaster01 airflow[26451]: fairy = > _ConnectionRecord.checkout(pool) > May 21 09:51:57 airmaster01 airflow[26451]: File > "/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/pool.py", > line 516, in checkout > May 21 09:51:57 airmaster01 airflow[26451]: rec = pool._do_get() > May 21 09:51:57 airmaster01 airflow[26451]: File > 
"/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/pool.py", > line 1138, in _do_get > May 21 09:51:57 airmaster01 airflow[26451]: self._dec_overflow() > May 21 09:51:57 airmaster01 airflow[26451]: File > "/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/util/langhelpers.py", > line 66, in __exit__ > May 21 09:51:57 airmaster01 airflow[26451]: compat.reraise(exc_type, > exc_value, exc_tb) > May 21 09:51:57 airmaster01 airflow[26451]: File > "/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/util/compat.py", > line 187, in reraise > May 21 09:51:57 airmaster01 airflow[26451]: raise value > May 21 09:51:57 airmaster01 airflow[26451]: File > "/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/pool.py", > line 1135, in _do_get > May 21 09:51:57 airmaster01 airflow[26451]: return self._create_connection() > May 21 09:51:57 airmaster01 airflow[26451]: File > "/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/pool.py", > line 333, in _create_connection > May 21 09:51:57 airmaster01 airflow[26451]: return _ConnectionRecord(self) > May 21 09:51:57 airmaster01 airflow[26451]: File > "/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/pool.py", > line 461, in __init__ > May 21 09:51:57 airmaster01 airflow[26451]: > self.__connect(first_connect_check=True) > May 21 09:51:57 airmaster01 airflow[26451]: File > "/opt/airflow/production/lib/python3.4/site-packages/sqlalchemy/pool.py", > line 651, in __connect > May 21 09:51:57 airmaster01 airflow[26451]: connection = > pool._invoke_creator(self) > May 21 09:51:57 airmaster01 airflow[26451]: File >
[jira] [Commented] (AIRFLOW-1165) airflow webservice crashes on ubuntu16 - python3
[ https://issues.apache.org/jira/browse/AIRFLOW-1165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16080698#comment-16080698 ]

Arthur Wiedmer commented on AIRFLOW-1165:
-----------------------------------------

A short fix until the version is upgraded can be the following

At the prompt
# Generating an RSA public/private-key pair
openssl genrsa -out private.pem 2048
# Generating a self-signed certificate
openssl req -new -x509 -key private.pem -out cacert.pem -days 1095

# In your airflow.cfg under [webserver]
web_server_ssl_cert = path/to/cacert.pem
web_server_ssl_key = path/to/private.pem

> airflow webservice crashes on ubuntu16 - python3 > - > > Key: AIRFLOW-1165 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1165 > Project: Apache Airflow > Issue Type: Bug >Reporter: Hamed >Assignee: Arthur Wiedmer > Fix For: 1.9.0 > > > I am trying to run airflow webserver on ubuntu16, python3 and ran to this > issue. Any idea? > {code} > [2017-05-02 16:36:34,789] [24096] {_internal.py:87} WARNING - * Debugger is > active! 
> [2017-05-02 16:36:34,790] [24096] {_internal.py:87} INFO - * Debugger PIN: > 294-518-137 > Exception in thread Thread-1: > Traceback (most recent call last): > File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner > self.run() > File "/usr/lib/python3.5/threading.py", line 862, in run > self._target(*self._args, **self._kwargs) > File "/usr/local/lib/python3.5/dist-packages/werkzeug/serving.py", line > 696, in inner > fd=fd) > File "/usr/local/lib/python3.5/dist-packages/werkzeug/serving.py", line > 590, in make_server > passthrough_errors, ssl_context, fd=fd) > File "/usr/local/lib/python3.5/dist-packages/werkzeug/serving.py", line > 525, in __init__ > self.socket = ssl_context.wrap_socket(sock, server_side=True) > File "/usr/local/lib/python3.5/dist-packages/werkzeug/serving.py", line > 447, in wrap_socket > ssl_version=self._protocol, **kwargs) > File "/usr/lib/python3.5/ssl.py", line 1069, in wrap_socket > ciphers=ciphers) > File "/usr/lib/python3.5/ssl.py", line 680, in __init__ > raise ValueError("certfile must be specified for server-side " > ValueError: certfile must be specified for server-side operations > {code} -- This message was sent by Atlassian JIRA (v6.4.14#64029)
[jira] [Assigned] (AIRFLOW-1165) airflow webservice crashes on ubuntu16 - python3
[ https://issues.apache.org/jira/browse/AIRFLOW-1165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arthur Wiedmer reassigned AIRFLOW-1165: --- Assignee: Arthur Wiedmer > airflow webservice crashes on ubuntu16 - python3 > - > > Key: AIRFLOW-1165 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1165 > Project: Apache Airflow > Issue Type: Bug >Reporter: Hamed >Assignee: Arthur Wiedmer > Fix For: 1.8.1 > > > I am trying to run airflow webserver on ubuntu16, python3 and ran to this > issue. Any idea? > {code} > [2017-05-02 16:36:34,789] [24096] {_internal.py:87} WARNING - * Debugger is > active! > [2017-05-02 16:36:34,790] [24096] {_internal.py:87} INFO - * Debugger PIN: > 294-518-137 > Exception in thread Thread-1: > Traceback (most recent call last): > File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner > self.run() > File "/usr/lib/python3.5/threading.py", line 862, in run > self._target(*self._args, **self._kwargs) > File "/usr/local/lib/python3.5/dist-packages/werkzeug/serving.py", line > 696, in inner > fd=fd) > File "/usr/local/lib/python3.5/dist-packages/werkzeug/serving.py", line > 590, in make_server > passthrough_errors, ssl_context, fd=fd) > File "/usr/local/lib/python3.5/dist-packages/werkzeug/serving.py", line > 525, in __init__ > self.socket = ssl_context.wrap_socket(sock, server_side=True) > File "/usr/local/lib/python3.5/dist-packages/werkzeug/serving.py", line > 447, in wrap_socket > ssl_version=self._protocol, **kwargs) > File "/usr/lib/python3.5/ssl.py", line 1069, in wrap_socket > ciphers=ciphers) > File "/usr/lib/python3.5/ssl.py", line 680, in __init__ > raise ValueError("certfile must be specified for server-side " > ValueError: certfile must be specified for server-side operations > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (AIRFLOW-1165) airflow webservice crashes on ubuntu16 - python3
[ https://issues.apache.org/jira/browse/AIRFLOW-1165?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arthur Wiedmer resolved AIRFLOW-1165. - Resolution: Fixed Fix Version/s: 1.8.1 Resolved in master and the fix is in the current RC being voted on. > airflow webservice crashes on ubuntu16 - python3 > - > > Key: AIRFLOW-1165 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1165 > Project: Apache Airflow > Issue Type: Bug >Reporter: Hamed > Fix For: 1.8.1 > > > I am trying to run airflow webserver on ubuntu16, python3 and ran to this > issue. Any idea? > {code} > [2017-05-02 16:36:34,789] [24096] {_internal.py:87} WARNING - * Debugger is > active! > [2017-05-02 16:36:34,790] [24096] {_internal.py:87} INFO - * Debugger PIN: > 294-518-137 > Exception in thread Thread-1: > Traceback (most recent call last): > File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner > self.run() > File "/usr/lib/python3.5/threading.py", line 862, in run > self._target(*self._args, **self._kwargs) > File "/usr/local/lib/python3.5/dist-packages/werkzeug/serving.py", line > 696, in inner > fd=fd) > File "/usr/local/lib/python3.5/dist-packages/werkzeug/serving.py", line > 590, in make_server > passthrough_errors, ssl_context, fd=fd) > File "/usr/local/lib/python3.5/dist-packages/werkzeug/serving.py", line > 525, in __init__ > self.socket = ssl_context.wrap_socket(sock, server_side=True) > File "/usr/local/lib/python3.5/dist-packages/werkzeug/serving.py", line > 447, in wrap_socket > ssl_version=self._protocol, **kwargs) > File "/usr/lib/python3.5/ssl.py", line 1069, in wrap_socket > ciphers=ciphers) > File "/usr/lib/python3.5/ssl.py", line 680, in __init__ > raise ValueError("certfile must be specified for server-side " > ValueError: certfile must be specified for server-side operations > {code} -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (AIRFLOW-1165) airflow webservice crashes on ubuntu16 - python3
[ https://issues.apache.org/jira/browse/AIRFLOW-1165?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15993681#comment-15993681 ] Arthur Wiedmer commented on AIRFLOW-1165: - This is a duplicate of https://issues.apache.org/jira/browse/AIRFLOW-832 It is fixed in the current master, and will be fixed in the next release. The short term fix is the commands outlined here: http://stackoverflow.com/a/40857607 > airflow webservice crashes on ubuntu16 - python3 > - > > Key: AIRFLOW-1165 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1165 > Project: Apache Airflow > Issue Type: Bug >Reporter: Hamed > > I am trying to run airflow webserver on ubuntu16, python3 and ran to this > issue. Any idea? > {code} > [2017-05-02 16:36:34,789] [24096] {_internal.py:87} WARNING - * Debugger is > active! > [2017-05-02 16:36:34,790] [24096] {_internal.py:87} INFO - * Debugger PIN: > 294-518-137 > Exception in thread Thread-1: > Traceback (most recent call last): > File "/usr/lib/python3.5/threading.py", line 914, in _bootstrap_inner > self.run() > File "/usr/lib/python3.5/threading.py", line 862, in run > self._target(*self._args, **self._kwargs) > File "/usr/local/lib/python3.5/dist-packages/werkzeug/serving.py", line > 696, in inner > fd=fd) > File "/usr/local/lib/python3.5/dist-packages/werkzeug/serving.py", line > 590, in make_server > passthrough_errors, ssl_context, fd=fd) > File "/usr/local/lib/python3.5/dist-packages/werkzeug/serving.py", line > 525, in __init__ > self.socket = ssl_context.wrap_socket(sock, server_side=True) > File "/usr/local/lib/python3.5/dist-packages/werkzeug/serving.py", line > 447, in wrap_socket > ssl_version=self._protocol, **kwargs) > File "/usr/lib/python3.5/ssl.py", line 1069, in wrap_socket > ciphers=ciphers) > File "/usr/lib/python3.5/ssl.py", line 680, in __init__ > raise ValueError("certfile must be specified for server-side " > ValueError: certfile must be specified for server-side operations > {code} -- This message was 
sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (AIRFLOW-1028) Databricks Operator for Airflow
[ https://issues.apache.org/jira/browse/AIRFLOW-1028?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arthur Wiedmer resolved AIRFLOW-1028. - Resolution: Fixed Fix Version/s: 1.9.0 Issue resolved by pull request #2202 [https://github.com/apache/incubator-airflow/pull/2202] > Databricks Operator for Airflow > --- > > Key: AIRFLOW-1028 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1028 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Andrew Chen >Assignee: Andrew Chen > Fix For: 1.9.0 > > > It would be nice to have a Databricks Operator/Hook in Airflow so users of > Databricks can more easily integrate with Airflow. > The operator would submit a spark job to our new /jobs/runs/submit endpoint. > This endpoint is similar to > https://docs.databricks.com/api/latest/jobs.html#jobscreatejob but does not > include the email_notifications, max_retries, min_retry_interval_millis, > retry_on_timeout, schedule, max_concurrent_runs fields. (The submit docs are > not out because it's still a private endpoint.) > Our proposed design for the operator then is to match this REST API endpoint. > Each argument to the parameter is named to be one of the fields of the REST > API request and the value of the argument will match the type expected by the > REST API. We will also merge extra keys from kwargs which should not be > passed to the BaseOperator into our API call in order to be flexible to > updates. > In the case that this interface is not very user friendly, we can later add > more operators which extend this operator. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (AIRFLOW-1016) Allow HTTP HEAD request method on HTTPSensor
[ https://issues.apache.org/jira/browse/AIRFLOW-1016?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arthur Wiedmer resolved AIRFLOW-1016. - Resolution: Fixed Fix Version/s: 1.9.0 Issue resolved by pull request #2175 [https://github.com/apache/incubator-airflow/pull/2175] > Allow HTTP HEAD request method on HTTPSensor > > > Key: AIRFLOW-1016 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1016 > Project: Apache Airflow > Issue Type: Improvement >Reporter: msempere >Assignee: msempere >Priority: Minor > Labels: features > Fix For: 1.9.0 > > > HTTPSensor hardcodes the HTTP request method to `GET`, but there are cases > where the `HEAD` method is needed for the sensor. > This is useful when we only need to retrieve some metadata and not the > complete body for that particular request, and that metadata is > enough for our sensor. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
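The HEAD-versus-GET distinction can be illustrated with the Python 3 standard library alone. This sketch does not use or reproduce HTTPSensor; the function names build_request and poke_url and the example URL are hypothetical, and treating any 2xx status as success is an assumption.

```python
from urllib.error import URLError
from urllib.request import Request, urlopen


def build_request(url, method="GET"):
    """Build a request with a configurable HTTP method instead of a fixed GET."""
    return Request(url, method=method)


def poke_url(url, method="HEAD", timeout=10):
    """Return True if the URL answers with a 2xx status.

    With HEAD the server sends only the status line and headers, so no
    response body is downloaded.
    """
    try:
        with urlopen(build_request(url, method), timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except URLError:
        # Covers connection failures and HTTP error statuses (HTTPError).
        return False


# Constructing the request does not touch the network.
head_request = build_request("https://example.com/data.csv", method="HEAD")
```

A sensor built this way could, for example, wait on a large file by checking only whether the server reports it as present.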
[jira] [Resolved] (AIRFLOW-947) Make PrestoHook surface better messages when the Presto Cluster is unavailable.
[ https://issues.apache.org/jira/browse/AIRFLOW-947?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arthur Wiedmer resolved AIRFLOW-947. Resolution: Fixed Fix Version/s: 1.9.0 Issue resolved by pull request #2128 [https://github.com/apache/incubator-airflow/pull/2128] > Make PrestoHook surface better messages when the Presto Cluster is > unavailable. > --- > > Key: AIRFLOW-947 > URL: https://issues.apache.org/jira/browse/AIRFLOW-947 > Project: Apache Airflow > Issue Type: Bug >Reporter: Arthur Wiedmer >Assignee: Arthur Wiedmer >Priority: Minor > Fix For: 1.9.0 > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (AIRFLOW-1067) Should not use airf...@airflow.com in examples
[ https://issues.apache.org/jira/browse/AIRFLOW-1067?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15955329#comment-15955329 ] Arthur Wiedmer commented on AIRFLOW-1067: - Duplicate of https://issues.apache.org/jira/browse/AIRFLOW-1066 We had the same idea. > Should not use airf...@airflow.com in examples > -- > > Key: AIRFLOW-1067 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1067 > Project: Apache Airflow > Issue Type: Bug >Reporter: Xiangrui Meng >Assignee: Xiangrui Meng >Priority: Minor > > airflow.com is owned by a company named Airflow (selling fans, etc). We > should use airf...@example.com in all examples. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (AIRFLOW-1066) Replace instances of airf...@airflow.com with airf...@example.com
Arthur Wiedmer created AIRFLOW-1066: --- Summary: Replace instances of airf...@airflow.com with airf...@example.com Key: AIRFLOW-1066 URL: https://issues.apache.org/jira/browse/AIRFLOW-1066 Project: Apache Airflow Issue Type: Bug Reporter: Arthur Wiedmer Assignee: Arthur Wiedmer Priority: Trivial airflow.com is a registered website to a company selling fans :) We can use example.com as a domain name. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (AIRFLOW-1038) Specify celery serializers explicitly and pin version
[ https://issues.apache.org/jira/browse/AIRFLOW-1038?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arthur Wiedmer resolved AIRFLOW-1038. - Resolution: Fixed Fix Version/s: 1.9.0 Issue resolved by pull request #2185 [https://github.com/apache/incubator-airflow/pull/2185] > Specify celery serializers explicitly and pin version > - > > Key: AIRFLOW-1038 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1038 > Project: Apache Airflow > Issue Type: Bug >Reporter: Alex Guziel >Assignee: Alex Guziel > Fix For: 1.9.0 > > > Celery 3->4 upgrade changes the default task and result serializer from > pickle to json. Pickle is faster and supports more types > http://docs.celeryproject.org/en/latest/userguide/calling.html > This also causes issues when different versions of celery are running on > different hosts. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
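As a rough illustration of AIRFLOW-1038, the serializers can be named explicitly so that they no longer depend on the Celery version's defaults. This is a sketch using the lowercase setting names introduced in Celery 4, not the contents of pull request #2185; the dictionary name is hypothetical.

```python
# Hypothetical settings dictionary; with a Celery app it could be applied
# through app.conf.update(CELERY_SERIALIZER_CONFIG).
CELERY_SERIALIZER_CONFIG = {
    "task_serializer": "pickle",    # serializer for task messages
    "result_serializer": "pickle",  # serializer for task results
    "accept_content": ["pickle"],   # content types workers will accept
}
```

Setting all three together matters: a worker that does not list a content type in `accept_content` rejects messages serialized with it, which is one way hosts running different Celery versions can end up disagreeing.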
[jira] [Resolved] (AIRFLOW-1007) Jinja sandbox is vulnerable to RCE
[ https://issues.apache.org/jira/browse/AIRFLOW-1007?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arthur Wiedmer resolved AIRFLOW-1007. - Resolution: Fixed Fix Version/s: 1.9.0 Issue resolved by pull request #2184 [https://github.com/apache/incubator-airflow/pull/2184] > Jinja sandbox is vulnerable to RCE > -- > > Key: AIRFLOW-1007 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1007 > Project: Apache Airflow > Issue Type: Bug >Reporter: Alex Guziel >Assignee: Alex Guziel > Fix For: 1.9.0 > > > Right now, the jinja template functionality in chart_data takes arbitrary > strings and executes them. We should use the sandbox functionality to prevent > this. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (AIRFLOW-999) Support for Redis database
[ https://issues.apache.org/jira/browse/AIRFLOW-999?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arthur Wiedmer resolved AIRFLOW-999. Resolution: Fixed Fix Version/s: 1.9.0 Issue resolved by pull request #2165 [https://github.com/apache/incubator-airflow/pull/2165] > Support for Redis database > -- > > Key: AIRFLOW-999 > URL: https://issues.apache.org/jira/browse/AIRFLOW-999 > Project: Apache Airflow > Issue Type: Improvement > Components: db >Reporter: msempere >Assignee: msempere >Priority: Minor > Labels: features > Fix For: 1.9.0 > > > Currently Airflow doesn't offer support for Redis DB. > The idea is to create a Hook to connect to it and offer minimal > functionality. > So the proposal is to create a sensor that monitors for the existence of a Redis key. -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (AIRFLOW-997) Change setup.cfg to point to Apache instead of Max
[ https://issues.apache.org/jira/browse/AIRFLOW-997?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arthur Wiedmer resolved AIRFLOW-997. Resolution: Fixed Fix Version/s: 1.9.0 Issue resolved by pull request #2162 [https://github.com/apache/incubator-airflow/pull/2162] > Change setup.cfg to point to Apache instead of Max > -- > > Key: AIRFLOW-997 > URL: https://issues.apache.org/jira/browse/AIRFLOW-997 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Arthur Wiedmer >Assignee: Arthur Wiedmer >Priority: Minor > Fix For: 1.9.0 > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (AIRFLOW-997) Change setup.cfg to point to Apache instead of Max
Arthur Wiedmer created AIRFLOW-997: -- Summary: Change setup.cfg to point to Apache instead of Max Key: AIRFLOW-997 URL: https://issues.apache.org/jira/browse/AIRFLOW-997 Project: Apache Airflow Issue Type: Improvement Reporter: Arthur Wiedmer Assignee: Arthur Wiedmer Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (AIRFLOW-960) Add support for .editorconfig
[ https://issues.apache.org/jira/browse/AIRFLOW-960?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arthur Wiedmer resolved AIRFLOW-960. Resolution: Fixed Fix Version/s: 1.9.0 Issue resolved by pull request #2137 [https://github.com/apache/incubator-airflow/pull/2137] > Add support for .editorconfig > - > > Key: AIRFLOW-960 > URL: https://issues.apache.org/jira/browse/AIRFLOW-960 > Project: Apache Airflow > Issue Type: Improvement >Reporter: George Leslie-Waksman >Assignee: George Leslie-Waksman >Priority: Trivial > Fix For: 1.9.0 > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Resolved] (AIRFLOW-959) .gitignore file is disorganized and incomplete
[ https://issues.apache.org/jira/browse/AIRFLOW-959?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arthur Wiedmer resolved AIRFLOW-959. Resolution: Fixed Fix Version/s: 1.9.0 Issue resolved by pull request #2136 [https://github.com/apache/incubator-airflow/pull/2136] > .gitignore file is disorganized and incomplete > -- > > Key: AIRFLOW-959 > URL: https://issues.apache.org/jira/browse/AIRFLOW-959 > Project: Apache Airflow > Issue Type: Bug >Reporter: George Leslie-Waksman >Assignee: George Leslie-Waksman >Priority: Trivial > Fix For: 1.9.0 > > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (AIRFLOW-959) .gitignore file is disorganized and incomplete
[ https://issues.apache.org/jira/browse/AIRFLOW-959?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15903695#comment-15903695 ] Arthur Wiedmer commented on AIRFLOW-959: +1 > .gitignore file is disorganized and incomplete > -- > > Key: AIRFLOW-959 > URL: https://issues.apache.org/jira/browse/AIRFLOW-959 > Project: Apache Airflow > Issue Type: Bug >Reporter: George Leslie-Waksman >Assignee: George Leslie-Waksman >Priority: Trivial > -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (AIRFLOW-947) Make PrestoHook surface better messages when the Presto Cluster is unavailable.
Arthur Wiedmer created AIRFLOW-947: -- Summary: Make PrestoHook surface better messages when the Presto Cluster is unavailable. Key: AIRFLOW-947 URL: https://issues.apache.org/jira/browse/AIRFLOW-947 Project: Apache Airflow Issue Type: Bug Reporter: Arthur Wiedmer Assignee: Arthur Wiedmer Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (AIRFLOW-846) Release schedule, latest tag is too old
[ https://issues.apache.org/jira/browse/AIRFLOW-846?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15894536#comment-15894536 ] Arthur Wiedmer commented on AIRFLOW-846: Hi [~ultrabug], We are on RC5 now, and will release to PyPI once the current blockers are cleared, and a new vote on the release is taken. All of this combined might take another couple of weeks. Best, Arthur > Release schedule, latest tag is too old > --- > > Key: AIRFLOW-846 > URL: https://issues.apache.org/jira/browse/AIRFLOW-846 > Project: Apache Airflow > Issue Type: Task >Reporter: Ultrabug >Priority: Blocker > Labels: release, tagging > > To my understanding, there is no clear point about the release schedule of > the project. > The latest tag is 1.7.1.3 from June 2016, which is not well suited for > production nowadays. > For example, the latest available release is still affected by AIRFLOW-178 > which means that we have to patch the sources on production to work with ZIP > files. > Could you please share your thoughts and position on the release planning of > the project ? > Would it be possible to get a newer tag sometime soon ? > Thank you -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Commented] (AIRFLOW-916) Fix ConfigParser deprecation warning
[ https://issues.apache.org/jira/browse/AIRFLOW-916?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15891058#comment-15891058 ] Arthur Wiedmer commented on AIRFLOW-916: This was breaking things for me on 2.7.13 on a local fresh install. Let's revert. > Fix ConfigParser deprecation warning > - > > Key: AIRFLOW-916 > URL: https://issues.apache.org/jira/browse/AIRFLOW-916 > Project: Apache Airflow > Issue Type: Improvement >Affects Versions: 1.8.0 >Reporter: Jeremiah Lowin >Assignee: Jeremiah Lowin >Priority: Trivial > Fix For: 1.9.0 > > > ConfigParser.readfp() is deprecated in favor of ConfigParser.read_file(), > according to warning messages -- This message was sent by Atlassian JIRA (v6.3.15#6346)
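For context on AIRFLOW-916: Python 2's ConfigParser only offers readfp(), while Python 3 deprecates it in favor of read_file(), so calling read_file() unconditionally would fail on 2.7. Whether that is exactly what broke the install mentioned above is an assumption. A version-tolerant sketch, with `load_config` as a hypothetical helper name:

```python
import io

try:
    import configparser  # Python 3
except ImportError:  # Python 2
    import ConfigParser as configparser


def load_config(text):
    """Parse INI text using whichever reader method this Python provides."""
    parser = configparser.ConfigParser()
    # Prefer read_file(); fall back to readfp() only where read_file is missing.
    reader = getattr(parser, "read_file", None) or parser.readfp
    reader(io.StringIO(text))
    return parser
```

Because the fallback is only evaluated when read_file() is absent, the same code also runs on Python versions where readfp() no longer exists.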
[jira] [Created] (AIRFLOW-885) Add Change.org to the list of Airflow users
Arthur Wiedmer created AIRFLOW-885: -- Summary: Add Change.org to the list of Airflow users Key: AIRFLOW-885 URL: https://issues.apache.org/jira/browse/AIRFLOW-885 Project: Apache Airflow Issue Type: Task Reporter: Arthur Wiedmer -- This message was sent by Atlassian JIRA (v6.3.15#6346)
[jira] [Created] (AIRFLOW-731) NamedHivePartitionSensor chokes on partition predicate with periods.
Arthur Wiedmer created AIRFLOW-731: -- Summary: NamedHivePartitionSensor chokes on partition predicate with periods. Key: AIRFLOW-731 URL: https://issues.apache.org/jira/browse/AIRFLOW-731 Project: Apache Airflow Issue Type: Bug Affects Versions: Airflow 1.7.1, Airflow 1.7.0 Reporter: Arthur Wiedmer Assignee: Arthur Wiedmer Priority: Trivial The partition parsing function did not limit splitting around the first period leading to issues. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
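The bug described in AIRFLOW-731 can be sketched as follows. This is an illustration of limiting the split to the first period, not the sensor's actual code; `parse_partition_name` and the sample partition name are made up for the example.

```python
def parse_partition_name(name):
    """Split 'schema.table/partition' at the first '.' and the first '/'."""
    # maxsplit=1 keeps any later periods inside the partition predicate
    schema, table_partition = name.split(".", 1)
    table, partition = table_partition.split("/", 1)
    return schema, table, partition


name = "core.events/ds=2017-01-01/version=1.2"
# An unbounded split yields three pieces, so unpacking into two names fails
unbounded_parts = name.split(".")
```

Here `unbounded_parts` has three elements because of the period in `version=1.2`, whereas `parse_partition_name(name)` keeps that period inside the partition predicate.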
[jira] [Commented] (AIRFLOW-682) Bump MAX_PERIODS
[ https://issues.apache.org/jira/browse/AIRFLOW-682?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15729673#comment-15729673 ] Arthur Wiedmer commented on AIRFLOW-682: +1. Very useful for large-ish DAGs > 1k tasks as this limit applies also for the max number of tasks when marking upstream or downstream success. > Bump MAX_PERIODS > > > Key: AIRFLOW-682 > URL: https://issues.apache.org/jira/browse/AIRFLOW-682 > Project: Apache Airflow > Issue Type: Bug >Reporter: Dan Davydov >Assignee: Dan Davydov > > It is not possible to mark success on some large DAGs due to the MAX_PERIODS > being set to 1000. We should temporarily bump it up until work can be done to > scale the mark success endpoint much higher. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (AIRFLOW-575) Improve tutorial information about default_args
[ https://issues.apache.org/jira/browse/AIRFLOW-575?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arthur Wiedmer resolved AIRFLOW-575. Resolution: Fixed > Improve tutorial information about default_args > --- > > Key: AIRFLOW-575 > URL: https://issues.apache.org/jira/browse/AIRFLOW-575 > Project: Apache Airflow > Issue Type: Improvement > Components: Documentation >Reporter: Laura Lorenz >Assignee: Laura Lorenz >Priority: Minor > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (AIRFLOW-575) Improve tutorial information about default_args
Arthur Wiedmer created AIRFLOW-575: -- Summary: Improve tutorial information about default_args Key: AIRFLOW-575 URL: https://issues.apache.org/jira/browse/AIRFLOW-575 Project: Apache Airflow Issue Type: Improvement Components: Documentation Reporter: Laura Lorenz Assignee: Laura Lorenz Priority: Minor -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-497) Release plans & info
[ https://issues.apache.org/jira/browse/AIRFLOW-497?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15478185#comment-15478185 ] Arthur Wiedmer commented on AIRFLOW-497: Hi Alexander, I think I can leave a quick update here. While the committers and various contributors have worked on several improvements, we have been blocked on navigating our first Apache release (a decent number of contributors are new to this process and it takes a little getting used to). The main issues that the next release will address are licensing issues, stripping out components that were not compatible with the Apache License as well as a few bug fixes. We hope to be able to release more often in the future once we document the release process internally and make sure we are starting with the right base to be a successful project under the Apache umbrella. A general idea of the improvement roadmap can be found on the wiki : https://cwiki.apache.org/confluence/display/AIRFLOW/Roadmap Feel free to ping the dev mailing list also if you have more questions or want to start a conversation about releases. Best, Arthur > Release plans & info > > > Key: AIRFLOW-497 > URL: https://issues.apache.org/jira/browse/AIRFLOW-497 > Project: Apache Airflow > Issue Type: Wish > Components: core, docs >Reporter: Alexander Kachkaev >Priority: Minor > Labels: build, newbie, release > > I did a couple of experiments with airflow several months ago and returned to > explore it properly this week. After a few days of quite intensive reading > and hacking it still remains unclear to me what's going on with the project > ATM. > The latest release is 1.7.1.3, which dates back to 2016-06-13 (three months > ago). The docs on pythonhosted sometimes refer to 1.8 and git blame > reveals that these mentions have been there since at least April 2016. > JIRA's dashboard has references to versions 1.8 and 2.0, but those only > contain lists with issues - no deadline etc. 
> I imagine that core developers have a clear picture about the situation and > it is probably possible to figure things out from the mailing list and > gitter. However, it would be good to see roadmap etc. in a slightly more > accessible way. > More frequent releases will help a lot as well. I'm seeing some issues when > running 1.7.1.3 via docker-airflow / celery, but it's totally unclear whether > these still exist on airflow's master branch or even something's wrong with > the docker wrapper I'm using. Opening an issue in JIRA seems somewhat stupid > in this situation. > Could anyone please increase the clarity of meta? -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-323) Should be able to prevent tasks from overlapping across multiple DAG Runs
[ https://issues.apache.org/jira/browse/AIRFLOW-323?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15371291#comment-15371291 ] Arthur Wiedmer commented on AIRFLOW-323: Hi Isaac, it sounds like there are a couple of things that could help you : 1) You can set max_active_runs for the DAG to 1 to ensure that only one dag run is active at a time. 2) You can set depends_on_past to True such that this task will not execute unless the previous one completes. 3) Finally, you can make this DAG use a pool with one slot, such that this task basically takes a lock on this particular resource. Though ideally, if several tasks are competing for the same resource, you might not want to schedule them at a cadence that will introduce contention... > Should be able to prevent tasks from overlapping across multiple DAG Runs > - > > Key: AIRFLOW-323 > URL: https://issues.apache.org/jira/browse/AIRFLOW-323 > Project: Apache Airflow > Issue Type: Bug >Affects Versions: Airflow 1.7.1.2 > Environment: 1.7.1.2 >Reporter: Isaac Steele >Assignee: Isaac Steele > > As the Airflow administrator, > If a task from a previous DAG Run is still running when the next scheduled > run triggers the same task, there should be a way to prevent the tasks from > overlapping. > Otherwise the same code could end up running multiple times simultaneously. > To reproduce: > 1) Create a DAG with a short scheduled interval > 2) Create a task in that DAG to run longer than the interval > Result: Both tasks end up running at the same time. > This can cause tasks to compete for resources as well as duplicating or > overwriting what the other task is doing. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
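The three suggestions in the comment on AIRFLOW-323 map onto DAG-level and task-level arguments. The sketch below only collects them as keyword arguments so that it runs without Airflow installed; the DAG id, task id, pool name, dates and interval are hypothetical, and the pool itself would have to be created separately with a single slot.

```python
from datetime import datetime, timedelta

# Keyword arguments one might pass to airflow.DAG(...)
dag_kwargs = {
    "dag_id": "no_overlap_example",             # hypothetical name
    "start_date": datetime(2016, 7, 1),         # hypothetical date
    "schedule_interval": timedelta(minutes=5),  # short interval, as in the report
    "max_active_runs": 1,                       # 1) only one active DAG run
}

# Keyword arguments one might pass to the long-running task's operator
task_kwargs = {
    "task_id": "long_running_task",             # hypothetical name
    "depends_on_past": True,                    # 2) wait for the previous run's task
    "pool": "single_slot_pool",                 # 3) hypothetical pool with one slot
}
```

Any one of the three is usually enough to prevent the overlap; they differ in scope, since max_active_runs limits the whole DAG while depends_on_past and the pool constrain an individual task.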
[jira] [Updated] (AIRFLOW-264) Adding support for Hive queues.
[ https://issues.apache.org/jira/browse/AIRFLOW-264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arthur Wiedmer updated AIRFLOW-264: --- Fix Version/s: Airflow 1.8 > Adding support for Hive queues. > --- > > Key: AIRFLOW-264 > URL: https://issues.apache.org/jira/browse/AIRFLOW-264 > Project: Apache Airflow > Issue Type: Improvement > Components: hive_hooks >Reporter: Arthur Wiedmer >Assignee: Arthur Wiedmer >Priority: Minor > Fix For: Airflow 1.8 > > > Hive allows for queues to be set for workload management. We have started > using them for multi-tenant management on our Hive cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (AIRFLOW-264) Adding support for Hive queues.
[ https://issues.apache.org/jira/browse/AIRFLOW-264?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arthur Wiedmer resolved AIRFLOW-264. Resolution: Fixed > Adding support for Hive queues. > --- > > Key: AIRFLOW-264 > URL: https://issues.apache.org/jira/browse/AIRFLOW-264 > Project: Apache Airflow > Issue Type: Improvement > Components: hive_hooks >Reporter: Arthur Wiedmer >Assignee: Arthur Wiedmer >Priority: Minor > > Hive allows for queues to be set for workload management. We have started > using them for multi-tenant management on our Hive cluster. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Resolved] (AIRFLOW-263) Backtick file introduced by Highcharts refactor
[ https://issues.apache.org/jira/browse/AIRFLOW-263?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arthur Wiedmer resolved AIRFLOW-263. Resolution: Fixed > Backtick file introduced by Highcharts refactor > --- > > Key: AIRFLOW-263 > URL: https://issues.apache.org/jira/browse/AIRFLOW-263 > Project: Apache Airflow > Issue Type: Bug >Reporter: Arthur Wiedmer >Assignee: Arthur Wiedmer >Priority: Minor > > A file named "`" was introduced during the Highcharts removal. See > https://github.com/apache/incubator-airflow/commit/0a460081bc7cba2d05434148f092b87d35aa8cd3 > My best assessment, is that this was a temporary file created by mistake. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-263) Backtick file introduced by Highcharts refactor
[ https://issues.apache.org/jira/browse/AIRFLOW-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340606#comment-15340606 ] Arthur Wiedmer commented on AIRFLOW-263: PR here : https://github.com/apache/incubator-airflow/pull/1613 > Backtick file introduced by Highcharts refactor > --- > > Key: AIRFLOW-263 > URL: https://issues.apache.org/jira/browse/AIRFLOW-263 > Project: Apache Airflow > Issue Type: Bug >Reporter: Arthur Wiedmer >Assignee: Arthur Wiedmer >Priority: Minor > > A file named "`" was introduced during the Highcharts removal. See > https://github.com/apache/incubator-airflow/commit/0a460081bc7cba2d05434148f092b87d35aa8cd3 > My best assessment, is that this was a temporary file created by mistake. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-263) Backtick file introduced by Highcharts refactor
[ https://issues.apache.org/jira/browse/AIRFLOW-263?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15340552#comment-15340552 ] Arthur Wiedmer commented on AIRFLOW-263: [~bolke], you are the best to assess if this file is needed, but it looks like a temp file to me. > Backtick file introduced by Highcharts refactor > --- > > Key: AIRFLOW-263 > URL: https://issues.apache.org/jira/browse/AIRFLOW-263 > Project: Apache Airflow > Issue Type: Bug >Reporter: Arthur Wiedmer >Assignee: Arthur Wiedmer >Priority: Minor > > A file named "`" was introduced during the Highcharts removal. See > https://github.com/apache/incubator-airflow/commit/0a460081bc7cba2d05434148f092b87d35aa8cd3 > My best assessment, is that this was a temporary file created by mistake. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (AIRFLOW-263) Backtick file introduced by Highcharts refactor
Arthur Wiedmer created AIRFLOW-263: -- Summary: Backtick file introduced by Highcharts refactor Key: AIRFLOW-263 URL: https://issues.apache.org/jira/browse/AIRFLOW-263 Project: Apache Airflow Issue Type: Bug Reporter: Arthur Wiedmer Assignee: Arthur Wiedmer Priority: Minor A file named "`" was introduced during the Highcharts removal. See https://github.com/apache/incubator-airflow/commit/0a460081bc7cba2d05434148f092b87d35aa8cd3 My best assessment, is that this was a temporary file created by mistake. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Commented] (AIRFLOW-184) Add clear/mark success to CLI
[ https://issues.apache.org/jira/browse/AIRFLOW-184?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15317196#comment-15317196 ] Arthur Wiedmer commented on AIRFLOW-184: Sounds good to me. Ideally, this should need to be queued indeed. Should the commands mark_success just be a wrapper around a more general set_state? Marking large swath of tasks as success is a pain in the ui, and the backfill with regex matching was useful for this. But I agree that it does not make sense anymore and should be refactored into something more useful + that does not go through the scheduler as it is a waste of slots. > Add clear/mark success to CLI > - > > Key: AIRFLOW-184 > URL: https://issues.apache.org/jira/browse/AIRFLOW-184 > Project: Apache Airflow > Issue Type: Bug > Components: cli >Reporter: Chris Riccomini >Assignee: Joy Gao > > AIRFLOW-177 pointed out that the current CLI does not allow us to clear or > mark success a task (including upstream, downstream, past, future, and > recursive) the way that the UI widget does. Given a goal of keeping parity > between the UI and CLI, it seems like we should support this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (AIRFLOW-186) conn.literal is specific to MySQLdb, and should be factored out of the dbapi_hook
Arthur Wiedmer created AIRFLOW-186: -- Summary: conn.literal is specific to MySQLdb, and should be factored out of the dbapi_hook Key: AIRFLOW-186 URL: https://issues.apache.org/jira/browse/AIRFLOW-186 Project: Apache Airflow Issue Type: Bug Reporter: Arthur Wiedmer Assignee: Arthur Wiedmer -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Assigned] (AIRFLOW-115) Migrate and Refactor AWS integration to use boto3 and better structured hooks
[ https://issues.apache.org/jira/browse/AIRFLOW-115?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Arthur Wiedmer reassigned AIRFLOW-115: -- Assignee: Arthur Wiedmer > Migrate and Refactor AWS integration to use boto3 and better structured hooks > - > > Key: AIRFLOW-115 > URL: https://issues.apache.org/jira/browse/AIRFLOW-115 > Project: Apache Airflow > Issue Type: Improvement > Components: AWS, boto3, hooks >Reporter: Arthur Wiedmer >Assignee: Arthur Wiedmer >Priority: Minor > > h2. Current State > The current AWS integration is mostly done through the S3Hook, which uses non > standard credentials parsing on top of using boto instead of boto3 which is > the current supported AWS sdk for Python. > h2. Proposal > an AWSHook should be provided that maps Airflow connections to the boto3 API. > Operators working with s3, as well as other AWS services would then inherit > from this hook but extend the functionality with service specific methods > like get_key for S3, start_cluster for EMR, enqueue for SQS, send_email for > SES etc... > * AWSHook > ** S3Hook > ** EMRHook > ** SQSHook > ** SESHook > ... > -- This message was sent by Atlassian JIRA (v6.3.4#6332)
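The hierarchy proposed in AIRFLOW-115 could look roughly like the sketch below. It is an illustration of the idea only: the class bodies, the `client_factory` parameter and the default connection id are assumptions, not the hooks that were eventually written. In practice the factory could be `boto3.client`; it is injected here so the sketch runs without boto3.

```python
class AwsHook:
    """Base hook: owns the connection id and hands out service clients."""

    def __init__(self, aws_conn_id="aws_default", client_factory=None):
        self.aws_conn_id = aws_conn_id          # hypothetical default id
        self._client_factory = client_factory   # e.g. boto3.client

    def get_client(self, service_name):
        if self._client_factory is None:
            raise RuntimeError("no client factory configured")
        return self._client_factory(service_name)


class S3Hook(AwsHook):
    """Service-specific hook: adds S3 helpers on top of the base hook."""

    def get_key(self, bucket, key):
        # get_object(Bucket=..., Key=...) is the boto3 S3 client call
        return self.get_client("s3").get_object(Bucket=bucket, Key=key)
```

Credential handling lives once in the base class, and each service hook (S3, EMR, SQS, SES) only adds its own methods.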
[jira] [Created] (AIRFLOW-110) Point people to the appropriate process to submit PRs in the repository's CONTRIBUTING.md
Arthur Wiedmer created AIRFLOW-110: -- Summary: Point people to the appropriate process to submit PRs in the repository's CONTRIBUTING.md Key: AIRFLOW-110 URL: https://issues.apache.org/jira/browse/AIRFLOW-110 Project: Apache Airflow Issue Type: Task Components: docs Reporter: Arthur Wiedmer Priority: Trivial The current process to contribute code could be made more accessible. I am assuming that the entry point to the project is GitHub and the repository. We could modify CONTRIBUTING.md as well as the README to point to the proper way to do this. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
[jira] [Created] (AIRFLOW-109) PrestoHook get_pandas_df executes a method that can raise outside of the try catch statement.
Arthur Wiedmer created AIRFLOW-109: -- Summary: PrestoHook get_pandas_df executes a method that can raise outside of the try catch statement. Key: AIRFLOW-109 URL: https://issues.apache.org/jira/browse/AIRFLOW-109 Project: Apache Airflow Issue Type: Bug Components: hooks Affects Versions: Airflow 1.8, Airflow 1.7.1, Airflow 1.6.2 Reporter: Arthur Wiedmer Assignee: Arthur Wiedmer Priority: Minor This issue occurs when a malformed SQL statement is passed to the get_pandas_df method of the Presto hook. PyHive raises a DatabaseError outside of the try/except block, resulting in the wrong kind of error being raised. -- This message was sent by Atlassian JIRA (v6.3.4#6332)
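The problem in AIRFLOW-109 comes down to where the raising call sits relative to the try block. A minimal sketch, assuming stand-in exception classes and a hypothetical `get_records` helper in place of the real PyHive and PrestoHook code:

```python
class DatabaseError(Exception):
    """Stand-in for the driver's DatabaseError."""


class PrestoException(Exception):
    """Stand-in for the hook-level error shown to the user."""


def get_records(cursor, sql):
    """Run a query, translating driver errors into a hook-level error."""
    try:
        # execute() can raise on malformed SQL, so it belongs inside the try
        cursor.execute(sql)
        return cursor.fetchall()
    except DatabaseError as exc:
        raise PrestoException("Presto query failed: {}".format(exc))
```

If `cursor.execute(sql)` were called before the `try`, the raw driver error would escape untranslated, which matches the behavior the issue describes.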
[jira] [Commented] (AIRFLOW-17) Master Travis CI build is broken
[ https://issues.apache.org/jira/browse/AIRFLOW-17?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=15262573#comment-15262573 ] Arthur Wiedmer commented on AIRFLOW-17: --- Unfortunately, I have very little knowledge of how the license check actually works. The code here seems very simplistic : https://github.com/airbnb/airflow/blob/master/scripts/ci/check-license.sh#L98 Maybe we can disable this check in the case of a revert. > Master Travis CI build is broken > > > Key: AIRFLOW-17 > URL: https://issues.apache.org/jira/browse/AIRFLOW-17 > Project: Apache Airflow > Issue Type: Improvement >Reporter: Chris Riccomini > > It looks like master is broken: > https://travis-ci.org/airbnb/airflow/branches > This build seems to be the first one that broke: > https://travis-ci.org/airbnb/airflow/builds/126014622 -- This message was sent by Atlassian JIRA (v6.3.4#6332)