[jira] [Created] (AIRFLOW-2433) Operator with TriggerRule "one_success" with multiple upstream tasks is marked as Skipped instead of UpstreamFailed if all its upstream tasks are in "UpstreamFailed" st
Vigneshwaran Raveendran created AIRFLOW-2433: Summary: Operator with TriggerRule "one_success" with multiple upstream tasks is marked as Skipped instead of UpstreamFailed if all its upstream tasks are in "UpstreamFailed" status Key: AIRFLOW-2433 URL: https://issues.apache.org/jira/browse/AIRFLOW-2433 Project: Apache Airflow Issue Type: Bug Components: operators Affects Versions: 1.9.0 Reporter: Vigneshwaran Raveendran Attachments: Airflow_1.9_incorrect_state_issue.png I have a task with trigger_rule "one_success" with two upstream tasks. When all its upstream tasks are in UpstreamFailed, the task and all its downstream tasks are marked as "Skipped" instead of expected "UpstreamFailed". Since the root tasks end up in Skipped status and not in UpstreamFailed, the DAG is marked as Success instead of the expected Failed status. Please see the attachment for reference. The "step 8" is the task with trigger rule "one_success". -- This message was sent by Atlassian JIRA (v7.6.3#76005)
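The expected semantics can be modeled without Airflow. The toy function below is illustrative only (it is not Airflow's dependency-checking code) and shows the state resolution this report argues for: with trigger_rule "one_success", a task whose upstreams are all UpstreamFailed should itself become UpstreamFailed rather than Skipped.

```python
# Illustrative model of "one_success" state resolution (NOT Airflow's code).
def resolve_one_success(upstream_states):
    """Return the state a 'one_success' task should take from its upstreams."""
    if "success" in upstream_states:
        return "runnable"
    # Expected behavior per this report: if every upstream is
    # upstream_failed, propagate upstream_failed, not skipped.
    if upstream_states and all(s == "upstream_failed" for s in upstream_states):
        return "upstream_failed"
    return "skipped"
```

With this model, `resolve_one_success(["upstream_failed", "upstream_failed"])` yields "upstream_failed", which would also propagate to downstream tasks and let the DAG run end in Failed rather than Success.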
[jira] [Created] (AIRFLOW-2432) Templated fields containing password / tokens is displaying in plain text on UI
John Cheng created AIRFLOW-2432: --- Summary: Templated fields containing password / tokens is displaying in plain text on UI Key: AIRFLOW-2432 URL: https://issues.apache.org/jira/browse/AIRFLOW-2432 Project: Apache Airflow Issue Type: Bug Reporter: John Cheng Attachments: templated field.PNG I am trying to pass a password to a bash operator with env. However, env is a templated field, so it displays my password in plain text on the UI. !templated field.PNG! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
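Until the UI masks such fields, one possible mitigation is to redact likely-sensitive keys before a rendered env dict is displayed. The helper below is hypothetical (it is not part of Airflow) and keys off naming conventions only:

```python
# Hypothetical helper (not part of Airflow): redact likely-sensitive keys
# before a rendered "env" templated field is shown in a UI.
SENSITIVE_MARKERS = ("password", "secret", "token", "key")

def redact_env(env):
    """Return a copy of env with values of sensitive-looking keys masked."""
    return {
        k: "***" if any(m in k.lower() for m in SENSITIVE_MARKERS) else v
        for k, v in env.items()
    }
```

For example, `redact_env({"DB_PASSWORD": "hunter2", "PATH": "/bin"})` masks only the password entry.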
[jira] [Commented] (AIRFLOW-2414) Fix RBAC log display
[ https://issues.apache.org/jira/browse/AIRFLOW-2414?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466714#comment-16466714 ] Joy Gao commented on AIRFLOW-2414: -- Hmm, interesting:
{code:python}
if ti is None:
    logs = ["*** Task instance did not exist in the DB\n"]
else:
    logger = logging.getLogger('airflow.task')
    task_log_reader = conf.get('core', 'task_log_reader')
    handler = next((handler for handler in logger.handlers
                    if handler.name == task_log_reader), None)
    try:
        ti.task = dag.get_task(ti.task_id)
        logs = handler.read(ti)
    except AttributeError as e:
        logs = ["Task log handler {} does not support read logs.\n{}\n".format(task_log_reader, str(e))]
    for i, log in enumerate(logs):
        if PY2 and not isinstance(log, unicode):
            logs[i] = log.decode('utf-8')
{code}
The log should be a string; I wonder whether this bug is related to subdags. Can you print out the list object and see what it contains? > Fix RBAC log display > > > Key: AIRFLOW-2414 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2414 > Project: Apache Airflow > Issue Type: Bug > Components: webserver >Affects Versions: 1.10.0 >Reporter: Oleg Yamin >Assignee: Oleg Yamin >Priority: Major > Fix For: 1.10.0 > > > Getting the following error when trying to view the log file in the new RBAC UI. 
> {code:java} > [2018-05-02 17:49:47,716] ERROR in app: Exception on /log [GET] > Traceback (most recent call last): > File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1982, in > wsgi_app > response = self.full_dispatch_request() > File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1614, in > full_dispatch_request > rv = self.handle_user_exception(e) > File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1517, in > handle_user_exception > reraise(exc_type, exc_value, tb) > File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1612, in > full_dispatch_request > rv = self.dispatch_request() > File "/usr/lib64/python2.7/site-packages/flask/app.py", line 1598, in > dispatch_request > return self.view_functions[rule.endpoint](**req.view_args) > File > "/usr/lib/python2.7/site-packages/flask_appbuilder/security/decorators.py", > line 26, in wraps > return f(self, *args, **kwargs) > File "/usr/lib/python2.7/site-packages/airflow/www_rbac/decorators.py", line > 55, in wrapper > return f(*args, **kwargs) > File "/usr/lib/python2.7/site-packages/airflow/utils/db.py", line 74, in > wrapper > return func(*args, **kwargs) > File "/usr/lib/python2.7/site-packages/airflow/www_rbac/views.py", line 456, > in log > logs = log.decode('utf-8') > AttributeError: 'list' object has no attribute 'decode'{code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
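The traceback above fails at `logs = log.decode('utf-8')` because `handler.read(ti)` returns a list of log chunks, not a single string. A minimal sketch of a defensive normalization (a hypothetical helper, not the actual patch) would decode each element rather than the list:

```python
# Sketch consistent with the traceback above: accept either a single log
# string/bytes or a list of chunks, and decode any bytes element.
def normalize_logs(logs):
    """Return a list of unicode log chunks regardless of input shape."""
    if isinstance(logs, (bytes, str)):
        logs = [logs]
    return [
        chunk.decode("utf-8") if isinstance(chunk, bytes) else chunk
        for chunk in logs
    ]
```

For example, `normalize_logs([b"line1\n", "line2\n"])` returns two plain strings, which the view can then join for display.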
[jira] [Created] (AIRFLOW-2431) Add the navigation bar color parameter for RBAC UI
Joy Gao created AIRFLOW-2431: Summary: Add the navigation bar color parameter for RBAC UI Key: AIRFLOW-2431 URL: https://issues.apache.org/jira/browse/AIRFLOW-2431 Project: Apache Airflow Issue Type: New Feature Reporter: Licht Takeuchi Assignee: Licht Takeuchi Fix For: 2.0.0 We operate multiple Airflow instances (e.g. Production, Staging), so we cannot tell at a glance which instance we are looking at. This feature lets us distinguish each instance by the color of its navigation bar. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2229) Scheduler cannot retry abrupt task failures within factory-generated DAGs
[ https://issues.apache.org/jira/browse/AIRFLOW-2229?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466645#comment-16466645 ] sunil kumar commented on AIRFLOW-2229: -- I'm encountering tasks failing with a similar error message. Could it be that the executor is unable to get the right status for the task from the broker? [2018-05-05 13:41:43,111] \{jobs.py:1425} ERROR - Executor reports task instance %s finished (%s) although the task says its %s. Was the task killed externally? [2018-05-05 13:41:46,364] \{jobs.py:1435} ERROR - Cannot load the dag bag to handle failure for Scheduler cannot retry abrupt task failures within factory-generated DAGs > - > > Key: AIRFLOW-2229 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2229 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: 1.9.0 >Reporter: James Meickle >Priority: Major > > We had an issue where one of our tasks failed without the worker updating > state (unclear why, but let's assume it was an OOM), resulting in this series > of error messages: > {{Mar 20 14:27:05 airflow-core-i-0fc1f995414837b8b.stg.int.dynoquant.com > airflow_scheduler-stdout.log: [2018-03-20 14:27:04,993] \{{models.py:1595 > ERROR - Executor reports task instance %s finished (%s) although the task > says its %s. Was the task killed externally? > {{ Mar 20 14:27:05 airflow-core-i-0fc1f995414837b8b.stg.int.dynoquant.com > airflow_scheduler-stdout.log: NoneType}} > {{ Mar 20 14:27:05 airflow-core-i-0fc1f995414837b8b.stg.int.dynoquant.com > airflow_scheduler-stdout.log: [2018-03-20 14:27:04,994] {{jobs.py:1435 ERROR > - Cannot load the dag bag to handle failure for nightly_dataload.dummy_operator 2018-03-19 00:00:00 [queued]>. Setting task > to FAILED without callbacks or retries. Do you have enough resources? > Mysterious failures are not unexpected, because we are in the cloud, after > all. The concern is the last line: ignoring callbacks and retries, implying > that it's a lack of resources. 
However, the machine was totally underutilized > at the time. > I dug into this code a bit more and as far as I can tell this error is > happening in this code path: > [https://github.com/apache/incubator-airflow/blob/1.9.0/airflow/jobs.py#L1427] > {{self.log.error(msg)}} > {{try:}} > {{ simple_dag = simple_dag_bag.get_dag(dag_id)}} > {{ dagbag = models.DagBag(simple_dag.full_filepath)}} > {{ dag = dagbag.get_dag(dag_id)}} > {{ ti.task = dag.get_task(task_id)}} > {{ ti.handle_failure(msg)}} > {{except Exception:}} > {{ self.log.error("Cannot load the dag bag to handle failure for %s"}} > {{ ". Setting task to FAILED without callbacks or "}} > {{ "retries. Do you have enough resources?", ti)}} > {{ ti.state = State.FAILED}} > {{ session.merge(ti)}} > {{ session.commit()}}{{}} > I am not very familiar with this code, nor do I have time to attach a > debugger at the moment, but I think what is happening here is: > * I have a factory Python file, which imports and instantiates DAG code from > other files. > * The scheduler loads the DAGs from the factory file on the filesystem. It > gets a fileloc (as represented in the DB) not of the factory file, but of the > file it loaded code from. > * The scheduler makes a simple DAGBag from the instantiated DAGs. > * This line of code uses the simple DAG, which references the original DAG > object's fileloc, to create a new DAGBag object. > * This DAGBag looks for the original DAG in the fileloc, which is the file > containing that DAG's _code_, but is not actually importable by Airflow. > * An exception is raised trying to load the DAG from the DAGBag, which found > nothing. > * Handling of the task failure never occurs. > * The over-broad Exception code swallows all of the above occurring. > * There's just a generic error message that is not helpful to a system > operator. > If this is the case, at minimum, the try/except should be rewritten to be > more graceful and to have a better error message. 
But I question whether this > level of DAGBag abstraction/indirection isn't making this failure case worse > than it needs to be; under normal conditions the scheduler is definitely able > to find the relevant factory-generated DAGs and execute tasks within them as > expected, even with the fileloc set to the code path and not the import path. > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
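The fileloc mismatch described above can be reproduced without Airflow. The stdlib-only sketch below (`FakeDag` is a stand-in class, not an Airflow type) shows that an object instantiated in a "factory" script still reports the file where its class was defined, which is analogous to what the scheduler stores as the DAG's fileloc:

```python
# Minimal illustration of the fileloc mismatch: an object created in a
# "factory" module reports the file where its class was *defined*.
import importlib.util
import inspect
import sys
import tempfile
import textwrap

# Write a "dag code" module that defines a class...
code = textwrap.dedent("""\
    class FakeDag:
        pass
""")
with tempfile.NamedTemporaryFile("w", suffix=".py", delete=False) as f:
    f.write(code)
    dag_code_path = f.name

# ...load it the way a factory file would import DAG code...
spec = importlib.util.spec_from_file_location("dag_code", dag_code_path)
mod = importlib.util.module_from_spec(spec)
sys.modules["dag_code"] = mod  # register so inspect can resolve the module
spec.loader.exec_module(mod)

# ...and instantiate it here, as the factory script does.
dag = mod.FakeDag()
fileloc = inspect.getfile(type(dag))
# fileloc points at dag_code_path (the code file), not this factory script,
# so a DagBag built from "the DAG's file" finds no importable DAG there.
```

This matches the report: re-importing the stored fileloc yields an empty DagBag, the lookup raises, and the over-broad except swallows the real cause.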
[jira] [Comment Edited] (AIRFLOW-2407) Undefined names in Python code
[ https://issues.apache.org/jira/browse/AIRFLOW-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466472#comment-16466472 ] cclauss edited comment on AIRFLOW-2407 at 5/7/18 9:04 PM: -- Three undefined names remain: when doing flake8 testing of [https://github.com/apache/incubator-airflow] on Python 3.6.3 The command "echo ; echo -n "flake8 testing of ${URL} on " ; python -V" exited with 0. $ __flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics__ ./airflow/www/app.py:148:17: F821 undefined name 'reload' reload(e) ^ ./tests/operators/hive_operator.py:178:27: F821 undefined name 'cursor_mock' __enter__=cursor_mock, ^ ./tests/operators/hive_operator.py:184:27: F821 undefined name 'get_conn_mock' __enter__=get_conn_mock, ^ 3 F821 undefined name 'reload' 3 was (Author: cclauss): Three undefined names remain: when doing flake8 testing of [https://github.com/apache/incubator-airflow] on Python 3.6.3 The command "echo ; echo -n "flake8 testing of ${URL} on " ; python -V" exited with 0. $ __flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics__ {{./airflow/www/app.py:148:17: F821 undefined name 'reload' reload(e) ^ ./tests/operators/hive_operator.py:178:27: F821 undefined name 'cursor_mock' __enter__=cursor_mock, ^ ./tests/operators/hive_operator.py:184:27: F821 undefined name 'get_conn_mock' __enter__=get_conn_mock, ^ 3 F821 undefined name 'reload' 3}} > Undefined names in Python code > -- > > Key: AIRFLOW-2407 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2407 > Project: Apache Airflow > Issue Type: Bug >Reporter: cclauss >Priority: Minor > Fix For: 2.0.0 > > Original Estimate: 336h > Remaining Estimate: 336h > > flake8 testing of https://github.com/apache/incubator-airflow on Python 3.6.3 > $ *flake8 . 
--count --select=E901,E999,F821,F822,F823 --show-source > --statistics* > {noformat} > ./airflow/contrib/auth/backends/kerberos_auth.py:67:13: F821 undefined name > 'logging' > logging.error('Password validation for principal %s failed %s', > user_principal, e) > ^ > ./airflow/contrib/hooks/aws_hook.py:75:13: F821 undefined name 'logging' > logging.warning("Option Error in parsing s3 config file") > ^ > ./airflow/contrib/operators/datastore_export_operator.py:105:19: F821 > undefined name 'AirflowException' > raise AirflowException('Operation failed: > result={}'.format(result)) > ^ > ./airflow/contrib/operators/datastore_import_operator.py:94:19: F821 > undefined name 'AirflowException' > raise AirflowException('Operation failed: > result={}'.format(result)) > ^ > ./airflow/contrib/sensors/qubole_sensor.py:62:9: F821 undefined name 'this' > this.log.info('Poking: %s', self.data) > ^ > ./airflow/contrib/sensors/qubole_sensor.py:68:13: F821 undefined name > 'logging' > logging.exception(e) > ^ > ./airflow/contrib/sensors/qubole_sensor.py:71:9: F821 undefined name 'this' > this.log.info('Status of this Poke: %s', status) > ^ > ./airflow/www/app.py:148:17: F821 undefined name 'reload' > reload(e) > ^ > ./tests/operators/hive_operator.py:178:27: F821 undefined name 'cursor_mock' > __enter__=cursor_mock, > ^ > ./tests/operators/hive_operator.py:184:27: F821 undefined name 'get_conn_mock' > __enter__=get_conn_mock, > ^ > ./tests/operators/test_virtualenv_operator.py:166:19: F821 undefined name > 'virtualenv_string_args' > print(virtualenv_string_args) > ^ > ./tests/operators/test_virtualenv_operator.py:167:16: F821 undefined name > 'virtualenv_string_args' > if virtualenv_string_args[0] != virtualenv_string_args[2]: >^ > ./tests/operators/test_virtualenv_operator.py:167:45: F821 undefined name > 'virtualenv_string_args' > if virtualenv_string_args[0] != virtualenv_string_args[2]: > ^ > 13F821 undefined name 'logging' > 13 > {noformat} -- This message was sent by 
Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2407) Undefined names in Python code
[ https://issues.apache.org/jira/browse/AIRFLOW-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466472#comment-16466472 ] cclauss commented on AIRFLOW-2407: -- Three undefined names remain: when doing flake8 testing of [https://github.com/apache/incubator-airflow] on Python 3.6.3 The command "echo ; echo -n "flake8 testing of ${URL} on " ; python -V" exited with 0. $ __flake8 . --count --select=E901,E999,F821,F822,F823 --show-source --statistics__ {{./airflow/www/app.py:148:17: F821 undefined name 'reload' reload(e) ^ ./tests/operators/hive_operator.py:178:27: F821 undefined name 'cursor_mock' __enter__=cursor_mock, ^ ./tests/operators/hive_operator.py:184:27: F821 undefined name 'get_conn_mock' __enter__=get_conn_mock, ^ 3 F821 undefined name 'reload' 3}} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
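For the remaining `reload` case in airflow/www/app.py, one common fix (a sketch of the usual Python 2/3 compatibility idiom, not necessarily the patch the project chose) is to import `reload` from importlib on Python 3, where it is no longer a builtin:

```python
# `reload` is a builtin only on Python 2; on Python 3 it lives in importlib.
try:
    from importlib import reload  # Python 3.4+
except ImportError:
    pass  # Python 2: `reload` is already a builtin

import string
reloaded = reload(string)  # re-executes the module and returns it
```

After this, flake8's F821 for `reload` no longer fires, since the name is explicitly bound on Python 3.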
[jira] [Created] (AIRFLOW-2430) Bad query patterns at scale prevent scheduler from starting
Gabriel Silk created AIRFLOW-2430: - Summary: Bad query patterns at scale prevent scheduler from starting Key: AIRFLOW-2430 URL: https://issues.apache.org/jira/browse/AIRFLOW-2430 Project: Apache Airflow Issue Type: Bug Components: scheduler Reporter: Gabriel Silk h2. Summary Certain queries executed by the scheduler do not scale well with the number of tasks being operated on. Two example functions: * reset_state_for_orphaned_tasks * _execute_task_instances Concretely: with a mere 75k tasks being operated on, the first query can take dozens of minutes to run, blocking the scheduler from making progress. The cause is twofold: 1. As the query grows past a certain point, the MySQL planner will choose to do a full table scan as opposed to using an index. I assume the same is true of Postgres. 2. The query predicate size grows linearly in the number of tasks being operated on, thus increasing the amount of work that needs to be done per row. In a sense, you're left with an operation that scales O(n^2). h2. Proposed Fix It appears that one of these bad query patterns was fixed in [3547cbffd|https://github.com/apache/incubator-airflow/commit/3547cbffdbffac2f98a8aa05526e8c9671221025] by introducing a configurable batch size which can be set via max_tis_per_query. I propose we extend the suggested fix to include other poorly-performing queries in the scheduler. I've identified two queries that are directly affecting my work and included them in the diff, though the same approach can be extended to more queries as we see fit. Thanks! -- This message was sent by Atlassian JIRA (v7.6.3#76005)
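The batching idea above can be sketched as follows (illustrative stdlib code, not the actual scheduler patch; `run_query` stands in for a SQLAlchemy query over one bounded batch of task-instance keys):

```python
# Instead of one predicate over all task-instance keys (whose size grows
# O(n) and is evaluated per scanned row), issue bounded queries of at most
# `max_tis_per_query` keys each, mirroring the config option named above.
def chunks(items, size):
    """Yield successive slices of at most `size` items."""
    for i in range(0, len(items), size):
        yield items[i:i + size]

def query_in_batches(keys, run_query, max_tis_per_query=512):
    """Run `run_query` over bounded batches and concatenate the results."""
    results = []
    for batch in chunks(keys, max_tis_per_query):
        results.extend(run_query(batch))  # e.g. WHERE ti_key IN (<batch>)
    return results
```

Each query now has a bounded predicate, which keeps the planner on the index and caps per-row evaluation cost, trading one huge statement for several small ones.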
[jira] [Created] (AIRFLOW-2429) Make Airflow Flake8 compliant
Fokko Driesprong created AIRFLOW-2429: - Summary: Make Airflow Flake8 compliant Key: AIRFLOW-2429 URL: https://issues.apache.org/jira/browse/AIRFLOW-2429 Project: Apache Airflow Issue Type: Improvement Reporter: Fokko Driesprong -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2407) Undefined names in Python code
[ https://issues.apache.org/jira/browse/AIRFLOW-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466404#comment-16466404 ] ASF subversion and git services commented on AIRFLOW-2407: -- Commit b9eb52cc012637e69c0423441d645f1caa561f9b in incubator-airflow's branch refs/heads/master from [~cclauss] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=b9eb52c ] [AIRFLOW-2407] Resolve Python undefined names Closes #3307 from cclauss/AIRFLOW-2407 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Resolved] (AIRFLOW-2407) Undefined names in Python code
[ https://issues.apache.org/jira/browse/AIRFLOW-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong resolved AIRFLOW-2407. --- Resolution: Fixed Fix Version/s: 2.0.0 Issue resolved by pull request #3307 [https://github.com/apache/incubator-airflow/pull/3307] -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2407) Undefined names in Python code
[ https://issues.apache.org/jira/browse/AIRFLOW-2407?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466403#comment-16466403 ] ASF subversion and git services commented on AIRFLOW-2407: -- Commit b9eb52cc012637e69c0423441d645f1caa561f9b in incubator-airflow's branch refs/heads/master from [~cclauss] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=b9eb52c ] [AIRFLOW-2407] Resolve Python undefined names Closes #3307 from cclauss/AIRFLOW-2407 -- This message was sent by Atlassian JIRA (v7.6.3#76005)
incubator-airflow git commit: [AIRFLOW-2407] Resolve Python undefined names
Repository: incubator-airflow Updated Branches: refs/heads/master c27d8fd0b -> b9eb52cc0 [AIRFLOW-2407] Resolve Python undefined names Closes #3307 from cclauss/AIRFLOW-2407 Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/b9eb52cc Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/b9eb52cc Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/b9eb52cc Branch: refs/heads/master Commit: b9eb52cc012637e69c0423441d645f1caa561f9b Parents: c27d8fd Author: cclaussAuthored: Mon May 7 22:02:04 2018 +0200 Committer: Fokko Driesprong Committed: Mon May 7 22:02:04 2018 +0200 -- airflow/contrib/auth/backends/kerberos_auth.py | 3 ++- airflow/contrib/hooks/aws_hook.py | 1 + airflow/contrib/operators/datastore_export_operator.py | 1 + airflow/contrib/operators/datastore_import_operator.py | 3 +-- airflow/contrib/sensors/qubole_sensor.py | 3 ++- tests/operators/test_virtualenv_operator.py| 2 +- 6 files changed, 8 insertions(+), 5 deletions(-) -- http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/b9eb52cc/airflow/contrib/auth/backends/kerberos_auth.py -- diff --git a/airflow/contrib/auth/backends/kerberos_auth.py b/airflow/contrib/auth/backends/kerberos_auth.py index f6f9d69..0dc8bd4 100644 --- a/airflow/contrib/auth/backends/kerberos_auth.py +++ b/airflow/contrib/auth/backends/kerberos_auth.py @@ -17,8 +17,9 @@ # specific language governing permissions and limitations # under the License. 
+import logging import flask_login -from flask_login import login_required, current_user, logout_user +from flask_login import current_user from flask import flash from wtforms import ( Form, PasswordField, StringField) http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/b9eb52cc/airflow/contrib/hooks/aws_hook.py -- diff --git a/airflow/contrib/hooks/aws_hook.py b/airflow/contrib/hooks/aws_hook.py index 29bf740..c8ded4d 100644 --- a/airflow/contrib/hooks/aws_hook.py +++ b/airflow/contrib/hooks/aws_hook.py @@ -20,6 +20,7 @@ import boto3 import configparser +import logging from airflow.exceptions import AirflowException from airflow.hooks.base_hook import BaseHook http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/b9eb52cc/airflow/contrib/operators/datastore_export_operator.py -- diff --git a/airflow/contrib/operators/datastore_export_operator.py b/airflow/contrib/operators/datastore_export_operator.py index 15b19ec..09b7965 100644 --- a/airflow/contrib/operators/datastore_export_operator.py +++ b/airflow/contrib/operators/datastore_export_operator.py @@ -19,6 +19,7 @@ # from airflow.contrib.hooks.datastore_hook import DatastoreHook from airflow.contrib.hooks.gcs_hook import GoogleCloudStorageHook +from airflow.exceptions import AirflowException from airflow.models import BaseOperator from airflow.utils.decorators import apply_defaults http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/b9eb52cc/airflow/contrib/operators/datastore_import_operator.py -- diff --git a/airflow/contrib/operators/datastore_import_operator.py b/airflow/contrib/operators/datastore_import_operator.py index 2c9c75e..279b1a5 100644 --- a/airflow/contrib/operators/datastore_import_operator.py +++ b/airflow/contrib/operators/datastore_import_operator.py @@ -18,6 +18,7 @@ # under the License. 
# from airflow.contrib.hooks.datastore_hook import DatastoreHook +from airflow.exceptions import AirflowException from airflow.models import BaseOperator from airflow.utils.decorators import apply_defaults @@ -88,11 +89,9 @@ class DatastoreImportOperator(BaseOperator): result = ds_hook.poll_operation_until_done(operation_name, self.polling_interval_in_seconds) - state = result['metadata']['common']['state'] if state != 'SUCCESSFUL': raise AirflowException('Operation failed: result={}'.format(result)) if self.xcom_push: return result - http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/b9eb52cc/airflow/contrib/sensors/qubole_sensor.py -- diff --git a/airflow/contrib/sensors/qubole_sensor.py b/airflow/contrib/sensors/qubole_sensor.py index 2860f92..4f00e55 100644 --- a/airflow/contrib/sensors/qubole_sensor.py +++
[jira] [Created] (AIRFLOW-2428) Add AutoScalingRole key to emr_hook
Kyle Hamlin created AIRFLOW-2428: Summary: Add AutoScalingRole key to emr_hook Key: AIRFLOW-2428 URL: https://issues.apache.org/jira/browse/AIRFLOW-2428 Project: Apache Airflow Issue Type: Improvement Components: hooks Reporter: Kyle Hamlin Fix For: 1.10.0 Need to be able to pass the `AutoScalingRole` param to the `run_job_flow` method for EMR autoscaling to work. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
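A sketch of the kind of job-flow configuration this change would let the hook forward to EMR (parameter names follow the EMR RunJobFlow API; the role names below are the EMR defaults and are assumptions to verify for your account):

```python
# Hypothetical job-flow overrides a DAG might pass to the EMR hook.
# "AutoScalingRole" is the key this issue asks the hook to pass through:
# without it, EMR managed autoscaling policies cannot assume a role.
job_flow_overrides = {
    "Name": "example-cluster",            # hypothetical cluster name
    "JobFlowRole": "EMR_EC2_DefaultRole",
    "ServiceRole": "EMR_DefaultRole",
    "AutoScalingRole": "EMR_AutoScaling_DefaultRole",
}
```

With the proposed change, a dict like this could be handed to the hook's run_job_flow call unmodified instead of having AutoScalingRole silently dropped.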
[jira] [Closed] (AIRFLOW-2272) Travis CI is failing builds due to oracle-java8-installer failing to install via apt-get
[ https://issues.apache.org/jira/browse/AIRFLOW-2272?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong closed AIRFLOW-2272. - Resolution: Fixed Fixed in: https://github.com/apache/incubator-airflow/commit/0f8507ae351787e086d1d1038f6f0ba52e6d9aaa > Travis CI is failing builds due to oracle-java8-installer failing to install > via apt-get > > > Key: AIRFLOW-2272 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2272 > Project: Apache Airflow > Issue Type: Bug > Components: ci >Reporter: David Klosowski >Priority: Critical > Labels: build-failure > > All the PR builds in TravisCI are failing with the following apt-get error: > > {code:java} > --2018-03-30 17:56:23-- (try: 5) > http://download.oracle.com/otn-pub/java/jdk/8u161-b12/2f38c3b165be4555a1fa6e98c45e0808/jdk-8u161-linux-x64.tar.gz > Connecting to download.oracle.com (download.oracle.com)|23.45.144.164|:80... > failed: Connection timed out. > Giving up. > apt-get install failed > $ cat ~/apt-get-update.log > Ign:1 http://us-central1.gce.archive.ubuntu.com/ubuntu trusty InRelease > Hit:2 http://us-central1.gce.archive.ubuntu.com/ubuntu trusty-updates > InRelease > Hit:3 http://us-central1.gce.archive.ubuntu.com/ubuntu trusty-backports > InRelease > Hit:4 http://us-central1.gce.archive.ubuntu.com/ubuntu trusty Release > Ign:5 http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.4 InRelease > Ign:6 http://dl.google.com/linux/chrome/deb stable InRelease > Hit:7 http://repo.mongodb.org/apt/ubuntu trusty/mongodb-org/3.4 Release > Hit:8 http://security.ubuntu.com/ubuntu trusty-security InRelease > Ign:9 http://toolbelt.heroku.com/ubuntu ./ InRelease > Get:10 http://dl.bintray.com/apache/cassandra 39x InRelease [3,168 B] > Hit:11 http://ppa.launchpad.net/chris-lea/redis-server/ubuntu trusty InRelease > Hit:13 https://download.docker.com/linux/ubuntu trusty InRelease > Hit:12 http://toolbelt.heroku.com/ubuntu ./ Release > Hit:15 http://dl.google.com/linux/chrome/deb stable 
Release > Hit:16 http://apt.postgresql.org/pub/repos/apt trusty-pgdg InRelease > Hit:18 https://dl.hhvm.com/ubuntu trusty InRelease > Ign:19 http://ppa.launchpad.net/couchdb/stable/ubuntu trusty InRelease > Hit:22 https://packagecloud.io/computology/apt-backport/ubuntu trusty > InRelease > Hit:23 https://packagecloud.io/github/git-lfs/ubuntu trusty InRelease > Hit:24 https://packagecloud.io/rabbitmq/rabbitmq-server/ubuntu trusty > InRelease > Hit:25 http://ppa.launchpad.net/git-core/ppa/ubuntu trusty InRelease > Hit:26 http://ppa.launchpad.net/openjdk-r/ppa/ubuntu trusty InRelease > Hit:27 http://ppa.launchpad.net/pollinate/ppa/ubuntu trusty InRelease > Hit:28 http://ppa.launchpad.net/webupd8team/java/ubuntu trusty InRelease > Hit:29 http://ppa.launchpad.net/couchdb/stable/ubuntu trusty Release > Fetched 3,168 B in 2s (1,102 B/s) > Reading package lists... > W: http://ppa.launchpad.net/couchdb/stable/ubuntu/dists/trusty/Release.gpg: > Signature by key 15866BAFD9BCC4F3C1E0DFC7D69548E1C17EAB57 uses weak digest > algorithm (SHA1) > > The command "sudo -E apt-get -yq --no-install-suggests > --no-install-recommends --force-yes install slapd ldap-utils openssh-server > mysql-server-5.6 mysql-client-core-5.6 mysql-client-5.6 krb5-user krb5-kdc > krb5-admin-server oracle-java8-installer python-selinux" failed and exited > with 100 during .{code} > > It looks like this is due to the configuration in the .travis.yml installing > {{oracle-java8-installer}}: > {code} > apt: > packages: > - slapd > - ldap-utils > - openssh-server > - mysql-server-5.6 > - mysql-client-core-5.6 > - mysql-client-5.6 > - krb5-user > - krb5-kdc > - krb5-admin-server > - oracle-java8-installer > - python-selinux > {code} > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[3/4] incubator-airflow git commit: closes apache/incubator-airflow#2770 *Closed for inactivity.*
closes apache/incubator-airflow#2770 *Closed for inactivity.* Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/f9eda3d4 Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/f9eda3d4 Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/f9eda3d4 Branch: refs/heads/master Commit: f9eda3d49236e8a4ea910f43c23e0955fe6cf727 Parents: 1610d84 Author: r39132 Authored: Mon May 7 11:44:45 2018 -0700 Committer: r39132 Committed: Mon May 7 11:44:45 2018 -0700
[4/4] incubator-airflow git commit: closes apache/incubator-airflow#2768 *Closed for inactivity.*
closes apache/incubator-airflow#2768 *Closed for inactivity.* Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/c27d8fd0 Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/c27d8fd0 Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/c27d8fd0 Branch: refs/heads/master Commit: c27d8fd0b03c9fb1ccb01132c3078d18cf25695c Parents: f9eda3d Author: r39132 Authored: Mon May 7 11:44:48 2018 -0700 Committer: r39132 Committed: Mon May 7 11:44:48 2018 -0700
[2/4] incubator-airflow git commit: closes apache/incubator-airflow#2769 *Closed for inactivity.*
closes apache/incubator-airflow#2769 *Closed for inactivity.* Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/1610d844 Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/1610d844 Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/1610d844 Branch: refs/heads/master Commit: 1610d8443eadadac9d78416c73173c832c31ba8e Parents: 788625a Author: r39132 Authored: Mon May 7 11:44:42 2018 -0700 Committer: r39132 Committed: Mon May 7 11:44:42 2018 -0700
[jira] [Commented] (AIRFLOW-1952) Add the navigation bar color parameter
[ https://issues.apache.org/jira/browse/AIRFLOW-1952?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466299#comment-16466299 ] ASF subversion and git services commented on AIRFLOW-1952: -- Commit 21458399609a879b318ddbb37b52958c62baadc2 in incubator-airflow's branch refs/heads/master from [~Licht-T] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=2145839 ]

[AIRFLOW-1952] Add the navigation bar color parameter

ENH: Add the navigation bar color parameter

When operating multiple Airflows (e.g. Production, Staging, etc.), we cannot distinguish which Airflow it is. This feature lets you tell which Airflow you are looking at by the color of the navigation bar.

Merge branch 'master' into add-navigation-bar-color-parameter

Closes #2903 from Licht-T/add-navigation-bar-color-parameter

> Add the navigation bar color parameter
> --
>
> Key: AIRFLOW-1952
> URL: https://issues.apache.org/jira/browse/AIRFLOW-1952
> Project: Apache Airflow
> Issue Type: New Feature
> Reporter: Licht Takeuchi
> Assignee: Licht Takeuchi
> Priority: Major
> Fix For: 2.0.0
>
> We operate multiple Airflows (e.g. Production, Staging, etc.), so we cannot distinguish which Airflow it is. This feature lets us discern the Airflow by the color of the navigation bar.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
incubator-airflow git commit: [AIRFLOW-1952] Add the navigation bar color parameter
Repository: incubator-airflow Updated Branches: refs/heads/master fcf8435f4 -> 214583996

[AIRFLOW-1952] Add the navigation bar color parameter

ENH: Add the navigation bar color parameter

When operating multiple Airflows (e.g. Production, Staging, etc.), we cannot distinguish which Airflow it is. This feature lets you tell which Airflow you are looking at by the color of the navigation bar.

Merge branch 'master' into add-navigation-bar-color-parameter

Closes #2903 from Licht-T/add-navigation-bar-color-parameter

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/21458399
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/21458399
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/21458399
Branch: refs/heads/master
Commit: 21458399609a879b318ddbb37b52958c62baadc2
Parents: fcf8435
Author: Licht-T
Authored: Mon May 7 11:18:32 2018 -0700
Committer: r39132
Committed: Mon May 7 11:18:32 2018 -0700
--
 airflow/config_templates/default_airflow.cfg | 3 +++
 airflow/www/app.py                           | 1 +
 airflow/www/templates/admin/master.html      | 2 +-
 3 files changed, 5 insertions(+), 1 deletion(-)
--
http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/21458399/airflow/config_templates/default_airflow.cfg
--
diff --git a/airflow/config_templates/default_airflow.cfg b/airflow/config_templates/default_airflow.cfg
index 6da4287..b91961e 100644
--- a/airflow/config_templates/default_airflow.cfg
+++ b/airflow/config_templates/default_airflow.cfg
@@ -277,6 +277,9 @@ page_size = 100
 # Use FAB-based webserver with RBAC feature
 rbac = False

+# Define the color of navigation bar
+navbar_color = #007A87
+
 [email]

 email_backend = airflow.utils.email.send_email_smtp

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/21458399/airflow/www/app.py
--
diff --git a/airflow/www/app.py b/airflow/www/app.py
index ca0589d..e9b101d 100644
--- a/airflow/www/app.py
+++ b/airflow/www/app.py
@@ -156,6 +156,7 @@ def create_app(config=None, testing=False):
     def jinja_globals():
         return {
             'hostname': get_hostname(),
+            'navbar_color': configuration.get('webserver', 'NAVBAR_COLOR'),
         }

 @app.teardown_appcontext

http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/21458399/airflow/www/templates/admin/master.html
--
diff --git a/airflow/www/templates/admin/master.html b/airflow/www/templates/admin/master.html
index 1f0020e..cb1e9f1 100644
--- a/airflow/www/templates/admin/master.html
+++ b/airflow/www/templates/admin/master.html
@@ -51,7 +51,7 @@
 {% block page_body %}
[the changed markup in this hunk was stripped by the archive; it replaces one line of the header element]
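The feature's point is that each deployment sets a different `navbar_color` in its `airflow.cfg` so environments are visually distinguishable. A trivial sketch of such a per-environment palette (the staging/dev colors here are purely illustrative; only `#007A87` is the shipped default):

```python
# Default teal from default_airflow.cfg; the other values are illustrative.
NAVBAR_COLORS = {
    'production': '#007A87',
    'staging': '#AA3333',
    'dev': '#338833',
}


def navbar_color_for(env):
    """Pick the navbar color for a deployment, falling back to the stock teal."""
    return NAVBAR_COLORS.get(env, '#007A87')
```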
incubator-airflow git commit: closes apache/incubator-airflow#3237 *Closed for inactivity.*
Repository: incubator-airflow Updated Branches: refs/heads/master 69da86658 -> fcf8435f4 closes apache/incubator-airflow#3237 *Closed for inactivity.* Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/fcf8435f Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/fcf8435f Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/fcf8435f Branch: refs/heads/master Commit: fcf8435f4baddf5a1ef95d3f894437f37a9309a7 Parents: 69da866 Author: r39132 Authored: Mon May 7 11:14:41 2018 -0700 Committer: r39132 Committed: Mon May 7 11:14:41 2018 -0700
[jira] [Commented] (AIRFLOW-2222) GoogleCloudStorageHook.copy fails for large files between locations
[ https://issues.apache.org/jira/browse/AIRFLOW-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466218#comment-16466218 ] ASF subversion and git services commented on AIRFLOW-2222: -- Commit 60dce37725358d1320f2856840423810cca09725 in incubator-airflow's branch refs/heads/v1-10-test from [~b11c] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=60dce37 ] [AIRFLOW-2222] Implement GoogleCloudStorageHook.rewrite Closes #3264 from berislavlopac/AIRFLOW-2222 (cherry picked from commit 69da8665862ab3684dda0bb6a2f7c30789690704) Signed-off-by: Fokko Driesprong

> GoogleCloudStorageHook.copy fails for large files between locations
> ---
>
> Key: AIRFLOW-2222
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2222
> Project: Apache Airflow
> Issue Type: Bug
> Reporter: Berislav Lopac
> Assignee: Berislav Lopac
> Priority: Major
> Fix For: 1.10.0, 2.0.0
>
> When copying large files (confirmed for around 3 GB) between buckets in different projects, the operation fails and the Google API returns error [413—Payload Too Large|https://cloud.google.com/storage/docs/json_api/v1/status-codes#413_Payload_Too_Large]. The documentation for the error says:
> {quote}The Cloud Storage JSON API supports up to 5 TB objects. This error may, alternatively, arise if copying objects between locations and/or storage classes can not complete within 30 seconds. In this case, use the [Rewrite|https://cloud.google.com/storage/docs/json_api/v1/objects/rewrite] method instead.{quote}
> The reason seems to be that {{GoogleCloudStorageHook.copy}} uses the API {{copy}} method.
> h3. Proposed Solution
> There are two potential solutions:
> # Implement a {{GoogleCloudStorageHook.rewrite}} method which can be called from operators and other objects to ensure successful execution. This method is more flexible, but requires changes both in the {{GoogleCloudStorageHook}} class and in any other classes that use it for copying files, to ensure that they explicitly call {{rewrite}} when needed.
> # Modify {{GoogleCloudStorageHook.copy}} to determine when to use {{rewrite}} instead of {{copy}} underneath. This requires updating only the {{GoogleCloudStorageHook}} class, but the logic might not cover all the edge cases and could be difficult to implement.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
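Option 2 could be sketched as a thin dispatch in front of the two existing methods; the threshold constant, the `same_location_and_class` flag, and the helper itself are illustrative, not the hook's real API:

```python
FIVE_TB = 5 * 1024 ** 4  # JSON API object-size limit for a single copy call


def copy_or_rewrite(hook, src_bucket, src_obj, dst_bucket, dst_obj,
                    size_bytes, same_location_and_class):
    """Route through hook.copy or hook.rewrite, per option 2 above."""
    if size_bytes < FIVE_TB and same_location_and_class:
        return hook.copy(src_bucket, src_obj, dst_bucket, dst_obj)
    # rewrite is chunked server-side, so it also avoids the 30-second
    # completion constraint on cross-location / cross-class copies
    return hook.rewrite(src_bucket, src_obj, dst_bucket, dst_obj)
```

The weak spot the report anticipates is visible here: the caller has to know the object's size and whether source and destination share location and storage class, which is extra metadata the plain `copy` path never needed.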
[jira] [Resolved] (AIRFLOW-2222) GoogleCloudStorageHook.copy fails for large files between locations
[ https://issues.apache.org/jira/browse/AIRFLOW-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong resolved AIRFLOW-2222. --- Resolution: Fixed Fix Version/s: 1.10.0

Issue resolved by pull request #3264 [https://github.com/apache/incubator-airflow/pull/3264]

> GoogleCloudStorageHook.copy fails for large files between locations
> ---
>
> Key: AIRFLOW-2222
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2222
> Project: Apache Airflow
> Issue Type: Bug
> Reporter: Berislav Lopac
> Assignee: Berislav Lopac
> Priority: Major
> Fix For: 1.10.0, 2.0.0
>
> When copying large files (confirmed for around 3 GB) between buckets in different projects, the operation fails and the Google API returns error [413—Payload Too Large|https://cloud.google.com/storage/docs/json_api/v1/status-codes#413_Payload_Too_Large]. The documentation for the error says:
> {quote}The Cloud Storage JSON API supports up to 5 TB objects. This error may, alternatively, arise if copying objects between locations and/or storage classes can not complete within 30 seconds. In this case, use the [Rewrite|https://cloud.google.com/storage/docs/json_api/v1/objects/rewrite] method instead.{quote}
> The reason seems to be that {{GoogleCloudStorageHook.copy}} uses the API {{copy}} method.
> h3. Proposed Solution
> There are two potential solutions:
> # Implement a {{GoogleCloudStorageHook.rewrite}} method which can be called from operators and other objects to ensure successful execution. This method is more flexible, but requires changes both in the {{GoogleCloudStorageHook}} class and in any other classes that use it for copying files, to ensure that they explicitly call {{rewrite}} when needed.
> # Modify {{GoogleCloudStorageHook.copy}} to determine when to use {{rewrite}} instead of {{copy}} underneath. This requires updating only the {{GoogleCloudStorageHook}} class, but the logic might not cover all the edge cases and could be difficult to implement.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2222) GoogleCloudStorageHook.copy fails for large files between locations
[ https://issues.apache.org/jira/browse/AIRFLOW-2222?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466216#comment-16466216 ] ASF subversion and git services commented on AIRFLOW-2222: -- Commit 69da8665862ab3684dda0bb6a2f7c30789690704 in incubator-airflow's branch refs/heads/master from [~b11c] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=69da866 ] [AIRFLOW-2222] Implement GoogleCloudStorageHook.rewrite Closes #3264 from berislavlopac/AIRFLOW-2222

> GoogleCloudStorageHook.copy fails for large files between locations
> ---
>
> Key: AIRFLOW-2222
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2222
> Project: Apache Airflow
> Issue Type: Bug
> Reporter: Berislav Lopac
> Assignee: Berislav Lopac
> Priority: Major
> Fix For: 2.0.0
>
> When copying large files (confirmed for around 3 GB) between buckets in different projects, the operation fails and the Google API returns error [413—Payload Too Large|https://cloud.google.com/storage/docs/json_api/v1/status-codes#413_Payload_Too_Large]. The documentation for the error says:
> {quote}The Cloud Storage JSON API supports up to 5 TB objects. This error may, alternatively, arise if copying objects between locations and/or storage classes can not complete within 30 seconds. In this case, use the [Rewrite|https://cloud.google.com/storage/docs/json_api/v1/objects/rewrite] method instead.{quote}
> The reason seems to be that {{GoogleCloudStorageHook.copy}} uses the API {{copy}} method.
> h3. Proposed Solution
> There are two potential solutions:
> # Implement a {{GoogleCloudStorageHook.rewrite}} method which can be called from operators and other objects to ensure successful execution. This method is more flexible, but requires changes both in the {{GoogleCloudStorageHook}} class and in any other classes that use it for copying files, to ensure that they explicitly call {{rewrite}} when needed.
> # Modify {{GoogleCloudStorageHook.copy}} to determine when to use {{rewrite}} instead of {{copy}} underneath. This requires updating only the {{GoogleCloudStorageHook}} class, but the logic might not cover all the edge cases and could be difficult to implement.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
incubator-airflow git commit: [AIRFLOW-2222] Implement GoogleCloudStorageHook.rewrite
Repository: incubator-airflow Updated Branches: refs/heads/v1-10-test ff3cab6d4 -> 60dce3772

[AIRFLOW-2222] Implement GoogleCloudStorageHook.rewrite

Closes #3264 from berislavlopac/AIRFLOW-2222

(cherry picked from commit 69da8665862ab3684dda0bb6a2f7c30789690704)
Signed-off-by: Fokko Driesprong

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/60dce377
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/60dce377
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/60dce377
Branch: refs/heads/v1-10-test
Commit: 60dce37725358d1320f2856840423810cca09725
Parents: ff3cab6
Author: Berislav Lopac
Authored: Mon May 7 19:23:43 2018 +0200
Committer: Fokko Driesprong
Committed: Mon May 7 19:24:00 2018 +0200
--
 airflow/contrib/hooks/gcs_hook.py | 51 ++
 1 file changed, 51 insertions(+)
--
http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/60dce377/airflow/contrib/hooks/gcs_hook.py
--
diff --git a/airflow/contrib/hooks/gcs_hook.py b/airflow/contrib/hooks/gcs_hook.py
index 894cc7a..0d11c12 100644
--- a/airflow/contrib/hooks/gcs_hook.py
+++ b/airflow/contrib/hooks/gcs_hook.py
@@ -90,6 +90,57 @@ class GoogleCloudStorageHook(GoogleCloudBaseHook):
             return False
         raise

+    def rewrite(self, source_bucket, source_object, destination_bucket,
+                destination_object=None):
+        """
+        Has the same functionality as copy, except that will work on files
+        over 5 TB, as well as when copying between locations and/or storage
+        classes.
+
+        destination_object can be omitted, in which case source_object is used.
+
+        :param source_bucket: The bucket of the object to copy from.
+        :type source_bucket: string
+        :param source_object: The object to copy.
+        :type source_object: string
+        :param destination_bucket: The destination of the object to copied to.
+        :type destination_bucket: string
+        :param destination_object: The (renamed) path of the object if given.
+            Can be omitted; then the same name is used.
+        """
+        destination_object = destination_object or source_object
+        if (source_bucket == destination_bucket and
+                source_object == destination_object):
+            raise ValueError(
+                'Either source/destination bucket or source/destination object '
+                'must be different, not both the same: bucket=%s, object=%s' %
+                (source_bucket, source_object))
+        if not source_bucket or not source_object:
+            raise ValueError('source_bucket and source_object cannot be empty.')
+
+        service = self.get_conn()
+        request_count = 1
+        try:
+            result = service.objects() \
+                .rewrite(sourceBucket=source_bucket, sourceObject=source_object,
+                         destinationBucket=destination_bucket,
+                         destinationObject=destination_object, body='') \
+                .execute()
+            self.log.info('Rewrite request #%s: %s', request_count, result)
+            while not result['done']:
+                request_count += 1
+                result = service.objects() \
+                    .rewrite(sourceBucket=source_bucket, sourceObject=source_object,
+                             destinationBucket=destination_bucket,
+                             destinationObject=destination_object,
+                             rewriteToken=result['rewriteToken'], body='') \
+                    .execute()
+                self.log.info('Rewrite request #%s: %s', request_count, result)
+            return True
+        except errors.HttpError as ex:
+            if ex.resp['status'] == '404':
+                return False
+            raise

     # pylint:disable=redefined-builtin
     def download(self, bucket, object, filename=None):
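The loop in the patch is the standard resumable-rewrite protocol: reissue the call with the `rewriteToken` from the previous response until a response reports `done`. A stripped-down sketch of just that control flow, where the `issue_rewrite` callable stands in for the `service.objects().rewrite(...).execute()` chain:

```python
def rewrite_until_done(issue_rewrite):
    """Drive a resumable rewrite to completion; return the request count."""
    result = issue_rewrite(rewrite_token=None)
    request_count = 1
    while not result['done']:
        # Each partial response carries a token that resumes the
        # server-side copy where the previous request left off.
        request_count += 1
        result = issue_rewrite(rewrite_token=result['rewriteToken'])
    return request_count
```

Because each request copies only a bounded chunk server-side, no single call can hit the 30-second limit that made `copy` fail for large cross-location transfers.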
incubator-airflow git commit: [AIRFLOW-2222] Implement GoogleCloudStorageHook.rewrite
Repository: incubator-airflow Updated Branches: refs/heads/master 868d392c5 -> 69da86658

[AIRFLOW-2222] Implement GoogleCloudStorageHook.rewrite

Closes #3264 from berislavlopac/AIRFLOW-2222

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/69da8665
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/69da8665
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/69da8665
Branch: refs/heads/master
Commit: 69da8665862ab3684dda0bb6a2f7c30789690704
Parents: 868d392
Author: Berislav Lopac
Authored: Mon May 7 19:23:43 2018 +0200
Committer: Fokko Driesprong
Committed: Mon May 7 19:23:43 2018 +0200
--
 airflow/contrib/hooks/gcs_hook.py | 51 ++
 1 file changed, 51 insertions(+)
--
http://git-wip-us.apache.org/repos/asf/incubator-airflow/blob/69da8665/airflow/contrib/hooks/gcs_hook.py
--
diff --git a/airflow/contrib/hooks/gcs_hook.py b/airflow/contrib/hooks/gcs_hook.py
index 894cc7a..0d11c12 100644
--- a/airflow/contrib/hooks/gcs_hook.py
+++ b/airflow/contrib/hooks/gcs_hook.py
@@ -90,6 +90,57 @@ class GoogleCloudStorageHook(GoogleCloudBaseHook):
             return False
         raise

+    def rewrite(self, source_bucket, source_object, destination_bucket,
+                destination_object=None):
+        """
+        Has the same functionality as copy, except that will work on files
+        over 5 TB, as well as when copying between locations and/or storage
+        classes.
+
+        destination_object can be omitted, in which case source_object is used.
+
+        :param source_bucket: The bucket of the object to copy from.
+        :type source_bucket: string
+        :param source_object: The object to copy.
+        :type source_object: string
+        :param destination_bucket: The destination of the object to copied to.
+        :type destination_bucket: string
+        :param destination_object: The (renamed) path of the object if given.
+            Can be omitted; then the same name is used.
+        """
+        destination_object = destination_object or source_object
+        if (source_bucket == destination_bucket and
+                source_object == destination_object):
+            raise ValueError(
+                'Either source/destination bucket or source/destination object '
+                'must be different, not both the same: bucket=%s, object=%s' %
+                (source_bucket, source_object))
+        if not source_bucket or not source_object:
+            raise ValueError('source_bucket and source_object cannot be empty.')
+
+        service = self.get_conn()
+        request_count = 1
+        try:
+            result = service.objects() \
+                .rewrite(sourceBucket=source_bucket, sourceObject=source_object,
+                         destinationBucket=destination_bucket,
+                         destinationObject=destination_object, body='') \
+                .execute()
+            self.log.info('Rewrite request #%s: %s', request_count, result)
+            while not result['done']:
+                request_count += 1
+                result = service.objects() \
+                    .rewrite(sourceBucket=source_bucket, sourceObject=source_object,
+                             destinationBucket=destination_bucket,
+                             destinationObject=destination_object,
+                             rewriteToken=result['rewriteToken'], body='') \
+                    .execute()
+                self.log.info('Rewrite request #%s: %s', request_count, result)
+            return True
+        except errors.HttpError as ex:
+            if ex.resp['status'] == '404':
+                return False
+            raise

     # pylint:disable=redefined-builtin
     def download(self, bucket, object, filename=None):
[jira] [Resolved] (AIRFLOW-2426) Add Google Cloud Storage Hook tests
[ https://issues.apache.org/jira/browse/AIRFLOW-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong resolved AIRFLOW-2426. --- Resolution: Fixed Fix Version/s: 2.0.0 1.10.0 Issue resolved by pull request #3322 [https://github.com/apache/incubator-airflow/pull/3322] > Add Google Cloud Storage Hook tests > --- > > Key: AIRFLOW-2426 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2426 > Project: Apache Airflow > Issue Type: Test > Components: contrib, gcp, tests >Reporter: Kaxil Naik >Assignee: Kaxil Naik >Priority: Critical > Fix For: 1.10.0, 2.0.0 > > > The Google Cloud Storage Hook is missing tests. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2426) Add Google Cloud Storage Hook tests
[ https://issues.apache.org/jira/browse/AIRFLOW-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466714#comment-16466151 ] ASF subversion and git services commented on AIRFLOW-2426: -- Commit ff3cab6d42e184d159da0423244fcf11e23a230e in incubator-airflow's branch refs/heads/v1-10-test from [~kaxilnaik] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=ff3cab6 ] [AIRFLOW-2426] Add Google Cloud Storage Hook tests - Added mock tests for methods in `GoogleCloudStorageHook`. Closes #3322 from kaxil/AIRFLOW-2426 (cherry picked from commit 868d392c5215e97dcea9d81ec6c70e685b937fe0) Signed-off-by: Fokko Driesprong

> Add Google Cloud Storage Hook tests
> ---
>
> Key: AIRFLOW-2426
> URL: https://issues.apache.org/jira/browse/AIRFLOW-2426
> Project: Apache Airflow
> Issue Type: Test
> Components: contrib, gcp, tests
> Reporter: Kaxil Naik
> Assignee: Kaxil Naik
> Priority: Critical
>
> The Google Cloud Storage Hook is missing tests.

-- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-2426) Add Google Cloud Storage Hook tests
[ https://issues.apache.org/jira/browse/AIRFLOW-2426?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16466150#comment-16466150 ] ASF subversion and git services commented on AIRFLOW-2426: -- Commit 868d392c5215e97dcea9d81ec6c70e685b937fe0 in incubator-airflow's branch refs/heads/master from [~kaxilnaik] [ https://git-wip-us.apache.org/repos/asf?p=incubator-airflow.git;h=868d392 ] [AIRFLOW-2426] Add Google Cloud Storage Hook tests - Added mock tests for methods in `GoogleCloudStorageHook`. Closes #3322 from kaxil/AIRFLOW-2426 > Add Google Cloud Storage Hook tests > --- > > Key: AIRFLOW-2426 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2426 > Project: Apache Airflow > Issue Type: Test > Components: contrib, gcp, tests >Reporter: Kaxil Naik >Assignee: Kaxil Naik >Priority: Critical > > The Google Cloud Storage Hook is missing tests. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
incubator-airflow git commit: [AIRFLOW-2426] Add Google Cloud Storage Hook tests
Repository: incubator-airflow
Updated Branches:
  refs/heads/master c5ed7b1fb -> 868d392c5

[AIRFLOW-2426] Add Google Cloud Storage Hook tests

- Added mock tests for methods in `GoogleCloudStorageHook`.

Closes #3322 from kaxil/AIRFLOW-2426

Project: http://git-wip-us.apache.org/repos/asf/incubator-airflow/repo
Commit: http://git-wip-us.apache.org/repos/asf/incubator-airflow/commit/868d392c
Tree: http://git-wip-us.apache.org/repos/asf/incubator-airflow/tree/868d392c
Diff: http://git-wip-us.apache.org/repos/asf/incubator-airflow/diff/868d392c

Branch: refs/heads/master
Commit: 868d392c5215e97dcea9d81ec6c70e685b937fe0
Parents: c5ed7b1
Author: Kaxil Naik
Authored: Mon May 7 18:45:52 2018 +0200
Committer: Fokko Driesprong
Committed: Mon May 7 18:45:52 2018 +0200

----------------------------------------------------------------------
 airflow/contrib/hooks/gcs_hook.py    |  12 +-
 tests/contrib/hooks/test_gcs_hook.py | 232 +-
 2 files changed, 240 insertions(+), 4 deletions(-)
----------------------------------------------------------------------

diff --git a/airflow/contrib/hooks/gcs_hook.py b/airflow/contrib/hooks/gcs_hook.py
index 494e05f..894cc7a 100644
--- a/airflow/contrib/hooks/gcs_hook.py
+++ b/airflow/contrib/hooks/gcs_hook.py
@@ -7,9 +7,9 @@
 # to you under the Apache License, Version 2.0 (the
 # "License"); you may not use this file except in compliance
 # with the License. You may obtain a copy of the License at
-#
+#
 #   http://www.apache.org/licenses/LICENSE-2.0
-#
+#
 # Unless required by applicable law or agreed to in writing,
 # software distributed under the License is distributed on an
 # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
@@ -24,6 +24,8 @@ from googleapiclient import errors
 from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook
 from airflow.exceptions import AirflowException
 
+import re
+
 
 class GoogleCloudStorageHook(GoogleCloudBaseHook):
     """
@@ -417,6 +419,12 @@ class GoogleCloudStorageHook(GoogleCloudBaseHook):
             'Invalid value ({}) passed to storage_class. Value should be ' \
             'one of {}'.format(storage_class, storage_classes)
 
+        assert re.match('[a-zA-Z0-9]+', bucket_name[0]), \
+            'Bucket names must start with a number or letter.'
+
+        assert re.match('[a-zA-Z0-9]+', bucket_name[-1]), \
+            'Bucket names must end with a number or letter.'
+
         service = self.get_conn()
         bucket_resource = {
             'name': bucket_name,

diff --git a/tests/contrib/hooks/test_gcs_hook.py b/tests/contrib/hooks/test_gcs_hook.py
index 7bc343d..2af62e8 100644
--- a/tests/contrib/hooks/test_gcs_hook.py
+++ b/tests/contrib/hooks/test_gcs_hook.py
@@ -7,9 +7,9 @@
 # to you under the Apache License, Version 2.0 (the
 # "License"); you may not use this file except in compliance
 # with the License. You may obtain a copy of the License at
-#
+#
 #   http://www.apache.org/licenses/LICENSE-2.0
-#
+#
 # Unless required by applicable law or agreed to in writing,
 # software distributed under the License is distributed on an
 # "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
@@ -20,7 +20,22 @@
 import unittest
 
 import airflow.contrib.hooks.gcs_hook as gcs_hook
+
 from airflow.exceptions import AirflowException
+from apiclient.errors import HttpError
+
+try:
+    from unittest import mock
+except ImportError:
+    try:
+        import mock
+    except ImportError:
+        mock = None
+
+BASE_STRING = 'airflow.contrib.hooks.gcp_api_base_hook.{}'
+GCS_STRING = 'airflow.contrib.hooks.gcs_hook.{}'
+
+EMPTY_CONTENT = ''.encode('utf8')
 
 class TestGCSHookHelperFunctions(unittest.TestCase):
@@ -45,3 +60,216 @@ class TestGCSHookHelperFunctions(unittest.TestCase):
         # bucket only
         self.assertEqual(
             gcs_hook._parse_gcs_url('gs://bucket/'), ('bucket', ''))
+
+
+class TestGCSBucket(unittest.TestCase):
+    def test_bucket_name_value(self):
+
+        bad_start_bucket_name = '/testing123'
+        with self.assertRaises(AssertionError):
+            gcs_hook.GoogleCloudStorageHook().create_bucket(
+                bucket_name=bad_start_bucket_name
+            )
+
+        bad_end_bucket_name = 'testing123/'
+        with self.assertRaises(AssertionError):
+            gcs_hook.GoogleCloudStorageHook().create_bucket(
+                bucket_name=bad_end_bucket_name
+            )
+
+
+class TestGoogleCloudStorageHook(unittest.TestCase):
+    def setUp(self):
+        with
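The bucket-name checks the diff adds to `create_bucket` can be exercised in isolation. The sketch below lifts those two assertions into a hypothetical standalone helper (`validate_bucket_name` is not an Airflow function, just an illustration of the validation the commit introduces):

```python
import re


def validate_bucket_name(bucket_name):
    # Same checks the commit adds to GoogleCloudStorageHook.create_bucket:
    # the first and last characters must be alphanumeric.
    assert re.match('[a-zA-Z0-9]+', bucket_name[0]), \
        'Bucket names must start with a number or letter.'
    assert re.match('[a-zA-Z0-9]+', bucket_name[-1]), \
        'Bucket names must end with a number or letter.'
    return True


validate_bucket_name('testing123')      # passes
# validate_bucket_name('/testing123')   -> AssertionError (bad start)
# validate_bucket_name('testing123/')   -> AssertionError (bad end)
```

This is why the new `TestGCSBucket` test passes `'/testing123'` and `'testing123/'` and asserts that `create_bucket` raises `AssertionError` before any API call is attempted.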
[jira] [Created] (AIRFLOW-2427) Write tests for NamedHivePartitionSensor
Giovanni Lanzani created AIRFLOW-2427:
-----------------------------------------

             Summary: Write tests for NamedHivePartitionSensor
                 Key: AIRFLOW-2427
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-2427
             Project: Apache Airflow
          Issue Type: Test
          Components: core
    Affects Versions: Airflow 1.9.0
            Reporter: Giovanni Lanzani
            Assignee: Giovanni Lanzani

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
[jira] [Commented] (AIRFLOW-584) Airflow Pool does not limit running tasks
[ https://issues.apache.org/jira/browse/AIRFLOW-584?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16465741#comment-16465741 ]

Estefania Rabadan commented on AIRFLOW-584:
-------------------------------------------

Same issue in 1.9.0

> Airflow Pool does not limit running tasks
> -----------------------------------------
>
>                 Key: AIRFLOW-584
>                 URL: https://issues.apache.org/jira/browse/AIRFLOW-584
>             Project: Apache Airflow
>          Issue Type: Bug
>          Components: pools
>    Affects Versions: Airflow 1.7.1.3
>         Environment: Ubuntu 14.04
>            Reporter: David Kegley
>            Priority: Major
>         Attachments: img1.png, img2.png
>
> Airflow pools are not limiting the number of running task instances for the
> following dag in 1.7.1.3
> Steps to recreate:
> Create a pool of size 5 through the UI.
> The following dag has 52 tasks with increasing priority corresponding to the
> task number. There should only ever be 5 tasks running at a time however I
> observed 29 'used slots' in a pool with 5 slots
> {code}
> dag_name = 'pools_bug'
> default_args = {
>     'owner': 'airflow',
>     'depends_on_past': False,
>     'start_date': datetime(2016, 10, 20),
>     'email_on_failure': False,
>     'retries': 1
> }
> dag = DAG(dag_name, default_args=default_args, schedule_interval="0 8 * * *")
> start = DummyOperator(task_id='start', dag=dag)
> end = DummyOperator(task_id='end', dag=dag)
> for i in range(50):
>     sleep_command = 'sleep 10'
>     task_name = 'task-{}'.format(i)
>     op = BashOperator(
>         task_id=task_name,
>         bash_command=sleep_command,
>         execution_timeout=timedelta(hours=4),
>         priority_weight=i,
>         pool=dag_name,
>         dag=dag)
>     start.set_downstream(op)
>     end.set_upstream(op)
> {code}
> Relevant configurations from airflow.cfg:
> {code}
> [core]
> # The executor class that airflow should use. Choices include
> # SequentialExecutor, LocalExecutor, CeleryExecutor
> executor = CeleryExecutor
> # The amount of parallelism as a setting to the executor. This defines
> # the max number of task instances that should run simultaneously
> # on this airflow installation
> parallelism = 64
> # The number of task instances allowed to run concurrently by the scheduler
> dag_concurrency = 64
> # The maximum number of active DAG runs per DAG
> max_active_runs_per_dag = 1
> [celery]
> # This section only applies if you are using the CeleryExecutor in
> # [core] section above
> # The app name that will be used by celery
> celery_app_name = airflow.executors.celery_executor
> # The concurrency that will be used when starting workers with the
> # "airflow worker" command. This defines the number of task instances that
> # a worker will take, so size up your workers based on the resources on
> # your worker box and the nature of your tasks
> celeryd_concurrency = 64
> [scheduler]
> # Task instances listen for external kill signal (when you clear tasks
> # from the CLI or the UI), this defines the frequency at which they should
> # listen (in seconds).
> job_heartbeat_sec = 5
> # The scheduler constantly tries to trigger new tasks (look at the
> # scheduler section in the docs for more information). This defines
> # how often the scheduler should run (in seconds).
> scheduler_heartbeat_sec = 5
> {code}
> !img1.png!
> !img2.png!

--
This message was sent by Atlassian JIRA
(v7.6.3#76005)
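The expected behavior the reporter describes (a pool of 5 slots should admit at most 5 concurrently running tasks, regardless of how many are queued) can be sketched with a toy model. This is not Airflow's scheduler code, just an illustration of the slot accounting the bug report says is being violated:

```python
class Pool:
    """Toy model of pool slot accounting: a task starts only if a slot is free."""

    def __init__(self, slots):
        self.slots = slots
        self.running = set()

    def open_slots(self):
        return self.slots - len(self.running)

    def try_start(self, task_id):
        # A correctly behaving scheduler admits a task only when a slot is open.
        if self.open_slots() > 0:
            self.running.add(task_id)
            return True
        return False

    def finish(self, task_id):
        self.running.discard(task_id)


pool = Pool(slots=5)
started = [pool.try_start('task-{}'.format(i)) for i in range(8)]
# With 8 candidates and 5 slots, only the first 5 are admitted;
# the bug report observes 29 'used slots' in a 5-slot pool instead.
```

The report suggests that in 1.7.1.3 (and per the comment, still in 1.9.0) the scheduler's equivalent of the `open_slots()` check is not being enforced before tasks are handed to the executor.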