[jira] [Reopened] (AIRFLOW-3562) Remove DagBag dependency inside the webserver
[ https://issues.apache.org/jira/browse/AIRFLOW-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong reopened AIRFLOW-3562: --- > Remove DagBag dependency inside the webserver > - > > Key: AIRFLOW-3562 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3562 > Project: Apache Airflow > Issue Type: Improvement > Components: webserver >Affects Versions: 1.10.0, 1.10.1 >Reporter: Peter van 't Hof >Assignee: Peter van 't Hof >Priority: Critical > Fix For: 2.0.0 > > > Currently the webserver depends on the dag directory and on reparsing the > dag files. Because this happens under gunicorn, each process has its own > DagBag. This means the webserver is stateful, and depending on which process > is assigned a request, the information can differ. If all information comes > from the database, this will not happen. The only process that should add > dags/dagruns is the scheduler. > The webserver should need nothing more than the database. Currently not all > information is in the database yet. This is fixable; work on it has already > started in AIRFLOW-3561. > I do notice this might conflict with the brace change that is > coming. Any suggestions on how to approach this? > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
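The statefulness described in the issue can be illustrated with a toy sketch (plain Python, not Airflow code; `Worker`, `parse_dag_folder`, and the dict-based "DagBag" are all illustrative stand-ins): each gunicorn worker keeps a private in-process cache, so the answer a user sees depends on which worker the request lands on.

```python
# Toy illustration (not Airflow code): each gunicorn worker keeps a private
# "DagBag" cache, so two workers can serve different answers until both
# happen to re-parse the dag folder.
class Worker:
    def __init__(self):
        self.dagbag = {}  # dag_id -> parsed definition, private per process

    def parse_dag_folder(self, dag_folder):
        self.dagbag = dict(dag_folder)  # snapshot of the files at parse time

    def render(self, dag_id):
        return self.dagbag.get(dag_id, "dag not found")


dag_folder = {"example_dag": "v1"}
w1, w2 = Worker(), Worker()
w1.parse_dag_folder(dag_folder)
w2.parse_dag_folder(dag_folder)

dag_folder["example_dag"] = "v2"  # a DAG file changes on disk
w1.parse_dag_folder(dag_folder)   # only one worker has re-parsed so far

# Which version the user sees now depends on which worker gets the request.
print(w1.render("example_dag"))  # v2
print(w2.render("example_dag"))  # v1
```

Reading everything from a single database, as the issue proposes, removes this divergence: all workers would then query the same shared state instead of their own parse-time snapshots.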
[jira] [Resolved] (AIRFLOW-3561) Improve some views
[ https://issues.apache.org/jira/browse/AIRFLOW-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Peter van 't Hof resolved AIRFLOW-3561. --- Resolution: Fixed Fix Version/s: 2.0.0 Merged > Improve some views > -- > > Key: AIRFLOW-3561 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3561 > Project: Apache Airflow > Issue Type: Improvement > Components: webserver >Reporter: Peter van 't Hof >Assignee: Peter van 't Hof >Priority: Minor > Fix For: 2.0.0 > > > Some views interact with the DagBag even though it is not needed for the > query. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Assigned] (AIRFLOW-3562) Remove DagBag dependency inside the webserver
[ https://issues.apache.org/jira/browse/AIRFLOW-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong reassigned AIRFLOW-3562: - Assignee: Peter van 't Hof > Remove DagBag dependency inside the webserver > - > > Key: AIRFLOW-3562 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3562 > Project: Apache Airflow > Issue Type: Improvement > Components: webserver >Affects Versions: 1.10.0, 1.10.1 >Reporter: Peter van 't Hof >Assignee: Peter van 't Hof >Priority: Critical > Fix For: 2.0.0 > > > Currently the webserver depends on the dag directory and on reparsing the > dag files. Because this happens under gunicorn, each process has its own > DagBag. This means the webserver is stateful, and depending on which process > is assigned a request, the information can differ. If all information comes > from the database, this will not happen. The only process that should add > dags/dagruns is the scheduler. > The webserver should need nothing more than the database. Currently not all > information is in the database yet. This is fixable; work on it has already > started in AIRFLOW-3561. > I do notice this might conflict with the brace change that is > coming. Any suggestions on how to approach this? > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (AIRFLOW-3562) Remove DagBag dependency inside the webserver
[ https://issues.apache.org/jira/browse/AIRFLOW-3562?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong closed AIRFLOW-3562. - Resolution: Fixed Fix Version/s: 2.0.0 > Remove DagBag dependency inside the webserver > - > > Key: AIRFLOW-3562 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3562 > Project: Apache Airflow > Issue Type: Improvement > Components: webserver >Affects Versions: 1.10.0, 1.10.1 >Reporter: Peter van 't Hof >Assignee: Peter van 't Hof >Priority: Critical > Fix For: 2.0.0 > > > Currently the webserver depends on the dag directory and on reparsing the > dag files. Because this happens under gunicorn, each process has its own > DagBag. This means the webserver is stateful, and depending on which process > is assigned a request, the information can differ. If all information comes > from the database, this will not happen. The only process that should add > dags/dagruns is the scheduler. > The webserver should need nothing more than the database. Currently not all > information is in the database yet. This is fixable; work on it has already > started in AIRFLOW-3561. > I do notice this might conflict with the brace change that is > coming. Any suggestions on how to approach this? > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-3568) S3ToGoogleCloudStorageOperator failed after succeeding in copying files from s3
[ https://issues.apache.org/jira/browse/AIRFLOW-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729402#comment-16729402 ] Yohei Onishi commented on AIRFLOW-3568: --- OK, I will check when I have time. > S3ToGoogleCloudStorageOperator failed after succeeding in copying files from > s3 > --- > > Key: AIRFLOW-3568 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3568 > Project: Apache Airflow > Issue Type: Bug > Components: contrib >Affects Versions: 1.10.0 >Reporter: Yohei Onishi >Assignee: Yohei Onishi >Priority: Major > > I tried to copy files from s3 to gcs using > S3ToGoogleCloudStorageOperator. The file was successfully uploaded to GCS, but > the task failed with the following error. > {code:java} > [2018-12-26 07:56:33,062] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 [2018-12-26 07:56:33,062] {discovery.py:871} INFO - > URL being requested: POST > https://www.googleapis.com/upload/storage/v1/b/stg-rfid-etl-tmp/o?name=rfid_wh%2Fuq%2Fjp%2Fno_resp_carton_1D%2F2018%2F12%2F24%2F21%2Fno_resp_carton_20181224210201.csv=json=media > [2018-12-26 07:56:33,214] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 [2018-12-26 07:56:33,213] {s3_to_gcs_operator.py:177} > INFO - All done, uploaded 1 files to Google Cloud Storage > [2018-12-26 07:56:33,217] {models.py:1736} ERROR - Object of type 'set' is > not JSON serializable > Traceback (most recent call last): > File "/usr/local/lib/airflow/airflow/models.py", line 1637, in _run_raw_task > self.xcom_push(key=XCOM_RETURN_KEY, value=result) > File "/usr/local/lib/airflow/airflow/models.py", line 1983, in xcom_push > execution_date=execution_date or self.execution_date) > File "/usr/local/lib/airflow/airflow/utils/db.py", line 74, in wrapper > return func(*args, **kwargs) > File "/usr/local/lib/airflow/airflow/models.py", line 4531, in set > value = json.dumps(value).encode('UTF-8') > File "/usr/local/lib/python3.6/json/__init__.py", line 231, in dumps > 
return _default_encoder.encode(obj) > File "/usr/local/lib/python3.6/json/encoder.py", line 199, in encode > chunks = self.iterencode(o, _one_shot=True) > File "/usr/local/lib/python3.6/json/encoder.py", line 257, in iterencode > return _iterencode(o, 0) > File "/usr/local/lib/python3.6/json/encoder.py", line 180, in default > o.__class__.__name__ > TypeError: Object of type 'set' is not JSON serializable > [2018-12-26 07:56:33,220] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 [2018-12-26 07:56:33,217] {models.py:1736} ERROR - > Object of type 'set' is not JSON serializable > [2018-12-26 07:56:33,220] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 Traceback (most recent call last): > [2018-12-26 07:56:33,220] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 File "/usr/local/lib/airflow/airflow/models.py", > line 1637, in _run_raw_task > [2018-12-26 07:56:33,220] {models.py:1756} INFO - Marking task as UP_FOR_RETRY > [2018-12-26 07:56:33,220] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 self.xcom_push(key=XCOM_RETURN_KEY, value=result) > [2018-12-26 07:56:33,220] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 File "/usr/local/lib/airflow/airflow/models.py", > line 1983, in xcom_push > [2018-12-26 07:56:33,221] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 execution_date=execution_date or > self.execution_date) > [2018-12-26 07:56:33,221] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 File "/usr/local/lib/airflow/airflow/utils/db.py", > line 74, in wrapper > [2018-12-26 07:56:33,221] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 return func(*args, **kwargs) > [2018-12-26 07:56:33,221] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 File "/usr/local/lib/airflow/airflow/models.py", > line 4531, in set > [2018-12-26 07:56:33,221] 
{base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 value = json.dumps(value).encode('UTF-8') > [2018-12-26 07:56:33,221] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 File "/usr/local/lib/python3.6/json/__init__.py", > line 231, in dumps > [2018-12-26 07:56:33,221] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 return _default_encoder.encode(obj) > [2018-12-26 07:56:33,221] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 File "/usr/local/lib/python3.6/json/encoder.py", > line 199, in encode > [2018-12-26 07:56:33,222] {base_task_runner.py:107} INFO - Job 39:
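The failure in the log comes down to `json.dumps` rejecting a `set`: the operator returns a set of uploaded files, which XCom then tries to JSON-encode. A minimal reproduction, with the usual workaround of converting the set to a JSON-friendly type (here a sorted list) before it reaches `xcom_push`:

```python
import json

# What the operator handed back: a set of uploaded object names.
uploaded_files = {"a.csv", "b.csv"}

try:
    json.dumps(uploaded_files)
except TypeError as exc:
    # Exact wording varies slightly across Python versions.
    print(exc)

# Workaround sketch: give XCom a JSON-friendly type instead of a set.
payload = sorted(uploaded_files)
print(json.dumps(payload))  # ["a.csv", "b.csv"]
```

This is why the upload itself succeeds while the task still fails: the error is raised only afterwards, when the return value is pushed to XCom and serialized.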
[GitHub] XD-DENG opened a new pull request #4375: [AIRFLOW-XXX] Fix/complete example code in plugins.rst
XD-DENG opened a new pull request #4375: [AIRFLOW-XXX] Fix/complete example code in plugins.rst URL: https://github.com/apache/incubator-airflow/pull/4375 ### Jira Pure doc change ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: How `AppBuilderBaseView` should be imported was not included in the example code. This may confuse users (in the whole codebase, the only location in which this was mentioned is `tests/plugins/test_plugin.py`. Normal users of Airflow may never get to know it). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (AIRFLOW-3571) GoogleCloudStorageToBigQueryOperator succeeds to uploading CSV file from GCS to BiqQuery but a task is failed
[ https://issues.apache.org/jira/browse/AIRFLOW-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729399#comment-16729399 ] Yohei Onishi commented on AIRFLOW-3571: --- Thanks, will do. > GoogleCloudStorageToBigQueryOperator succeeds to uploading CSV file from GCS > to BiqQuery but a task is failed > - > > Key: AIRFLOW-3571 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3571 > Project: Apache Airflow > Issue Type: Bug > Components: contrib >Affects Versions: 1.10.0 >Reporter: Yohei Onishi >Assignee: Yohei Onishi >Priority: Major > > I am using the following services in the asia-northeast1-c zone: * GCS: > asia-northeast1-c > * BigQuery dataset and table: asia-northeast1-c > * Composer: asia-northeast1-c > My task, created with GoogleCloudStorageToBigQueryOperator, succeeded in > uploading a CSV file from a GCS bucket to a BigQuery table, but the task > failed due to the following error. > > {code:java} > [2018-12-26 21:35:47,464] {base_task_runner.py:107} INFO - Job 146: Subtask > bq_load_data_into_dest_table_from_gcs [2018-12-26 21:35:47,464] > {discovery.py:871} INFO - URL being requested: GET > https://www.googleapis.com/bigquery/v2/projects/my-project/jobs/job_abc123?alt=json > [2018-12-26 21:35:47,931] {models.py:1736} ERROR - ('BigQuery job status > check failed. 
Final error was: %s', 404) > Traceback (most recent call last): > File "/usr/local/lib/airflow/airflow/contrib/hooks/bigquery_hook.py", line > 981, in run_with_configuration > jobId=self.running_job_id).execute() > File "/usr/local/lib/python3.6/site-packages/googleapiclient/_helpers.py", > line 130, in positional_wrapper > return wrapped(*args, **kwargs) > File "/usr/local/lib/python3.6/site-packages/googleapiclient/http.py", line > 851, in execute > raise HttpError(resp, content, uri=self.uri) > googleapiclient.errors.HttpError: https://www.googleapis.com/bigquery/v2/projects/my-project/jobs/job_abc123?alt=json > returned "Not found: Job my-project:job_abc123" > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File "/usr/local/lib/airflow/airflow/models.py", line 1633, in _run_raw_task > result = task_copy.execute(context=context) > File "/usr/local/lib/airflow/airflow/contrib/operators/gcs_to_bq.py", line > 237, in execute > time_partitioning=self.time_partitioning) > File "/usr/local/lib/airflow/airflow/contrib/hooks/bigquery_hook.py", line > 951, in run_load > return self.run_with_configuration(configuration) > File "/usr/local/lib/airflow/airflow/contrib/hooks/bigquery_hook.py", line > 1003, in run_with_configuration > err.resp.status) > Exception: ('BigQuery job status check failed. Final error was: %s', 404) > {code} > The task failed to find the job my-project:job_abc123, > but the correct job id is > my-project:asia-northeast1:job_abc123. (Note: this is just an example, > not the actual id.) > I suppose the operator does not handle the zone properly. 
> > {code:java} > $ bq show -j my-project:asia-northeast1:job_abc123 > Job my-project:asia-northeast1:job_abc123 > Job Type | State | Start Time | Duration | User Email | Bytes Processed | Bytes Billed | Billing Tier | Labels > load | SUCCESS | 27 Dec 05:35:47 | 0:00:01 | my-service-account-id-email | | | | > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
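The mismatch the reporter describes can be sketched with a standalone helper (hypothetical; `split_job_id` is not part of Airflow or the BigQuery client): a job created in a location such as asia-northeast1 carries the location inside its fully-qualified id, so a status lookup built from only `project:job_id` asks for a job that does not exist in the default location, which matches the 404 above.

```python
# Hypothetical helper illustrating the two job-id shapes from the report:
# "project:location:job" (what `bq show` succeeded with) versus
# "project:job" (what the failing status check effectively used).
def split_job_id(job_id):
    """Split 'project[:location]:job' into (project, location, job)."""
    parts = job_id.split(":")
    if len(parts) == 3:
        project, location, job = parts
    elif len(parts) == 2:
        project, job = parts
        location = None  # no location segment: the lookup that 404'd
    else:
        raise ValueError("unexpected job id: %r" % (job_id,))
    return project, location, job


print(split_job_id("my-project:job_abc123"))
# ('my-project', None, 'job_abc123')
print(split_job_id("my-project:asia-northeast1:job_abc123"))
# ('my-project', 'asia-northeast1', 'job_abc123')
```

The actual fix would be for the hook to carry the job's location through to the status-check call, which is what the location-support PR mentioned in the comments works towards.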
[jira] [Commented] (AIRFLOW-3561) Improve some views
[ https://issues.apache.org/jira/browse/AIRFLOW-3561?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729401#comment-16729401 ] ASF GitHub Bot commented on AIRFLOW-3561: - Fokko commented on pull request #4368: [AIRFLOW-3561] Improve queries URL: https://github.com/apache/incubator-airflow/pull/4368 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Improve some views > -- > > Key: AIRFLOW-3561 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3561 > Project: Apache Airflow > Issue Type: Improvement > Components: webserver >Reporter: Peter van 't Hof >Assignee: Peter van 't Hof >Priority: Minor > > Some views interact with the DagBag even though it is not needed for the > query. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] Fokko closed pull request #4368: [AIRFLOW-3561] Improve queries
Fokko closed pull request #4368: [AIRFLOW-3561] Improve queries URL: https://github.com/apache/incubator-airflow/pull/4368 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:
diff --git a/airflow/migrations/versions/c8ffec048a3b_add_fields_to_dag.py b/airflow/migrations/versions/c8ffec048a3b_add_fields_to_dag.py
new file mode 100644
index 00..a74e0b50a4
--- /dev/null
+++ b/airflow/migrations/versions/c8ffec048a3b_add_fields_to_dag.py
@@ -0,0 +1,44 @@
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+"""add fields to dag
+
+Revision ID: c8ffec048a3b
+Revises: 41f5f12752f8
+Create Date: 2018-12-23 21:55:46.463634
+
+"""
+
+# revision identifiers, used by Alembic.
+revision = 'c8ffec048a3b'
+down_revision = '41f5f12752f8'
+branch_labels = None
+depends_on = None
+
+from alembic import op
+import sqlalchemy as sa
+
+
+def upgrade():
+    op.add_column('dag', sa.Column('description', sa.Text(), nullable=True))
+    op.add_column('dag', sa.Column('default_view', sa.String(25), nullable=True))
+
+
+def downgrade():
+    op.drop_column('dag', 'description')
+    op.drop_column('dag', 'default_view')
diff --git a/airflow/models/__init__.py b/airflow/models/__init__.py
index aa93f5cb88..4bff05b721 100755
--- a/airflow/models/__init__.py
+++ b/airflow/models/__init__.py
@@ -2999,15 +2999,29 @@ class DagModel(Base):
     fileloc = Column(String(2000))
     # String representing the owners
     owners = Column(String(2000))
+    # Description of the dag
+    description = Column(Text)
+    # Default view of the DAG inside the webserver
+    default_view = Column(String(25))
 
     def __repr__(self):
         return "<DAG: {self.dag_id}>".format(self=self)
 
+    @property
+    def timezone(self):
+        return settings.TIMEZONE
+
     @classmethod
     @provide_session
     def get_current(cls, dag_id, session=None):
         return session.query(cls).filter(cls.dag_id == dag_id).first()
 
+    def get_default_view(self):
+        if self.default_view is None:
+            return configuration.conf.get('webserver', 'dag_default_view').lower()
+        else:
+            return self.default_view
+
 
 @functools.total_ordering
 class DAG(BaseDag, LoggingMixin):
@@ -3109,7 +3123,7 @@ def __init__(
             'core', 'max_active_runs_per_dag'),
         dagrun_timeout=None,
         sla_miss_callback=None,
-        default_view=configuration.conf.get('webserver', 'dag_default_view').lower(),
+        default_view=None,
         orientation=configuration.conf.get('webserver', 'dag_orientation'),
         catchup=configuration.conf.getboolean('scheduler', 'catchup_by_default'),
         on_success_callback=None, on_failure_callback=None,
@@ -3180,7 +3194,7 @@ def __init__(
         self.max_active_runs = max_active_runs
         self.dagrun_timeout = dagrun_timeout
         self.sla_miss_callback = sla_miss_callback
-        self.default_view = default_view
+        self._default_view = default_view
         self.orientation = orientation
         self.catchup = catchup
         self.is_subdag = False  # DagBag.bag_dag() will set this to True if appropriate
@@ -3249,6 +3263,13 @@ def __exit__(self, _type, _value, _tb):
         # /Context Manager --
 
+    def get_default_view(self):
+        """This is only there for backward compatible jinja2 templates"""
+        if self._default_view is None:
+            return configuration.conf.get('webserver', 'dag_default_view').lower()
+        else:
+            return self._default_view
+
     def date_range(self, start_date, num=None, end_date=timezone.utcnow()):
         if num:
             end_date = None
@@ -4201,6 +4222,8 @@ def sync_to_db(self, owner=None, sync_time=None, session=None):
             orm_dag.owners = owner
             orm_dag.is_active = True
             orm_dag.last_scheduler_run = sync_time
+            orm_dag.default_view = self._default_view
+            orm_dag.description = self.description
             session.merge(orm_dag)
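The pattern in the diff above — defaulting `default_view` to `None` and resolving the configured fallback only at read time — can be sketched standalone (a sketch, not Airflow code: `CONF` below is a plain-dict stand-in for `airflow.configuration.conf`):

```python
# Standalone sketch of the get_default_view fallback from the diff.
# CONF stands in for airflow.configuration.conf (an assumption, not the
# real config object); the default value "Tree" mirrors Airflow's default.
CONF = {("webserver", "dag_default_view"): "Tree"}


class DAG:
    def __init__(self, default_view=None):
        # None means "defer to the configured default", which lets the
        # webserver tell an explicit per-DAG choice apart from the global one.
        self._default_view = default_view

    def get_default_view(self):
        if self._default_view is None:
            return CONF[("webserver", "dag_default_view")].lower()
        return self._default_view


print(DAG().get_default_view())                      # tree
print(DAG(default_view="graph").get_default_view())  # graph
```

Storing `None` instead of eagerly reading the config at `__init__` time is what allows `sync_to_db` to persist only explicit per-DAG choices, so the webserver can apply the configured default itself without consulting the DagBag.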
[jira] [Comment Edited] (AIRFLOW-3568) S3ToGoogleCloudStorageOperator failed after succeeding in copying files from s3
[ https://issues.apache.org/jira/browse/AIRFLOW-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729400#comment-16729400 ] jack edited comment on AIRFLOW-3568 at 12/27/18 7:31 AM: - [~yohei] No problem. Note that it was approved but not yet merged. It might be a few more days, as other committers may have more comments. You are more than welcome to check the Jira for more issues to fix. was (Author: jackjack10): [~yohei] No problem. Note that it was approved but not yet merged. It might be a few more days, as other committers may have more comments. You are more than welcome to search for more issues with the gcp label to fix: https://issues.apache.org/jira/browse/AIRFLOW-3503?jql=project%20%3D%20AIRFLOW%20AND%20labels%20%3D%20gcp > S3ToGoogleCloudStorageOperator failed after succeeding in copying files from > s3 > --- > > Key: AIRFLOW-3568 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3568 > Project: Apache Airflow > Issue Type: Bug > Components: contrib >Affects Versions: 1.10.0 >Reporter: Yohei Onishi >Assignee: Yohei Onishi >Priority: Major > > I tried to copy files from s3 to gcs using > S3ToGoogleCloudStorageOperator. The file was successfully uploaded to GCS, but > the task failed with the following error. 
> {code:java} > [2018-12-26 07:56:33,062] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 [2018-12-26 07:56:33,062] {discovery.py:871} INFO - > URL being requested: POST > https://www.googleapis.com/upload/storage/v1/b/stg-rfid-etl-tmp/o?name=rfid_wh%2Fuq%2Fjp%2Fno_resp_carton_1D%2F2018%2F12%2F24%2F21%2Fno_resp_carton_20181224210201.csv=json=media > [2018-12-26 07:56:33,214] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 [2018-12-26 07:56:33,213] {s3_to_gcs_operator.py:177} > INFO - All done, uploaded 1 files to Google Cloud Storage > [2018-12-26 07:56:33,217] {models.py:1736} ERROR - Object of type 'set' is > not JSON serializable > Traceback (most recent call last): > File "/usr/local/lib/airflow/airflow/models.py", line 1637, in _run_raw_task > self.xcom_push(key=XCOM_RETURN_KEY, value=result) > File "/usr/local/lib/airflow/airflow/models.py", line 1983, in xcom_push > execution_date=execution_date or self.execution_date) > File "/usr/local/lib/airflow/airflow/utils/db.py", line 74, in wrapper > return func(*args, **kwargs) > File "/usr/local/lib/airflow/airflow/models.py", line 4531, in set > value = json.dumps(value).encode('UTF-8') > File "/usr/local/lib/python3.6/json/__init__.py", line 231, in dumps > return _default_encoder.encode(obj) > File "/usr/local/lib/python3.6/json/encoder.py", line 199, in encode > chunks = self.iterencode(o, _one_shot=True) > File "/usr/local/lib/python3.6/json/encoder.py", line 257, in iterencode > return _iterencode(o, 0) > File "/usr/local/lib/python3.6/json/encoder.py", line 180, in default > o.__class__.__name__ > TypeError: Object of type 'set' is not JSON serializable > [2018-12-26 07:56:33,220] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 [2018-12-26 07:56:33,217] {models.py:1736} ERROR - > Object of type 'set' is not JSON serializable > [2018-12-26 07:56:33,220] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 Traceback (most recent 
call last): > [2018-12-26 07:56:33,220] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 File "/usr/local/lib/airflow/airflow/models.py", > line 1637, in _run_raw_task > [2018-12-26 07:56:33,220] {models.py:1756} INFO - Marking task as UP_FOR_RETRY > [2018-12-26 07:56:33,220] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 self.xcom_push(key=XCOM_RETURN_KEY, value=result) > [2018-12-26 07:56:33,220] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 File "/usr/local/lib/airflow/airflow/models.py", > line 1983, in xcom_push > [2018-12-26 07:56:33,221] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 execution_date=execution_date or > self.execution_date) > [2018-12-26 07:56:33,221] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 File "/usr/local/lib/airflow/airflow/utils/db.py", > line 74, in wrapper > [2018-12-26 07:56:33,221] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 return func(*args, **kwargs) > [2018-12-26 07:56:33,221] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 File "/usr/local/lib/airflow/airflow/models.py", > line 4531, in set > [2018-12-26 07:56:33,221] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 value =
[jira] [Commented] (AIRFLOW-3568) S3ToGoogleCloudStorageOperator failed after succeeding in copying files from s3
[ https://issues.apache.org/jira/browse/AIRFLOW-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729400#comment-16729400 ] jack commented on AIRFLOW-3568: --- [~yohei] No problem. Note that it was approved but not yet merged. It might be a few more days, as other committers may have more comments. You are more than welcome to search for more issues with the gcp label to fix: https://issues.apache.org/jira/browse/AIRFLOW-3503?jql=project%20%3D%20AIRFLOW%20AND%20labels%20%3D%20gcp > S3ToGoogleCloudStorageOperator failed after succeeding in copying files from > s3 > --- > > Key: AIRFLOW-3568 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3568 > Project: Apache Airflow > Issue Type: Bug > Components: contrib >Affects Versions: 1.10.0 >Reporter: Yohei Onishi >Assignee: Yohei Onishi >Priority: Major > > I tried to copy files from s3 to gcs using > S3ToGoogleCloudStorageOperator. The file was successfully uploaded to GCS, but > the task failed with the following error. 
> {code:java} > [2018-12-26 07:56:33,062] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 [2018-12-26 07:56:33,062] {discovery.py:871} INFO - > URL being requested: POST > https://www.googleapis.com/upload/storage/v1/b/stg-rfid-etl-tmp/o?name=rfid_wh%2Fuq%2Fjp%2Fno_resp_carton_1D%2F2018%2F12%2F24%2F21%2Fno_resp_carton_20181224210201.csv=json=media > [2018-12-26 07:56:33,214] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 [2018-12-26 07:56:33,213] {s3_to_gcs_operator.py:177} > INFO - All done, uploaded 1 files to Google Cloud Storage > [2018-12-26 07:56:33,217] {models.py:1736} ERROR - Object of type 'set' is > not JSON serializable > Traceback (most recent call last): > File "/usr/local/lib/airflow/airflow/models.py", line 1637, in _run_raw_task > self.xcom_push(key=XCOM_RETURN_KEY, value=result) > File "/usr/local/lib/airflow/airflow/models.py", line 1983, in xcom_push > execution_date=execution_date or self.execution_date) > File "/usr/local/lib/airflow/airflow/utils/db.py", line 74, in wrapper > return func(*args, **kwargs) > File "/usr/local/lib/airflow/airflow/models.py", line 4531, in set > value = json.dumps(value).encode('UTF-8') > File "/usr/local/lib/python3.6/json/__init__.py", line 231, in dumps > return _default_encoder.encode(obj) > File "/usr/local/lib/python3.6/json/encoder.py", line 199, in encode > chunks = self.iterencode(o, _one_shot=True) > File "/usr/local/lib/python3.6/json/encoder.py", line 257, in iterencode > return _iterencode(o, 0) > File "/usr/local/lib/python3.6/json/encoder.py", line 180, in default > o.__class__.__name__ > TypeError: Object of type 'set' is not JSON serializable > [2018-12-26 07:56:33,220] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 [2018-12-26 07:56:33,217] {models.py:1736} ERROR - > Object of type 'set' is not JSON serializable > [2018-12-26 07:56:33,220] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 Traceback (most recent 
call last): > [2018-12-26 07:56:33,220] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 File "/usr/local/lib/airflow/airflow/models.py", > line 1637, in _run_raw_task > [2018-12-26 07:56:33,220] {models.py:1756} INFO - Marking task as UP_FOR_RETRY > [2018-12-26 07:56:33,220] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 self.xcom_push(key=XCOM_RETURN_KEY, value=result) > [2018-12-26 07:56:33,220] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 File "/usr/local/lib/airflow/airflow/models.py", > line 1983, in xcom_push > [2018-12-26 07:56:33,221] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 execution_date=execution_date or > self.execution_date) > [2018-12-26 07:56:33,221] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 File "/usr/local/lib/airflow/airflow/utils/db.py", > line 74, in wrapper > [2018-12-26 07:56:33,221] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 return func(*args, **kwargs) > [2018-12-26 07:56:33,221] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 File "/usr/local/lib/airflow/airflow/models.py", > line 4531, in set > [2018-12-26 07:56:33,221] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 value = json.dumps(value).encode('UTF-8') > [2018-12-26 07:56:33,221] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 File "/usr/local/lib/python3.6/json/__init__.py", > line 231, in dumps > [2018-12-26 07:56:33,221] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3
[jira] [Assigned] (AIRFLOW-3571) GoogleCloudStorageToBigQueryOperator succeeds to uploading CSV file from GCS to BiqQuery but a task is failed
[ https://issues.apache.org/jira/browse/AIRFLOW-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Yohei Onishi reassigned AIRFLOW-3571: - Assignee: Yohei Onishi > GoogleCloudStorageToBigQueryOperator succeeds to uploading CSV file from GCS > to BiqQuery but a task is failed > - > > Key: AIRFLOW-3571 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3571 > Project: Apache Airflow > Issue Type: Bug > Components: contrib >Affects Versions: 1.10.0 >Reporter: Yohei Onishi >Assignee: Yohei Onishi >Priority: Major > > I am using the following services in the asia-northeast1-c zone: * GCS: > asia-northeast1-c > * BigQuery dataset and table: asia-northeast1-c > * Composer: asia-northeast1-c > My task, created with GoogleCloudStorageToBigQueryOperator, succeeded in > uploading a CSV file from a GCS bucket to a BigQuery table, but the task > failed due to the following error. > > {code:java} > [2018-12-26 21:35:47,464] {base_task_runner.py:107} INFO - Job 146: Subtask > bq_load_data_into_dest_table_from_gcs [2018-12-26 21:35:47,464] > {discovery.py:871} INFO - URL being requested: GET > https://www.googleapis.com/bigquery/v2/projects/my-project/jobs/job_abc123?alt=json > [2018-12-26 21:35:47,931] {models.py:1736} ERROR - ('BigQuery job status > check failed. 
Final error was: %s', 404) > Traceback (most recent call last): > File "/usr/local/lib/airflow/airflow/contrib/hooks/bigquery_hook.py", line > 981, in run_with_configuration > jobId=self.running_job_id).execute() > File "/usr/local/lib/python3.6/site-packages/googleapiclient/_helpers.py", > line 130, in positional_wrapper > return wrapped(*args, **kwargs) > File "/usr/local/lib/python3.6/site-packages/googleapiclient/http.py", line > 851, in execute > raise HttpError(resp, content, uri=self.uri) > googleapiclient.errors.HttpError: https://www.googleapis.com/bigquery/v2/projects/my-project/jobs/job_abc123?alt=json > returned "Not found: Job my-project:job_abc123" > During handling of the above exception, another exception occurred: > Traceback (most recent call last): > File "/usr/local/lib/airflow/airflow/models.py", line 1633, in _run_raw_task > result = task_copy.execute(context=context) > File "/usr/local/lib/airflow/airflow/contrib/operators/gcs_to_bq.py", line > 237, in execute > time_partitioning=self.time_partitioning) > File "/usr/local/lib/airflow/airflow/contrib/hooks/bigquery_hook.py", line > 951, in run_load > return self.run_with_configuration(configuration) > File "/usr/local/lib/airflow/airflow/contrib/hooks/bigquery_hook.py", line > 1003, in run_with_configuration > err.resp.status) > Exception: ('BigQuery job status check failed. Final error was: %s', 404) > {code} > The task failed to find the job my-project:job_abc123, > but the correct job id is > my-project:asia-northeast1:job_abc123. (Note: this is just an example, > not the actual id.) > I suppose the operator does not handle the zone properly. 
> > {code:java} > $ bq show -j my-project:asia-northeast1:job_abc123 > Job my-project:asia-northeast1:job_abc123 > Job Type  State  Start Time  Duration  User Email  Bytes Processed  Bytes Billed  Billing Tier  Labels > load  SUCCESS  27 Dec 05:35:47  0:00:01  my-service-account-id-email > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Commented] (AIRFLOW-3571) GoogleCloudStorageToBigQueryOperator succeeds to uploading CSV file from GCS to BiqQuery but a task is failed
[ https://issues.apache.org/jira/browse/AIRFLOW-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729398#comment-16729398 ] jack commented on AIRFLOW-3571: --- [~yohei] There is an open PR for adding support for locations to the BigQueryHook https://github.com/apache/incubator-airflow/pull/4324 Once it's merged (I assume it would be soon) you are welcome to open PRs for extending the Operators to support it. > GoogleCloudStorageToBigQueryOperator succeeds to uploading CSV file from GCS > to BiqQuery but a task is failed > - > > Key: AIRFLOW-3571 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3571 > Project: Apache Airflow > Issue Type: Bug > Components: contrib >Affects Versions: 1.10.0 >Reporter: Yohei Onishi >Priority: Major > > I am using the following service in asia-northeast1-c zone. * GCS: > asia-northeast1-c > * BigQuery dataset and table: asia-northeast1-c > * Composer: asia-northeast1-c > My task created by GoogleCloudStorageToBigQueryOperator succeeded to > uploading CSV file from a GCS bucket to a BigQuery table but the task was > failed due to the following error. > > {code:java} > [2018-12-26 21:35:47,464] {base_task_runner.py:107} INFO - Job 146: Subtask > bq_load_data_into_dest_table_from_gcs [2018-12-26 21:35:47,464] > {discovery.py:871} INFO - URL being requested: GET > https://www.googleapis.com/bigquery/v2/projects/my-project/jobs/job_abc123?alt=json > [2018-12-26 21:35:47,931] {models.py:1736} ERROR - ('BigQuery job status > check failed. 
Final error was: %s', 404) > Traceback (most recent call last) > File "/usr/local/lib/airflow/airflow/contrib/hooks/bigquery_hook.py", line > 981, in run_with_configuratio > jobId=self.running_job_id).execute( > File "/usr/local/lib/python3.6/site-packages/googleapiclient/_helpers.py", > line 130, in positional_wrappe > return wrapped(*args, **kwargs > File "/usr/local/lib/python3.6/site-packages/googleapiclient/http.py", line > 851, in execut > raise HttpError(resp, content, uri=self.uri > googleapiclient.errors.HttpError: https://www.googleapis.com/bigquery/v2/projects/my-project/jobs/job_abc123?alt=json > returned "Not found: Job my-project:job_abc123" > During handling of the above exception, another exception occurred > Traceback (most recent call last) > File "/usr/local/lib/airflow/airflow/models.py", line 1633, in _run_raw_tas > result = task_copy.execute(context=context > File "/usr/local/lib/airflow/airflow/contrib/operators/gcs_to_bq.py", line > 237, in execut > time_partitioning=self.time_partitioning > File "/usr/local/lib/airflow/airflow/contrib/hooks/bigquery_hook.py", line > 951, in run_loa > return self.run_with_configuration(configuration > File "/usr/local/lib/airflow/airflow/contrib/hooks/bigquery_hook.py", line > 1003, in run_with_configuratio > err.resp.status > Exception: ('BigQuery job status check failed. Final error was: %s', 404 > {code} > The task failed to find a job {color:#ff}fmy-project:job_abc123{color} > but the correct job id is{color:#ff} > my-project:asia-northeast1:job_abc123{color}. (Note: this is just an example, > not actual id.) > I suppose the operator does not treat zone properly. 
> > {code:java} > $ bq show -j my-project:asia-northeast1:job_abc123 > Job my-project:asia-northeast1:job_abc123 > Job Type  State  Start Time  Duration  User Email  Bytes Processed  Bytes Billed  Billing Tier  Labels > load  SUCCESS  27 Dec 05:35:47  0:00:01  my-service-account-id-email > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Updated] (AIRFLOW-3572) Docs implies that providing remote_log_conn_id is mandatory
[ https://issues.apache.org/jira/browse/AIRFLOW-3572?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Victor Villas Bôas Chaves updated AIRFLOW-3572: --- Description: The docs on setting up remote logging with S3 currently states: > To enable this feature, airflow.cfg must be configured as in this example: ... example ... > In the above example, Airflow will try to use {{S3Hook('MyS3Conn')}}. When in fact you can just leave `remote_log_conn_id =` (or nothing, because that's the default) if you have your credentials set in the process environment, because Airflow will fallback to boto3 config auto-detection. This behavior is by design as the comment hints [here|https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/aws_hook.py#L155-L158] so it should be documented. was: The docs on setting up remote logging with S3 currently states: > To enable this feature, airflow.cfg must be configured as in this example: ... example ... > In the above example, Airflow will try to use {{S3Hook('MyS3Conn')}}. When in fact you can just leave `remote_log_conn_id =` (or nothing, because that's the default) if you have your credentials set in the process environment, because Airflow will fallback to boto3 config auto-detection. This behavior is by purpose as the comment hints [here|https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/aws_hook.py#L155-L158] so it should be documented. > Docs implies that providing remote_log_conn_id is mandatory > --- > > Key: AIRFLOW-3572 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3572 > Project: Apache Airflow > Issue Type: Improvement > Components: Documentation >Affects Versions: 1.10.1 >Reporter: Victor Villas Bôas Chaves >Priority: Minor > > The docs on setting up remote logging with S3 currently states: > > To enable this feature, airflow.cfg must be configured as in this example: > ... example ... 
> > In the above example, Airflow will try to use {{S3Hook('MyS3Conn')}}. > In fact you can just leave `remote_log_conn_id =` empty (or omit it, because > that's the default) if you have your credentials set in the process > environment, because Airflow will fall back to boto3 config auto-detection. > This behavior is by design, as the comment hints > [here|https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/aws_hook.py#L155-L158] > so it should be documented. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-3572) Docs implies that providing remote_log_conn_id is mandatory
Victor Villas Bôas Chaves created AIRFLOW-3572: -- Summary: Docs implies that providing remote_log_conn_id is mandatory Key: AIRFLOW-3572 URL: https://issues.apache.org/jira/browse/AIRFLOW-3572 Project: Apache Airflow Issue Type: Improvement Components: Documentation Affects Versions: 1.10.1 Reporter: Victor Villas Bôas Chaves The docs on setting up remote logging with S3 currently state: > To enable this feature, airflow.cfg must be configured as in this example: ... example ... > In the above example, Airflow will try to use {{S3Hook('MyS3Conn')}}. In fact you can just leave `remote_log_conn_id =` empty (or omit it, because that's the default) if you have your credentials set in the process environment, because Airflow will fall back to boto3 config auto-detection. This behavior is by design, as the comment hints [here|https://github.com/apache/incubator-airflow/blob/master/airflow/contrib/hooks/aws_hook.py#L155-L158] so it should be documented. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
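The fallback described in the issue above can be captured in a short airflow.cfg fragment — a sketch only: the bucket name is illustrative, and the key names are those used by Airflow 1.10:

```ini
[core]
# Enable remote logging to S3; the bucket name below is illustrative.
remote_logging = True
remote_base_log_folder = s3://my-airflow-logs
# Left empty on purpose: with no connection id, the S3Hook falls back to
# boto3 credential auto-detection (environment variables, instance profile).
remote_log_conn_id =
```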
[GitHub] jmcarp commented on issue #4331: [AIRFLOW-3531] Add gcs to gcs transfer operator.
jmcarp commented on issue #4331: [AIRFLOW-3531] Add gcs to gcs transfer operator. URL: https://github.com/apache/incubator-airflow/pull/4331#issuecomment-450062981 Updated! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (AIRFLOW-3568) S3ToGoogleCloudStorageOperator failed after succeeding in copying files from s3
[ https://issues.apache.org/jira/browse/AIRFLOW-3568?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729281#comment-16729281 ] Yohei Onishi commented on AIRFLOW-3568: --- [~jackjack10] My PR is approved. Thank you for your support. > S3ToGoogleCloudStorageOperator failed after succeeding in copying files from > s3 > --- > > Key: AIRFLOW-3568 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3568 > Project: Apache Airflow > Issue Type: Bug > Components: contrib >Affects Versions: 1.10.0 >Reporter: Yohei Onishi >Assignee: Yohei Onishi >Priority: Major > > I tried to copy files from s3 to gcs using > S3ToGoogleCloudStorageOperator. The file successfully was uploaded to GCS but > the task failed with the following error. > {code:java} > [2018-12-26 07:56:33,062] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 [2018-12-26 07:56:33,062] {discovery.py:871} INFO - > URL being requested: POST > https://www.googleapis.com/upload/storage/v1/b/stg-rfid-etl-tmp/o?name=rfid_wh%2Fuq%2Fjp%2Fno_resp_carton_1D%2F2018%2F12%2F24%2F21%2Fno_resp_carton_20181224210201.csv=json=media > [2018-12-26 07:56:33,214] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 [2018-12-26 07:56:33,213] {s3_to_gcs_operator.py:177} > INFO - All done, uploaded 1 files to Google Cloud Storage > [2018-12-26 07:56:33,217] {models.py:1736} ERROR - Object of type 'set' is > not JSON serializable > Traceback (most recent call last) > File "/usr/local/lib/airflow/airflow/models.py", line 1637, in _run_raw_tas > self.xcom_push(key=XCOM_RETURN_KEY, value=result > File "/usr/local/lib/airflow/airflow/models.py", line 1983, in xcom_pus > execution_date=execution_date or self.execution_date > File "/usr/local/lib/airflow/airflow/utils/db.py", line 74, in wrappe > return func(*args, **kwargs > File "/usr/local/lib/airflow/airflow/models.py", line 4531, in se > value = json.dumps(value).encode('UTF-8' > File 
"/usr/local/lib/python3.6/json/__init__.py", line 231, in dump > return _default_encoder.encode(obj > File "/usr/local/lib/python3.6/json/encoder.py", line 199, in encod > chunks = self.iterencode(o, _one_shot=True > File "/usr/local/lib/python3.6/json/encoder.py", line 257, in iterencod > return _iterencode(o, 0 > File "/usr/local/lib/python3.6/json/encoder.py", line 180, in defaul > o.__class__.__name__ > TypeError: Object of type 'set' is not JSON serializabl > [2018-12-26 07:56:33,220] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 [2018-12-26 07:56:33,217] {models.py:1736} ERROR - > Object of type 'set' is not JSON serializable > [2018-12-26 07:56:33,220] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 Traceback (most recent call last): > [2018-12-26 07:56:33,220] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 File "/usr/local/lib/airflow/airflow/models.py", > line 1637, in _run_raw_task > [2018-12-26 07:56:33,220] {models.py:1756} INFO - Marking task as UP_FOR_RETRY > [2018-12-26 07:56:33,220] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 self.xcom_push(key=XCOM_RETURN_KEY, value=result) > [2018-12-26 07:56:33,220] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 File "/usr/local/lib/airflow/airflow/models.py", > line 1983, in xcom_push > [2018-12-26 07:56:33,221] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 execution_date=execution_date or > self.execution_date) > [2018-12-26 07:56:33,221] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 File "/usr/local/lib/airflow/airflow/utils/db.py", > line 74, in wrapper > [2018-12-26 07:56:33,221] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 return func(*args, **kwargs) > [2018-12-26 07:56:33,221] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 File 
"/usr/local/lib/airflow/airflow/models.py", > line 4531, in set > [2018-12-26 07:56:33,221] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 value = json.dumps(value).encode('UTF-8') > [2018-12-26 07:56:33,221] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 File "/usr/local/lib/python3.6/json/__init__.py", > line 231, in dumps > [2018-12-26 07:56:33,221] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 return _default_encoder.encode(obj) > [2018-12-26 07:56:33,221] {base_task_runner.py:107} INFO - Job 39: Subtask > gcs_copy_files_from_s3 File "/usr/local/lib/python3.6/json/encoder.py", > line 199, in encode > [2018-12-26 07:56:33,222]
[GitHub] kaxil commented on a change in pull request #4371: [AIRFLOW-2939][AIRFLOW-3568] fix TypeError on GoogleCloudStorageToS3Operator / S3ToGoogleCloudStorageOperator
kaxil commented on a change in pull request #4371: [AIRFLOW-2939][AIRFLOW-3568] fix TypeError on GoogleCloudStorageToS3Operator / S3ToGoogleCloudStorageOperator URL: https://github.com/apache/incubator-airflow/pull/4371#discussion_r244072798 ## File path: airflow/contrib/operators/gcs_to_s3.py ## @@ -101,7 +101,7 @@ def execute(self, context): # Google Cloud Storage and not in S3 bucket_name, _ = S3Hook.parse_s3_url(self.dest_s3_key) existing_files = s3_hook.list_keys(bucket_name) -files = set(files) - set(existing_files) +files = list(set(files) - set(existing_files)) Review comment: Yup, yup. You are right This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] yohei1126 commented on a change in pull request #4371: [AIRFLOW-2939][AIRFLOW-3568] fix TypeError on GoogleCloudStorageToS3Operator / S3ToGoogleCloudStorageOperator
yohei1126 commented on a change in pull request #4371: [AIRFLOW-2939][AIRFLOW-3568] fix TypeError on GoogleCloudStorageToS3Operator / S3ToGoogleCloudStorageOperator URL: https://github.com/apache/incubator-airflow/pull/4371#discussion_r244072388 ## File path: airflow/contrib/operators/gcs_to_s3.py ## @@ -101,7 +101,7 @@ def execute(self, context): # Google Cloud Storage and not in S3 bucket_name, _ = S3Hook.parse_s3_url(self.dest_s3_key) existing_files = s3_hook.list_keys(bucket_name) -files = set(files) - set(existing_files) +files = list(set(files) - set(existing_files)) Review comment: If you think `xcom` should support `set`, this is a bug on `xcom` since the current implementation uses `json.dumps` and it only supports JSON serializable ( `set` is not JSON serializable). ``` >>> list = [1,2,3,1] >>> json.dumps(list).encode('UTF-8') b'[1, 2, 3, 1]' >>> s = set(list) >>> json.dumps(s).encode('UTF-8') Traceback (most recent call last): File "", line 1, in File "/Users/01087872/.pyenv/versions/3.7.1/lib/python3.7/json/__init__.py", line 231, in dumps return _default_encoder.encode(obj) File "/Users/01087872/.pyenv/versions/3.7.1/lib/python3.7/json/encoder.py", line 199, in encode chunks = self.iterencode(o, _one_shot=True) File "/Users/01087872/.pyenv/versions/3.7.1/lib/python3.7/json/encoder.py", line 257, in iterencode return _iterencode(o, 0) File "/Users/01087872/.pyenv/versions/3.7.1/lib/python3.7/json/encoder.py", line 179, in default raise TypeError(f'Object of type {o.__class__.__name__} ' TypeError: Object of type set is not JSON serializable ``` This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] yohei1126 commented on a change in pull request #4371: [AIRFLOW-2939][AIRFLOW-3568] fix TypeError on GoogleCloudStorageToS3Operator / S3ToGoogleCloudStorageOperator
yohei1126 commented on a change in pull request #4371: [AIRFLOW-2939][AIRFLOW-3568] fix TypeError on GoogleCloudStorageToS3Operator / S3ToGoogleCloudStorageOperator URL: https://github.com/apache/incubator-airflow/pull/4371#discussion_r244071926 ## File path: airflow/contrib/operators/gcs_to_s3.py ## @@ -101,7 +101,7 @@ def execute(self, context): # Google Cloud Storage and not in S3 bucket_name, _ = S3Hook.parse_s3_url(self.dest_s3_key) existing_files = s3_hook.list_keys(bucket_name) -files = set(files) - set(existing_files) +files = list(set(files) - set(existing_files)) Review comment: added traceback on the description. According to the traceback model.py internally uses `json.dumps(value).encode` but it does not support set since set is not JSON serializable This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
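The fix in the diff above generalizes: whatever an operator's `execute` returns is pushed to XCom via `json.dumps`, so a `set` must be converted to a list first. A minimal sketch (the helper name is hypothetical):

```python
import json

def to_xcom_safe(files):
    """Return a JSON-serializable, deterministically ordered list.

    XCom stores return values with json.dumps, which raises TypeError
    for set objects, so collections should be converted before pushing.
    Sorting also makes the pushed value stable across runs.
    """
    return sorted(set(files))

payload = to_xcom_safe({"b.csv", "a.csv", "a.csv"})
print(json.dumps(payload))  # → ["a.csv", "b.csv"]
```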
[GitHub] codecov-io edited a comment on issue #4331: [AIRFLOW-3531] Add gcs to gcs transfer operator.
codecov-io edited a comment on issue #4331: [AIRFLOW-3531] Add gcs to gcs transfer operator. URL: https://github.com/apache/incubator-airflow/pull/4331#issuecomment-447721611 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4331?src=pr=h1) Report > Merging [#4331](https://codecov.io/gh/apache/incubator-airflow/pull/4331?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/319a659dda9d50ca8c1596904315c9664fc74720?src=pr=desc) will **increase** coverage by `<.01%`. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/4331/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/4331?src=pr=tree) ```diff @@Coverage Diff @@ ## master#4331 +/- ## == + Coverage 78.15% 78.15% +<.01% == Files 204 204 Lines 1649916499 == + Hits1289412895 +1 + Misses 3605 3604 -1 ``` | [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/4331?src=pr=tree) | Coverage Δ | | |---|---|---| | [airflow/models/\_\_init\_\_.py](https://codecov.io/gh/apache/incubator-airflow/pull/4331/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvX19pbml0X18ucHk=) | `92.81% <0%> (+0.04%)` | :arrow_up: | -- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4331?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4331?src=pr=footer). Last update [319a659...37a6ee5](https://codecov.io/gh/apache/incubator-airflow/pull/4331?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. 
For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] yohei1126 commented on a change in pull request #4371: [AIRFLOW-2939][AIRFLOW-3568] fix TypeError on GoogleCloudStorageToS3Operator / S3ToGoogleCloudStorageOperator
yohei1126 commented on a change in pull request #4371: [AIRFLOW-2939][AIRFLOW-3568] fix TypeError on GoogleCloudStorageToS3Operator / S3ToGoogleCloudStorageOperator URL: https://github.com/apache/incubator-airflow/pull/4371#discussion_r244070449 ## File path: airflow/contrib/operators/gcs_to_s3.py ## @@ -101,7 +101,7 @@ def execute(self, context): # Google Cloud Storage and not in S3 bucket_name, _ = S3Hook.parse_s3_url(self.dest_s3_key) existing_files = s3_hook.list_keys(bucket_name) -files = set(files) - set(existing_files) +files = list(set(files) - set(existing_files)) Review comment: Actually the error log shows that `set` is not supported (`TypeError: 'NoneType' object is not iterable.`), but it works with a list. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (AIRFLOW-3571) GoogleCloudStorageToBigQueryOperator succeeds to uploading CSV file from GCS to BiqQuery but a task is failed
[ https://issues.apache.org/jira/browse/AIRFLOW-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729263#comment-16729263 ] Yohei Onishi commented on AIRFLOW-3571: --- It seems {color:#d04437}GoogleCloudStorageToBigQueryOperator{color} does not support regions other than US / EU. > GoogleCloudStorageToBigQueryOperator succeeds to uploading CSV file from GCS > to BiqQuery but a task is failed > - > > Key: AIRFLOW-3571 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3571 > Project: Apache Airflow > Issue Type: Bug > Components: contrib >Affects Versions: 1.10.0 >Reporter: Yohei Onishi >Priority: Major > > I am using the following service in asia-northeast1-c zone. * GCS: > asia-northeast1-c > * BigQuery dataset and table: asia-northeast1-c > * Composer: asia-northeast1-c > My task created by GoogleCloudStorageToBigQueryOperator succeeded to > uploading CSV file from a GCS bucket to a BigQuery table but the task was > failed due to the following error. > > {code:java} > [2018-12-26 21:35:47,464] {base_task_runner.py:107} INFO - Job 146: Subtask > bq_load_data_into_dest_table_from_gcs [2018-12-26 21:35:47,464] > {discovery.py:871} INFO - URL being requested: GET > https://www.googleapis.com/bigquery/v2/projects/my-project/jobs/job_abc123?alt=json > [2018-12-26 21:35:47,931] {models.py:1736} ERROR - ('BigQuery job status > check failed. 
Final error was: %s', 404) > Traceback (most recent call last) > File "/usr/local/lib/airflow/airflow/contrib/hooks/bigquery_hook.py", line > 981, in run_with_configuratio > jobId=self.running_job_id).execute( > File "/usr/local/lib/python3.6/site-packages/googleapiclient/_helpers.py", > line 130, in positional_wrappe > return wrapped(*args, **kwargs > File "/usr/local/lib/python3.6/site-packages/googleapiclient/http.py", line > 851, in execut > raise HttpError(resp, content, uri=self.uri > googleapiclient.errors.HttpError: https://www.googleapis.com/bigquery/v2/projects/my-project/jobs/job_abc123?alt=json > returned "Not found: Job my-project:job_abc123" > During handling of the above exception, another exception occurred > Traceback (most recent call last) > File "/usr/local/lib/airflow/airflow/models.py", line 1633, in _run_raw_tas > result = task_copy.execute(context=context > File "/usr/local/lib/airflow/airflow/contrib/operators/gcs_to_bq.py", line > 237, in execut > time_partitioning=self.time_partitioning > File "/usr/local/lib/airflow/airflow/contrib/hooks/bigquery_hook.py", line > 951, in run_loa > return self.run_with_configuration(configuration > File "/usr/local/lib/airflow/airflow/contrib/hooks/bigquery_hook.py", line > 1003, in run_with_configuratio > err.resp.status > Exception: ('BigQuery job status check failed. Final error was: %s', 404 > {code} > The task failed to find a job {color:#ff}fmy-project:job_abc123{color} > but the correct job id is{color:#ff} > my-project:asia-northeast1:job_abc123{color}. (Note: this is just an example, > not actual id.) > I suppose the operator does not treat zone properly. 
> > {code:java} > $ bq show -j my-project:asia-northeast1:job_abc123 > Job my-project:asia-northeast1:job_abc123 > Job Type  State  Start Time  Duration  User Email  Bytes Processed  Bytes Billed  Billing Tier  Labels > load  SUCCESS  27 Dec 05:35:47  0:00:01  my-service-account-id-email > {code} -- This message was sent by Atlassian JIRA (v7.6.3#76005)
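The mismatch between `my-project:job_abc123` in the traceback above and the working `my-project:asia-northeast1:job_abc123` suggests the hook polls the job without its location. A sketch of the qualified job reference (the helper name is hypothetical; the real fix would need the hook to pass the location through to the `jobs.get` call):

```python
def qualified_job_id(project, job_id, location=None):
    """Build the job reference that `bq show -j` expects.

    Jobs in single regions such as asia-northeast1 are only addressable
    together with their location; omitting it yields the 404 seen in the
    traceback above, even though the load itself succeeded.
    """
    if location:
        return "{}:{}:{}".format(project, location, job_id)
    return "{}:{}".format(project, job_id)

print(qualified_job_id("my-project", "job_abc123", "asia-northeast1"))
# → my-project:asia-northeast1:job_abc123
```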
[GitHub] kaxil commented on a change in pull request #4353: [AIRFLOW-3480] Add Database Deploy/Update/Delete operators
kaxil commented on a change in pull request #4353: [AIRFLOW-3480] Add Database Deploy/Update/Delete operators URL: https://github.com/apache/incubator-airflow/pull/4353#discussion_r244067381 ## File path: airflow/contrib/hooks/gcp_spanner_hook.py ## @@ -41,11 +42,12 @@ def __init__(self, def get_client(self, project_id): # type: (str) -> Client """ -Provides a client for interacting with Cloud Spanner API. +Provides a client for interacting with the Cloud Spanner API. -:param project_id: The ID of the project which owns the instances, tables and data. +:param project_id: TThe ID of the GCP project that owns the Cloud Spanner Review comment: looks like a typo. ```suggestion :param project_id: The ID of the GCP project that owns the Cloud Spanner ``` I haven't reviewed entire PR yet!! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] kaxil commented on a change in pull request #4371: [AIRFLOW-2939][AIRFLOW-3568] fix TypeError on GoogleCloudStorageToS3Operator / S3ToGoogleCloudStorageOperator
kaxil commented on a change in pull request #4371: [AIRFLOW-2939][AIRFLOW-3568] fix TypeError on GoogleCloudStorageToS3Operator / S3ToGoogleCloudStorageOperator URL: https://github.com/apache/incubator-airflow/pull/4371#discussion_r244067118 ## File path: airflow/contrib/operators/gcs_to_s3.py ## @@ -101,7 +101,7 @@ def execute(self, context): # Google Cloud Storage and not in S3 bucket_name, _ = S3Hook.parse_s3_url(self.dest_s3_key) existing_files = s3_hook.list_keys(bucket_name) -files = set(files) - set(existing_files) +files = list(set(files) - set(existing_files)) Review comment: not sure on this. `xcom` push should be able to support any pickleable object like list, set This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] jmcarp commented on issue #4256: [AIRFLOW-3417] Use the platformVersion only for the FARGATE launch type
jmcarp commented on issue #4256: [AIRFLOW-3417] Use the platformVersion only for the FARGATE launch type URL: https://github.com/apache/incubator-airflow/pull/4256#issuecomment-450049094 I left a few minor questions on the diffs, but this looks good to me. cc @kaxil or @ashb to take a look when you have time.
[GitHub] kaxil commented on a change in pull request #4362: [AIRFLOW-3559] Add missing options to DatadogHook.
kaxil commented on a change in pull request #4362: [AIRFLOW-3559] Add missing options to DatadogHook. URL: https://github.com/apache/incubator-airflow/pull/4362#discussion_r244066305 ## File path: airflow/contrib/hooks/datadog_hook.py ## @@ -76,12 +65,18 @@ def send_metric(self, metric_name, datapoint, tags=None): :type datapoint: int or float :param tags: A list of tags associated with the metric :type tags: list +:param type: Type of your metric either: gauge, rate, or count +:type type: string Review comment: `string` -> `str`
[GitHub] kaxil commented on a change in pull request #4362: [AIRFLOW-3559] Add missing options to DatadogHook.
kaxil commented on a change in pull request #4362: [AIRFLOW-3559] Add missing options to DatadogHook. URL: https://github.com/apache/incubator-airflow/pull/4362#discussion_r244066287 ## File path: airflow/contrib/hooks/datadog_hook.py ## @@ -47,26 +47,15 @@ def __init__(self, datadog_conn_id='datadog_default'): # for all metric submissions. self.host = conn.host -if self.api_key is None: -raise AirflowException("api_key must be specified in the " - "Datadog connection details") -if self.app_key is None: -raise AirflowException("app_key must be specified in the " - "Datadog connection details") - self.log.info("Setting up api keys for Datadog") -options = { -'api_key': self.api_key, -'app_key': self.app_key -} -initialize(**options) +initialize(api_key=self.api_key, app_key=self.app_key) def validate_response(self, response): if response['status'] != 'ok': self.log.error("Datadog returned: %s", response) raise AirflowException("Error status received from Datadog") -def send_metric(self, metric_name, datapoint, tags=None): +def send_metric(self, metric_name, datapoint, tags=None, type_=None, interval=None): Review comment: typo .. `type_` -> `type`
[GitHub] jmcarp commented on a change in pull request #4256: [AIRFLOW-3417] Use the platformVersion only for the FARGATE launch type
jmcarp commented on a change in pull request #4256: [AIRFLOW-3417] Use the platformVersion only for the FARGATE launch type URL: https://github.com/apache/incubator-airflow/pull/4256#discussion_r244066490 ## File path: tests/contrib/operators/test_ecs_operator.py ## @@ -133,6 +134,38 @@ def test_execute_without_failures(self, check_mock, wait_mock): self.assertEqual(self.ecs.arn, 'arn:aws:ecs:us-east-1:012345678910:task/d8c67b3c-ac87-4ffe-a847-4785bc3a8b55') +@mock.patch.object(ECSOperator, '_wait_for_task_ended') +@mock.patch.object(ECSOperator, '_check_success_task') +def test_fargate_run(self, check_mock, wait_mock): Review comment: I think this test is almost the same as `test_execute_without_failures`. Could we use parameterized tests instead of writing two very similar tests, or would that make the code more confusing?
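A parameterized version of the two near-identical tests could look like the following stdlib-only sketch using `subTest` (the third-party `parameterized` package's `@parameterized.expand` decorator achieves the same effect). The operator construction is stubbed out here; in the real test it would be `ECSOperator(launch_type=..., **self.ecs_operator_args)`:

```python
import unittest


class TestECSLaunchTypes(unittest.TestCase):
    def _make_operator(self, launch_type):
        # Hypothetical stand-in for ECSOperator(launch_type=..., **args),
        # just enough to drive the assertion below.
        return {"launch_type": launch_type, "cluster": "c"}

    def test_execute_without_failures(self):
        # One test body runs once per launch type; a failure report names
        # the failing launch_type via the subTest context.
        for launch_type in ("EC2", "FARGATE"):
            with self.subTest(launch_type=launch_type):
                op = self._make_operator(launch_type)
                self.assertEqual(op["launch_type"], launch_type)
```
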
[GitHub] kaxil commented on a change in pull request #4362: [AIRFLOW-3559] Add missing options to DatadogHook.
kaxil commented on a change in pull request #4362: [AIRFLOW-3559] Add missing options to DatadogHook. URL: https://github.com/apache/incubator-airflow/pull/4362#discussion_r244066383 ## File path: airflow/contrib/hooks/datadog_hook.py ## @@ -121,21 +117,37 @@ def post_event(self, title, text, tags=None, alert_type=None, aggregation_key=No :type title: str :param text: The body of the event (more information) :type text: str -:param tags: List of string tags to apply to the event -:type tags: list +:param aggregation_key: Key that can be used to aggregate this event in a stream +:type aggregation_key: str :param alert_type: The alert type for the event, one of ["error", "warning", "info", "success"] :type alert_type: str -:param aggregation_key: Key that can be used to aggregate this event in a stream -:type aggregation_key: str +:date_happened: POSIX timestamp of the event; defaults to now Review comment: `:type date_happened:`
[GitHub] kaxil commented on a change in pull request #4362: [AIRFLOW-3559] Add missing options to DatadogHook.
kaxil commented on a change in pull request #4362: [AIRFLOW-3559] Add missing options to DatadogHook. URL: https://github.com/apache/incubator-airflow/pull/4362#discussion_r244066443 ## File path: airflow/contrib/hooks/datadog_hook.py ## @@ -47,26 +47,15 @@ def __init__(self, datadog_conn_id='datadog_default'): # for all metric submissions. self.host = conn.host -if self.api_key is None: -raise AirflowException("api_key must be specified in the " - "Datadog connection details") -if self.app_key is None: -raise AirflowException("app_key must be specified in the " - "Datadog connection details") - self.log.info("Setting up api keys for Datadog") -options = { -'api_key': self.api_key, -'app_key': self.app_key -} -initialize(**options) +initialize(api_key=self.api_key, app_key=self.app_key) def validate_response(self, response): if response['status'] != 'ok': self.log.error("Datadog returned: %s", response) raise AirflowException("Error status received from Datadog") -def send_metric(self, metric_name, datapoint, tags=None): +def send_metric(self, metric_name, datapoint, tags=None, type_=None, interval=None): Review comment: or maybe change the name to `metric_type`
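A note on why `type_` (or `metric_type`) is often preferred over plain `type` as a parameter name: `type` is legal Python, but inside the function it shadows the builtin. A small sketch (function names are illustrative, not the hook's actual API):

```python
def send_metric(metric_name, datapoint, type=None):
    # Legal, but `type` now shadows the builtin type() inside this body.
    return {"metric": metric_name, "points": datapoint, "type": type}


def broken_check(datapoint, type=None):
    # With the default of None, this raises TypeError: 'NoneType' object
    # is not callable -- the builtin type() is unreachable here.
    return type(datapoint)


def send_metric_renamed(metric_name, datapoint, metric_type=None):
    # A distinct name keeps the builtin usable for runtime checks.
    assert type(datapoint) in (int, float)
    return {"metric": metric_name, "points": datapoint, "type": metric_type}
```
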
[GitHub] jmcarp commented on a change in pull request #4256: [AIRFLOW-3417] Use the platformVersion only for the FARGATE launch type
jmcarp commented on a change in pull request #4256: [AIRFLOW-3417] Use the platformVersion only for the FARGATE launch type URL: https://github.com/apache/incubator-airflow/pull/4256#discussion_r244066387 ## File path: tests/contrib/operators/test_ecs_operator.py ## @@ -63,26 +65,26 @@ def setUp(self, aws_hook_mock): configuration.load_test_config() self.aws_hook_mock = aws_hook_mock -self.ecs = ECSOperator( -task_id='task', -task_definition='t', -cluster='c', -overrides={}, -aws_conn_id=None, -region_name='eu-west-1', -group='group', -placement_constraints=[ -{ -'expression': 'attribute:ecs.instance-type =~ t2.*', -'type': 'memberOf' -} -], -network_configuration={ +self.ecs_operator_args = { +'task_id': 'task', +'task_definition': 't', +'cluster': 'c', +'overrides': {}, +'aws_conn_id': None, +'region_name': 'eu-west-1', +'group': 'group', +'placement_constraints': [{ +'expression': 'attribute:ecs.instance-type =~ t2.*', +'type': 'memberOf' +}], +'network_configuration': { 'awsvpcConfiguration': { 'securityGroups': ['sg-123abc'] } } -) +} +self.ecs = ECSOperator(**self.ecs_operator_args) +self.ecs_fargate = ECSOperator(launch_type='FARGATE', **self.ecs_operator_args) Review comment: Do we need to instantiate a new fargate operator before each test if only one test uses it? What do you think about setting `self.launch_type` in the relevant tests instead?
[jira] [Created] (AIRFLOW-3571) GoogleCloudStorageToBigQueryOperator succeeds to uploading CSV file from GCS to BiqQuery but a task is failed
Yohei Onishi created AIRFLOW-3571:
-------------------------------------

             Summary: GoogleCloudStorageToBigQueryOperator succeeds to uploading CSV file from GCS to BiqQuery but a task is failed
                 Key: AIRFLOW-3571
                 URL: https://issues.apache.org/jira/browse/AIRFLOW-3571
             Project: Apache Airflow
          Issue Type: Bug
          Components: contrib
    Affects Versions: 1.10.0
            Reporter: Yohei Onishi

I am using the following services, all in the asia-northeast1-c zone:
* GCS: asia-northeast1-c
* BigQuery dataset and table: asia-northeast1-c
* Composer: asia-northeast1-c

My task, created by GoogleCloudStorageToBigQueryOperator, succeeded in uploading a CSV file from a GCS bucket to a BigQuery table, but the task failed with the following error.

{code:java}
[2018-12-26 21:35:47,464] {base_task_runner.py:107} INFO - Job 146: Subtask bq_load_data_into_dest_table_from_gcs
[2018-12-26 21:35:47,464] {discovery.py:871} INFO - URL being requested: GET https://www.googleapis.com/bigquery/v2/projects/fr-stg-datalake/jobs/job_QQE9TDEu88mfdw_fJHHEo9FtjXja?alt=json
[2018-12-26 21:35:47,931] {models.py:1736} ERROR - ('BigQuery job status check failed. Final error was: %s', 404)
Traceback (most recent call last):
  File "/usr/local/lib/airflow/airflow/contrib/hooks/bigquery_hook.py", line 981, in run_with_configuration
    jobId=self.running_job_id).execute()
  File "/usr/local/lib/python3.6/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/googleapiclient/http.py", line 851, in execute
    raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: https://www.googleapis.com/bigquery/v2/projects/my-project/jobs/job_abc123?alt=json returned "Not found: Job my-project:job_abc123"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/airflow/airflow/models.py", line 1633, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/airflow/airflow/contrib/operators/gcs_to_bq.py", line 237, in execute
    time_partitioning=self.time_partitioning)
  File "/usr/local/lib/airflow/airflow/contrib/hooks/bigquery_hook.py", line 951, in run_load
    return self.run_with_configuration(configuration)
  File "/usr/local/lib/airflow/airflow/contrib/hooks/bigquery_hook.py", line 1003, in run_with_configuration
    err.resp.status)
Exception: ('BigQuery job status check failed. Final error was: %s', 404)
{code}

The task failed to find the job {color:#FF}my-project:job_abc123{color}, but the correct job id is {color:#FF}my-project:asia-northeast1:job_abc123{color}. (Note: this is just an example, not the actual id.) I suppose the operator does not handle the job location (zone) properly.

{code:java}
$ bq show -j my-project:asia-northeast1:job_abc123
Job my-project:asia-northeast1:job_abc123

  Job Type   State     Start Time        Duration   User Email                    Bytes Processed   Bytes Billed   Billing Tier   Labels
 ---------- --------- ----------------- ---------- ----------------------------- ----------------- -------------- -------------- --------
  load       SUCCESS   27 Dec 05:35:47   0:00:01    my-service-account-id-email
{code}
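The 404 is consistent with BigQuery's `jobs.get` endpoint being called without the job's location: for jobs created in regional locations such as asia-northeast1, the REST API needs a `location` query parameter, and omitting it yields "Not found". A minimal sketch of a location-aware status URL (project and job IDs are placeholders, and this builds the URL only, it does not reproduce the hook's actual code):

```python
def bq_job_status_url(project_id, job_id, location=None):
    # BigQuery REST API: GET /bigquery/v2/projects/{projectId}/jobs/{jobId}
    # The optional ?location= parameter must be supplied for jobs created
    # in regional locations (e.g. asia-northeast1); without it the API
    # responds 404 "Not found" even though the job exists.
    url = ("https://www.googleapis.com/bigquery/v2/projects/"
           "{}/jobs/{}?alt=json".format(project_id, job_id))
    if location:
        url += "&location={}".format(location)
    return url
```
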
[GitHub] kaxil commented on issue #4331: [AIRFLOW-3531] Add gcs to gcs transfer operator.
kaxil commented on issue #4331: [AIRFLOW-3531] Add gcs to gcs transfer operator. URL: https://github.com/apache/incubator-airflow/pull/4331#issuecomment-450048479 Any updates on this PR, @jmcarp?
[GitHub] kaxil commented on issue #4359: [AIRFLOW-3150] Make execution_date templated in TriggerDagRunOperator
kaxil commented on issue #4359: [AIRFLOW-3150] Make execution_date templated in TriggerDagRunOperator URL: https://github.com/apache/incubator-airflow/pull/4359#issuecomment-450048262 @Fokko Any opinion?
[GitHub] kaxil commented on issue #4298: [AIRFLOW-3478] Make sure that the session is closed
kaxil commented on issue #4298: [AIRFLOW-3478] Make sure that the session is closed URL: https://github.com/apache/incubator-airflow/pull/4298#issuecomment-450048020 It still has some conflicts; can you resolve them? :)
[jira] [Updated] (AIRFLOW-3571) GoogleCloudStorageToBigQueryOperator succeeds to uploading CSV file from GCS to BiqQuery but a task is failed
[ https://issues.apache.org/jira/browse/AIRFLOW-3571?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ]

Yohei Onishi updated AIRFLOW-3571:
----------------------------------
    Description: 
I am using the following services, all in the asia-northeast1-c zone:
* GCS: asia-northeast1-c
* BigQuery dataset and table: asia-northeast1-c
* Composer: asia-northeast1-c

My task, created by GoogleCloudStorageToBigQueryOperator, succeeded in uploading a CSV file from a GCS bucket to a BigQuery table, but the task failed with the following error.

{code:java}
[2018-12-26 21:35:47,464] {base_task_runner.py:107} INFO - Job 146: Subtask bq_load_data_into_dest_table_from_gcs
[2018-12-26 21:35:47,464] {discovery.py:871} INFO - URL being requested: GET https://www.googleapis.com/bigquery/v2/projects/my-project/jobs/job_abc123?alt=json
[2018-12-26 21:35:47,931] {models.py:1736} ERROR - ('BigQuery job status check failed. Final error was: %s', 404)
Traceback (most recent call last):
  File "/usr/local/lib/airflow/airflow/contrib/hooks/bigquery_hook.py", line 981, in run_with_configuration
    jobId=self.running_job_id).execute()
  File "/usr/local/lib/python3.6/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/googleapiclient/http.py", line 851, in execute
    raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: https://www.googleapis.com/bigquery/v2/projects/my-project/jobs/job_abc123?alt=json returned "Not found: Job my-project:job_abc123"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/airflow/airflow/models.py", line 1633, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/airflow/airflow/contrib/operators/gcs_to_bq.py", line 237, in execute
    time_partitioning=self.time_partitioning)
  File "/usr/local/lib/airflow/airflow/contrib/hooks/bigquery_hook.py", line 951, in run_load
    return self.run_with_configuration(configuration)
  File "/usr/local/lib/airflow/airflow/contrib/hooks/bigquery_hook.py", line 1003, in run_with_configuration
    err.resp.status)
Exception: ('BigQuery job status check failed. Final error was: %s', 404)
{code}

The task failed to find the job {color:#ff}my-project:job_abc123{color}, but the correct job id is {color:#ff}my-project:asia-northeast1:job_abc123{color}. (Note: this is just an example, not the actual id.) I suppose the operator does not handle the job location (zone) properly.

{code:java}
$ bq show -j my-project:asia-northeast1:job_abc123
Job my-project:asia-northeast1:job_abc123

  Job Type   State     Start Time        Duration   User Email                    Bytes Processed   Bytes Billed   Billing Tier   Labels
 ---------- --------- ----------------- ---------- ----------------------------- ----------------- -------------- -------------- --------
  load       SUCCESS   27 Dec 05:35:47   0:00:01    my-service-account-id-email
{code}

  was:
I am using the following services, all in the asia-northeast1-c zone:
* GCS: asia-northeast1-c
* BigQuery dataset and table: asia-northeast1-c
* Composer: asia-northeast1-c

My task, created by GoogleCloudStorageToBigQueryOperator, succeeded in uploading a CSV file from a GCS bucket to a BigQuery table, but the task failed with the following error.

{code:java}
[2018-12-26 21:35:47,464] {base_task_runner.py:107} INFO - Job 146: Subtask bq_load_data_into_dest_table_from_gcs
[2018-12-26 21:35:47,464] {discovery.py:871} INFO - URL being requested: GET https://www.googleapis.com/bigquery/v2/projects/fr-stg-datalake/jobs/job_QQE9TDEu88mfdw_fJHHEo9FtjXja?alt=json
[2018-12-26 21:35:47,931] {models.py:1736} ERROR - ('BigQuery job status check failed. Final error was: %s', 404)
Traceback (most recent call last):
  File "/usr/local/lib/airflow/airflow/contrib/hooks/bigquery_hook.py", line 981, in run_with_configuration
    jobId=self.running_job_id).execute()
  File "/usr/local/lib/python3.6/site-packages/googleapiclient/_helpers.py", line 130, in positional_wrapper
    return wrapped(*args, **kwargs)
  File "/usr/local/lib/python3.6/site-packages/googleapiclient/http.py", line 851, in execute
    raise HttpError(resp, content, uri=self.uri)
googleapiclient.errors.HttpError: https://www.googleapis.com/bigquery/v2/projects/my-project/jobs/job_abc123?alt=json returned "Not found: Job my-project:job_abc123"

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "/usr/local/lib/airflow/airflow/models.py", line 1633, in _run_raw_task
    result = task_copy.execute(context=context)
  File "/usr/local/lib/airflow/airflow/contrib/operators/gcs_to_bq.py", line 237, in execute
    time_partitioning=self.time_partitioning)
  File "/usr/local/lib/airflow/airflow/contrib/hooks/bigquery_hook.py", line 951, in run_load
    return self.run_with_configuration(configuration)
  File
[GitHub] r39132 closed pull request #4336: [AIRFLOW-XXX] Fix trivial grammar error in doc
r39132 closed pull request #4336: [AIRFLOW-XXX] Fix trivial grammar error in doc URL: https://github.com/apache/incubator-airflow/pull/4336 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/docs/tutorial.rst b/docs/tutorial.rst index 69670d7b02..1b8847c4d6 100644 --- a/docs/tutorial.rst +++ b/docs/tutorial.rst @@ -390,10 +390,10 @@ Let's run a few commands to validate this script further. # print the list of active DAGs airflow list_dags -# prints the list of tasks the "tutorial" dag_id +# prints the list of tasks in the "tutorial" DAG airflow list_tasks tutorial -# prints the hierarchy of tasks in the tutorial DAG +# prints the hierarchy of tasks in the "tutorial" DAG airflow list_tasks tutorial --tree
[GitHub] dimberman commented on issue #3770: [AIRFLOW-3281] Fix Kubernetes operator with git-sync
dimberman commented on issue #3770: [AIRFLOW-3281] Fix Kubernetes operator with git-sync URL: https://github.com/apache/incubator-airflow/pull/3770#issuecomment-450040364 @odracci I sent you an email to set up a quick hangout to ask you more about this (will post call summary here). I want to see if this is a shortcoming of how our dev environment is set up. You should be able to just define hostpath as a volume mount the way you would any other volume mount.
[GitHub] odracci commented on issue #3770: [AIRFLOW-3281] Fix Kubernetes operator with git-sync
odracci commented on issue #3770: [AIRFLOW-3281] Fix Kubernetes operator with git-sync URL: https://github.com/apache/incubator-airflow/pull/3770#issuecomment-450037129 @dimberman I implemented the hostpath because it is the only way I found to develop DAGs when using Kubernetes in a local environment. With docker-compose we usually share the `dags` folder; `hostpath` is the equivalent when Kubernetes is used.
[GitHub] odracci commented on a change in pull request #3770: [AIRFLOW-3281] Fix Kubernetes operator with git-sync
odracci commented on a change in pull request #3770: [AIRFLOW-3281] Fix Kubernetes operator with git-sync URL: https://github.com/apache/incubator-airflow/pull/3770#discussion_r244057766 ## File path: airflow/contrib/kubernetes/worker_configuration.py ## @@ -50,13 +55,13 @@ def _get_init_containers(self, volume_mounts): 'value': self.kube_config.git_branch }, { 'name': 'GIT_SYNC_ROOT', -'value': os.path.join( -self.worker_airflow_dags, -self.kube_config.git_subpath -) +'value': self.kube_config.git_sync_root }, { 'name': 'GIT_SYNC_DEST', -'value': 'dags' +'value': self.kube_config.git_sync_dest +}, { +'name': 'GIT_SYNC_DEPTH', Review comment: I'm sorry, I hadn't seen that comment. `GIT_SYNC_DEPTH` just clones the repo at the last commit instead of downloading the whole repo history, which speeds up the deployment. The revision being in the path directory is how git-sync works, and it is not possible to remove it. I solved the problem with `kube_config.git_sync_root`, which includes the dags_folder + revision + git_sub_path. /cc @jzucker2 @dimberman
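The effective DAGs path described above (dags_folder + revision + git_sub_path) can be sketched as a simple join; the function and argument names here are illustrative, not Airflow's actual config keys:

```python
import os


def git_sync_root(dags_folder, revision, git_subpath):
    # git-sync checks the repository out under a revision-named directory
    # and symlinks the current sync there, so the path the workers read
    # DAGs from must include that revision component.
    return os.path.join(dags_folder, revision, git_subpath)


# Example: where the workers would look for DAGs after a sync.
dags_path = git_sync_root("/usr/local/airflow/dags", "rev-abc123", "dags")
```
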
[GitHub] BasPH commented on issue #4370: [AIRFLOW-3459] Move DagPickle to separate file
BasPH commented on issue #4370: [AIRFLOW-3459] Move DagPickle to separate file URL: https://github.com/apache/incubator-airflow/pull/4370#issuecomment-450035982 This page describes the exact same thing I experienced: https://github.com/isaacs/github/issues/361. Doesn't sound like something that can be configured from the repository end. However, I think squashing commits by the Airflow committers would help prevent lots of force pushes.
[GitHub] dimberman commented on a change in pull request #3770: [AIRFLOW-3281] Fix Kubernetes operator with git-sync
dimberman commented on a change in pull request #3770: [AIRFLOW-3281] Fix Kubernetes operator with git-sync URL: https://github.com/apache/incubator-airflow/pull/3770#discussion_r244055880 ## File path: airflow/contrib/kubernetes/worker_configuration.py ## @@ -50,13 +55,13 @@ def _get_init_containers(self, volume_mounts): 'value': self.kube_config.git_branch }, { 'name': 'GIT_SYNC_ROOT', -'value': os.path.join( -self.worker_airflow_dags, -self.kube_config.git_subpath -) +'value': self.kube_config.git_sync_root }, { 'name': 'GIT_SYNC_DEST', -'value': 'dags' +'value': self.kube_config.git_sync_dest +}, { +'name': 'GIT_SYNC_DEPTH', Review comment: @odracci can you please address @jzucker2's question.
[GitHub] dimberman commented on a change in pull request #3770: [AIRFLOW-3281] Fix Kubernetes operator with git-sync
dimberman commented on a change in pull request #3770: [AIRFLOW-3281] Fix Kubernetes operator with git-sync URL: https://github.com/apache/incubator-airflow/pull/3770#discussion_r244055880 ## File path: airflow/contrib/kubernetes/worker_configuration.py ## @@ -50,13 +55,13 @@ def _get_init_containers(self, volume_mounts): 'value': self.kube_config.git_branch }, { 'name': 'GIT_SYNC_ROOT', -'value': os.path.join( -self.worker_airflow_dags, -self.kube_config.git_subpath -) +'value': self.kube_config.git_sync_root }, { 'name': 'GIT_SYNC_DEST', -'value': 'dags' +'value': self.kube_config.git_sync_dest +}, { +'name': 'GIT_SYNC_DEPTH', Review comment: @odracci ^^^
[GitHub] dimberman commented on issue #3770: [AIRFLOW-3281] Fix Kubernetes operator with git-sync
dimberman commented on issue #3770: [AIRFLOW-3281] Fix Kubernetes operator with git-sync URL: https://github.com/apache/incubator-airflow/pull/3770#issuecomment-450035432 @odracci @Fokko overall this looks incredible. Thank you so much, @odracci; this actually addresses multiple things I was currently looking into (i.e. the web assets, the git path stuff). One remaining question: why do we need special features just for hostpath? Hostpath should just be a volume mount like any other, and frankly I don't really want to actively encourage people to use it (it's generally not advised and I think there's even been talk about deprecating it). Other than that LGTM :).
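The point that hostPath needs no special casing can be seen in the pod spec itself: it is declared and mounted like any other volume. A plain-dict sketch mirroring the Kubernetes YAML (all paths and names here are illustrative):

```python
# A hostPath volume sits in spec.volumes alongside any other volume type,
# and the container mounts it through the same volumeMounts mechanism.
volume = {
    "name": "airflow-dags",
    "hostPath": {"path": "/home/dev/airflow/dags"},  # illustrative path
}
volume_mount = {
    "name": "airflow-dags",
    "mountPath": "/usr/local/airflow/dags",
    "readOnly": True,
}
pod_spec_fragment = {
    "volumes": [volume],
    "containers": [{"name": "worker", "volumeMounts": [volume_mount]}],
}
```
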
[GitHub] dimberman commented on a change in pull request #3770: [AIRFLOW-3281] Fix Kubernetes operator with git-sync
dimberman commented on a change in pull request #3770: [AIRFLOW-3281] Fix Kubernetes operator with git-sync URL: https://github.com/apache/incubator-airflow/pull/3770#discussion_r244055645 ## File path: scripts/ci/kubernetes/docker/build.sh ## @@ -34,8 +34,18 @@ fi echo "Airflow directory $AIRFLOW_ROOT" echo "Airflow Docker directory $DIRNAME" +if [[ ${PYTHON_VERSION} == '3' ]]; then + PYTHON_DOCKER_IMAGE=python:3.6-slim +else + PYTHON_DOCKER_IMAGE=python:2.7-slim +fi + cd $AIRFLOW_ROOT -python setup.py sdist -q +docker run -ti --rm -e SLUGIFY_USES_TEXT_UNIDECODE -v ${AIRFLOW_ROOT}:/airflow \ +-w /airflow ${PYTHON_DOCKER_IMAGE} ./scripts/ci/kubernetes/docker/compile.sh Review comment: This makes a lot of sense. Thanks for setting this up.
[GitHub] codecov-io edited a comment on issue #4353: [AIRFLOW-3480] Add Database Deploy/Update/Delete operators
codecov-io edited a comment on issue #4353: [AIRFLOW-3480] Add Database Deploy/Update/Delete operators URL: https://github.com/apache/incubator-airflow/pull/4353#issuecomment-449454398

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4353?src=pr&el=h1) Report
> Merging [#4353](https://codecov.io/gh/apache/incubator-airflow/pull/4353?src=pr&el=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/fb52d7b7ca7f08ffdaaa31558eb864aed3fdc9da?src=pr&el=desc) will **decrease** coverage by `0.01%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/4353/graphs/tree.svg?width=650&token=WdLKlKHOAU&height=150&src=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/4353?src=pr&el=tree)

```diff
@@            Coverage Diff             @@
##           master    #4353      +/-   ##
==========================================
- Coverage   78.15%   78.14%   -0.02%
==========================================
  Files         204      202       -2
  Lines       16499    16486      -13
==========================================
- Hits        12895    12883      -12
+ Misses       3604     3603       -1
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/4353?src=pr&el=tree) | Coverage Δ | |
|---|---|---|
| [airflow/models/\_\_init\_\_.py](https://codecov.io/gh/apache/incubator-airflow/pull/4353/diff?src=pr&el=tree#diff-YWlyZmxvdy9tb2RlbHMvX19pbml0X18ucHk=) | `92.76% <0%> (-0.05%)` | :arrow_down: |
| [airflow/bin/cli.py](https://codecov.io/gh/apache/incubator-airflow/pull/4353/diff?src=pr&el=tree#diff-YWlyZmxvdy9iaW4vY2xpLnB5) | `64.48% <0%> (-0.05%)` | :arrow_down: |
| [airflow/jobs.py](https://codecov.io/gh/apache/incubator-airflow/pull/4353/diff?src=pr&el=tree#diff-YWlyZmxvdy9qb2JzLnB5) | `77.39% <0%> (-0.03%)` | :arrow_down: |
| [airflow/models/connection.py](https://codecov.io/gh/apache/incubator-airflow/pull/4353/diff?src=pr&el=tree#diff-YWlyZmxvdy9tb2RlbHMvY29ubmVjdGlvbi5weQ==) | `88.6% <0%> (ø)` | :arrow_up: |
| [airflow/models/dagpickle.py](https://codecov.io/gh/apache/incubator-airflow/pull/4353/diff?src=pr&el=tree#diff-YWlyZmxvdy9tb2RlbHMvZGFncGlja2xlLnB5) | | |
| [airflow/models/base.py](https://codecov.io/gh/apache/incubator-airflow/pull/4353/diff?src=pr&el=tree#diff-YWlyZmxvdy9tb2RlbHMvYmFzZS5weQ==) | | |
| [airflow/utils/db.py](https://codecov.io/gh/apache/incubator-airflow/pull/4353/diff?src=pr&el=tree#diff-YWlyZmxvdy91dGlscy9kYi5weQ==) | `32.53% <0%> (+0.25%)` | :arrow_up: |

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4353?src=pr&el=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute <relative> (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4353?src=pr&el=footer). Last update [fb52d7b...7cb58eb](https://codecov.io/gh/apache/incubator-airflow/pull/4353?src=pr&el=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[GitHub] codecov-io commented on issue #4354: [AIRFLOW-3446] Add Google Cloud BigTable operators
codecov-io commented on issue #4354: [AIRFLOW-3446] Add Google Cloud BigTable operators
URL: https://github.com/apache/incubator-airflow/pull/4354#issuecomment-450030796

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4354?src=pr=h1) Report
> Merging [#4354](https://codecov.io/gh/apache/incubator-airflow/pull/4354?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/fb52d7b7ca7f08ffdaaa31558eb864aed3fdc9da?src=pr=desc) will **not change** coverage.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/4354/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/4354?src=pr=tree)

```diff
@@           Coverage Diff           @@
##           master    #4354   +/-   ##
=======================================
  Coverage   78.15%   78.15%
=======================================
  Files         204      204
  Lines       16499    16499
=======================================
  Hits        12895    12895
  Misses       3604     3604
```

------

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4354?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4354?src=pr=footer). Last update [fb52d7b...8b7891c](https://codecov.io/gh/apache/incubator-airflow/pull/4354?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] codecov-io edited a comment on issue #4368: [AIRFLOW-3561] Improve queries
codecov-io edited a comment on issue #4368: [AIRFLOW-3561] Improve queries
URL: https://github.com/apache/incubator-airflow/pull/4368#issuecomment-449655224

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4368?src=pr=h1) Report
> Merging [#4368](https://codecov.io/gh/apache/incubator-airflow/pull/4368?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/c014324be237523d23ceb7a6250218d28befddfd?src=pr=desc) will **increase** coverage by `0.02%`.
> The diff coverage is `90.9%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/4368/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/4368?src=pr=tree)

```diff
@@            Coverage Diff             @@
##           master    #4368      +/-   ##
==========================================
+ Coverage   78.14%   78.17%   +0.02%
==========================================
  Files         202      204       +2
  Lines       16486    16530      +44
==========================================
+ Hits        12883    12922      +39
- Misses       3603     3608       +5
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/4368?src=pr=tree) | Coverage Δ | |
|---|---|---|
| [airflow/models/\_\_init\_\_.py](https://codecov.io/gh/apache/incubator-airflow/pull/4368/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvX19pbml0X18ucHk=) | `92.81% <100%> (+0.04%)` | :arrow_up: |
| [airflow/www\_rbac/views.py](https://codecov.io/gh/apache/incubator-airflow/pull/4368/diff?src=pr=tree#diff-YWlyZmxvdy93d3dfcmJhYy92aWV3cy5weQ==) | `72.59% <100%> (+0.16%)` | :arrow_up: |
| [airflow/www/views.py](https://codecov.io/gh/apache/incubator-airflow/pull/4368/diff?src=pr=tree#diff-YWlyZmxvdy93d3cvdmlld3MucHk=) | `69.31% <80%> (-0.08%)` | :arrow_down: |
| [airflow/utils/db.py](https://codecov.io/gh/apache/incubator-airflow/pull/4368/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9kYi5weQ==) | `32.28% <0%> (-0.26%)` | :arrow_down: |
| [airflow/models/connection.py](https://codecov.io/gh/apache/incubator-airflow/pull/4368/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvY29ubmVjdGlvbi5weQ==) | `88.6% <0%> (ø)` | :arrow_up: |
| [airflow/models/dagpickle.py](https://codecov.io/gh/apache/incubator-airflow/pull/4368/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvZGFncGlja2xlLnB5) | `94.11% <0%> (ø)` | |
| [airflow/models/base.py](https://codecov.io/gh/apache/incubator-airflow/pull/4368/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvYmFzZS5weQ==) | `85.71% <0%> (ø)` | |
| [airflow/jobs.py](https://codecov.io/gh/apache/incubator-airflow/pull/4368/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzLnB5) | `77.41% <0%> (+0.02%)` | :arrow_up: |
| [airflow/bin/cli.py](https://codecov.io/gh/apache/incubator-airflow/pull/4368/diff?src=pr=tree#diff-YWlyZmxvdy9iaW4vY2xpLnB5) | `64.52% <0%> (+0.04%)` | :arrow_up: |
| ... and [1 more](https://codecov.io/gh/apache/incubator-airflow/pull/4368/diff?src=pr=tree-more) | | |

------

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4368?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4368?src=pr=footer). Last update [c014324...4db22b1](https://codecov.io/gh/apache/incubator-airflow/pull/4368?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] potiuk commented on a change in pull request #4354: [AIRFLOW-3446] Add Google Cloud BigTable operators
potiuk commented on a change in pull request #4354: [AIRFLOW-3446] Add Google Cloud BigTable operators
URL: https://github.com/apache/incubator-airflow/pull/4354#discussion_r244048635

## File path: setup.py ##

@@ -189,6 +189,7 @@ def write_version(filename=os.path.join(*['airflow',
 'google-auth>=1.0.0, <2.0.0dev',
 'google-auth-httplib2>=0.0.1',
 'google-cloud-container>=0.1.1',
+'google-cloud-bigtable==0.31.0',

Review comment: The problem is that those libraries are pretty unstable, and relying on automatic upgrades to future versions is a pretty bad idea - the last time @DariuszAniszewski upgraded 0.30.0 (I think) -> 0.31.0, it actually broke the operator and some fixes had to be made (and we cannot really rely on future versions not breaking anything). Other dependencies have also made breaking changes quite recently when they were upgraded automatically (the infamous flask-appbuilder, for example). There is a whole discussion on the devlist about fixed vs. upper-open version numbers (I am involved but have not had time to take a closer look at this yet). For now - until a general approach is implemented - I think it's safer to keep it fixed.

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
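The fixed-vs-ranged trade-off described in the comment above can be sketched with a small, stdlib-only illustration (this is not Airflow or pip code; real resolution is done by pip, and the version numbers are just the ones mentioned in the discussion):

```python
# Compare an exact pin ('==0.31.0') with an upper-bounded range
# ('>=0.31.0, <0.32.0') for a dependency such as google-cloud-bigtable.

def parse(version):
    # '0.31.2' -> (0, 31, 2); assumes plain numeric dotted versions
    return tuple(int(part) for part in version.split('.'))

def satisfies_pin(version, pinned):
    # '==' style: exactly one version is acceptable, so installs are
    # reproducible but bugfix releases are also blocked
    return parse(version) == parse(pinned)

def satisfies_range(version, lower, upper):
    # '>=lower, <upper' style: patch releases are accepted, but the next
    # minor release (the kind that broke the operator before) is blocked
    return parse(lower) <= parse(version) < parse(upper)

print(satisfies_pin('0.31.1', '0.31.0'))              # False
print(satisfies_range('0.31.1', '0.31.0', '0.32.0'))  # True
print(satisfies_range('0.32.0', '0.31.0', '0.32.0'))  # False
```

The pin rejects even a bugfix release, which is exactly the "safer until a general approach is implemented" stance taken above.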
[GitHub] codecov-io edited a comment on issue #4367: [AIRFLOW-3551] Improve BashOperator Test Coverage
codecov-io edited a comment on issue #4367: [AIRFLOW-3551] Improve BashOperator Test Coverage
URL: https://github.com/apache/incubator-airflow/pull/4367#issuecomment-449651441

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4367?src=pr=h1) Report
> Merging [#4367](https://codecov.io/gh/apache/incubator-airflow/pull/4367?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/7a6acbf5b343e4a6895d1cc8af75ecc02b4fd0e8?src=pr=desc) will **increase** coverage by `1.29%`.
> The diff coverage is `100%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/4367/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/4367?src=pr=tree)

```diff
@@            Coverage Diff             @@
##           master    #4367      +/-   ##
==========================================
+ Coverage   78.12%   79.42%   +1.29%
==========================================
  Files         202      204       +2
  Lines       16483    18586    +2103
==========================================
+ Hits        12878    14762    +1884
- Misses       3605     3824     +219
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/4367?src=pr=tree) | Coverage Δ | |
|---|---|---|
| [airflow/operators/bash\_operator.py](https://codecov.io/gh/apache/incubator-airflow/pull/4367/diff?src=pr=tree#diff-YWlyZmxvdy9vcGVyYXRvcnMvYmFzaF9vcGVyYXRvci5weQ==) | `92.98% <100%> (+1.6%)` | :arrow_up: |
| [airflow/utils/db.py](https://codecov.io/gh/apache/incubator-airflow/pull/4367/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9kYi5weQ==) | `28.47% <0%> (-4.07%)` | :arrow_down: |
| [airflow/models/dagpickle.py](https://codecov.io/gh/apache/incubator-airflow/pull/4367/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvZGFncGlja2xlLnB5) | `94.11% <0%> (ø)` | |
| [airflow/models/base.py](https://codecov.io/gh/apache/incubator-airflow/pull/4367/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvYmFzZS5weQ==) | `85.71% <0%> (ø)` | |
| [airflow/models/\_\_init\_\_.py](https://codecov.io/gh/apache/incubator-airflow/pull/4367/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvX19pbml0X18ucHk=) | `94.35% <0%> (+1.67%)` | :arrow_up: |
| [airflow/jobs.py](https://codecov.io/gh/apache/incubator-airflow/pull/4367/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzLnB5) | `79.11% <0%> (+1.72%)` | :arrow_up: |
| [airflow/utils/helpers.py](https://codecov.io/gh/apache/incubator-airflow/pull/4367/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9oZWxwZXJzLnB5) | `84.35% <0%> (+2.13%)` | :arrow_up: |
| [airflow/bin/cli.py](https://codecov.io/gh/apache/incubator-airflow/pull/4367/diff?src=pr=tree#diff-YWlyZmxvdy9iaW4vY2xpLnB5) | `66.72% <0%> (+2.24%)` | :arrow_up: |
| [airflow/models/connection.py](https://codecov.io/gh/apache/incubator-airflow/pull/4367/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvY29ubmVjdGlvbi5weQ==) | `93.44% <0%> (+4.83%)` | :arrow_up: |

------

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4367?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4367?src=pr=footer). Last update [7a6acbf...ba6109f](https://codecov.io/gh/apache/incubator-airflow/pull/4367?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] potiuk commented on a change in pull request #4354: [AIRFLOW-3446] Add Google Cloud BigTable operators
potiuk commented on a change in pull request #4354: [AIRFLOW-3446] Add Google Cloud BigTable operators
URL: https://github.com/apache/incubator-airflow/pull/4354#discussion_r244048057

## File path: airflow/contrib/hooks/gcp_bigtable_hook.py ##

@@ -0,0 +1,232 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from google.cloud.bigtable import Client
+from google.cloud.bigtable.cluster import Cluster
+from google.cloud.bigtable.instance import Instance
+from google.cloud.bigtable.table import Table
+from google.cloud.bigtable_admin_v2 import enums
+from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook
+
+
+# noinspection PyAbstractClass
+class BigtableHook(GoogleCloudBaseHook):
+    """
+    Hook for Google Cloud Bigtable APIs.
+    """
+
+    _client = None
+
+    def __init__(self,
+                 gcp_conn_id='google_cloud_default',
+                 delegate_to=None):
+        super(BigtableHook, self).__init__(gcp_conn_id, delegate_to)
+
+    def get_client(self, project_id):
+        if not self._client:
+            self._client = Client(project=project_id, credentials=self._get_credentials(), admin=True)
+        return self._client
+
+    def get_instance(self, project_id, instance_id):

Review comment: That's indeed a good idea in general for all GCP operators.
We have not done it so far in any of them - we've implemented around 30 operators so far, and all of them require PROJECT_ID. I think it's a good improvement proposal for all 30 operators - maybe we could implement it as a separate PR, because it has some consequences: in a number of places we validate that project_id is present, and we have unit tests and validation rules covering that. What do you think? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
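The "optional project_id with a connection-level default" idea discussed above can be sketched as follows. This is a hedged illustration only - the class and attribute names are invented, not Airflow's actual hook API (in Airflow the default would come from the GCP connection's extras):

```python
# Hypothetical hook showing the fallback pattern: an explicit project_id
# wins, otherwise a configured default is used, otherwise we fail loudly.

class ExampleGcpHook(object):
    def __init__(self, default_project_id=None):
        # Stand-in for a default taken from the connection configuration
        self.default_project_id = default_project_id

    def _resolve_project_id(self, project_id=None):
        resolved = project_id or self.default_project_id
        if not resolved:
            raise ValueError(
                "project_id was not passed and no default is configured")
        return resolved

hook = ExampleGcpHook(default_project_id='my-default-project')
print(hook._resolve_project_id())                 # my-default-project
print(hook._resolve_project_id('explicit-proj'))  # explicit-proj
```

As the comment notes, adopting this across all ~30 operators would also mean relaxing the existing project_id validation and its unit tests, which is why a separate PR is suggested.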
[GitHub] codecov-io edited a comment on issue #4298: [AIRFLOW-3478] Make sure that the session is closed
codecov-io edited a comment on issue #4298: [AIRFLOW-3478] Make sure that the session is closed
URL: https://github.com/apache/incubator-airflow/pull/4298#issuecomment-445580870

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4298?src=pr=h1) Report
> Merging [#4298](https://codecov.io/gh/apache/incubator-airflow/pull/4298?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/01880dcb3f19be8cff24eb2b23137e0aef8cdd6c?src=pr=desc) will **decrease** coverage by `0.05%`.
> The diff coverage is `77.88%`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/4298/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/4298?src=pr=tree)

```diff
@@            Coverage Diff             @@
##           master    #4298      +/-   ##
==========================================
- Coverage   78.13%   78.08%   -0.06%
==========================================
  Files         202      201       -1
  Lines       16487    16447      -40
==========================================
- Hits        12882    12842      -40
  Misses       3605     3605
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/4298?src=pr=tree) | Coverage Δ | |
|---|---|---|
| [airflow/utils/db.py](https://codecov.io/gh/apache/incubator-airflow/pull/4298/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9kYi5weQ==) | `32.79% <ø> (+0.51%)` | :arrow_up: |
| [airflow/api/common/experimental/mark\_tasks.py](https://codecov.io/gh/apache/incubator-airflow/pull/4298/diff?src=pr=tree#diff-YWlyZmxvdy9hcGkvY29tbW9uL2V4cGVyaW1lbnRhbC9tYXJrX3Rhc2tzLnB5) | `97.6% <100%> (-0.06%)` | :arrow_down: |
| [airflow/models.py](https://codecov.io/gh/apache/incubator-airflow/pull/4298/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMucHk=) | `92.34% <100%> (ø)` | |
| [airflow/api/common/experimental/delete\_dag.py](https://codecov.io/gh/apache/incubator-airflow/pull/4298/diff?src=pr=tree#diff-YWlyZmxvdy9hcGkvY29tbW9uL2V4cGVyaW1lbnRhbC9kZWxldGVfZGFnLnB5) | `88% <100%> (ø)` | :arrow_up: |
| [airflow/utils/cli\_action\_loggers.py](https://codecov.io/gh/apache/incubator-airflow/pull/4298/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9jbGlfYWN0aW9uX2xvZ2dlcnMucHk=) | `73.33% <100%> (-0.87%)` | :arrow_down: |
| [airflow/www\_rbac/utils.py](https://codecov.io/gh/apache/incubator-airflow/pull/4298/diff?src=pr=tree#diff-YWlyZmxvdy93d3dfcmJhYy91dGlscy5weQ==) | `73.95% <100%> (-0.14%)` | :arrow_down: |
| [airflow/settings.py](https://codecov.io/gh/apache/incubator-airflow/pull/4298/diff?src=pr=tree#diff-YWlyZmxvdy9zZXR0aW5ncy5weQ==) | `80.41% <100%> (ø)` | :arrow_up: |
| [airflow/jobs.py](https://codecov.io/gh/apache/incubator-airflow/pull/4298/diff?src=pr=tree#diff-YWlyZmxvdy9qb2JzLnB5) | `77.31% <100%> (-0.09%)` | :arrow_down: |
| [airflow/www\_rbac/security.py](https://codecov.io/gh/apache/incubator-airflow/pull/4298/diff?src=pr=tree#diff-YWlyZmxvdy93d3dfcmJhYy9zZWN1cml0eS5weQ==) | `92.85% <100%> (+0.09%)` | :arrow_up: |
| [airflow/bin/cli.py](https://codecov.io/gh/apache/incubator-airflow/pull/4298/diff?src=pr=tree#diff-YWlyZmxvdy9iaW4vY2xpLnB5) | `64.2% <68.11%> (-0.28%)` | :arrow_down: |
| ... and [12 more](https://codecov.io/gh/apache/incubator-airflow/pull/4298/diff?src=pr=tree-more) | | |

------

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4298?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4298?src=pr=footer). Last update [01880dc...c7aed41](https://codecov.io/gh/apache/incubator-airflow/pull/4298?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] potiuk commented on a change in pull request #4354: [AIRFLOW-3446] Add Google Cloud BigTable operators
potiuk commented on a change in pull request #4354: [AIRFLOW-3446] Add Google Cloud BigTable operators
URL: https://github.com/apache/incubator-airflow/pull/4354#discussion_r244047354

## File path: airflow/contrib/hooks/gcp_bigtable_hook.py ##

@@ -0,0 +1,232 @@
+# -*- coding: utf-8 -*-
+#
+# Licensed to the Apache Software Foundation (ASF) under one
+# or more contributor license agreements. See the NOTICE file
+# distributed with this work for additional information
+# regarding copyright ownership. The ASF licenses this file
+# to you under the Apache License, Version 2.0 (the
+# "License"); you may not use this file except in compliance
+# with the License. You may obtain a copy of the License at
+#
+# http://www.apache.org/licenses/LICENSE-2.0
+#
+# Unless required by applicable law or agreed to in writing,
+# software distributed under the License is distributed on an
+# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
+# KIND, either express or implied. See the License for the
+# specific language governing permissions and limitations
+# under the License.
+
+from google.cloud.bigtable import Client
+from google.cloud.bigtable.cluster import Cluster
+from google.cloud.bigtable.instance import Instance
+from google.cloud.bigtable.table import Table
+from google.cloud.bigtable_admin_v2 import enums
+from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook
+
+
+# noinspection PyAbstractClass
+class BigtableHook(GoogleCloudBaseHook):

Review comment: @DariuszAniszewski is on holidays, but let me add a comment on that - this is a very similar story to the Spanner hook: we test GCP hooks with system tests (against real GCP) rather than with mock unit tests. The logic in those hooks is really simple - usually just a call to the corresponding library method - so a test for each would be quite redundant; for example, a test for update_cluster would only check that the Cluster constructor is called once and that the update() method is called once.
That's still possible and easy to do (and we can still do it), but I don't think it adds a lot of value. Do you think it makes sense to have such tests, @jmcarp? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
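For context, the kind of mock-based unit test being debated above would look roughly like this. The `ThinHook` class and its method names are illustrative stand-ins, not the real `BigtableHook`; the point is that for a thin wrapper, about all such a test can assert is that the delegation happened:

```python
from unittest import mock

# A minimal thin-wrapper hook: look up a cluster and call update() on it.
class ThinHook(object):
    def __init__(self, client):
        self._client = client

    def update_cluster(self, instance_id, cluster_id, nodes):
        cluster = self._client.instance(instance_id).cluster(cluster_id)
        cluster.serve_nodes = nodes
        cluster.update()

# The "redundant" test: mock the client and verify the calls were made.
client = mock.MagicMock()
ThinHook(client).update_cluster('my-instance', 'my-cluster', 3)

client.instance.assert_called_once_with('my-instance')
client.instance.return_value.cluster.assert_called_once_with('my-cluster')
cluster = client.instance.return_value.cluster.return_value
cluster.update.assert_called_once_with()
assert cluster.serve_nodes == 3
```

Such a test restates the implementation rather than checking behavior, which is the redundancy argument made in the comment; the system-test approach exercises the real service instead.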
[GitHub] codecov-io commented on issue #4365: [AIRFLOW-1684] - Branching based on XCom variable (Docs)
codecov-io commented on issue #4365: [AIRFLOW-1684] - Branching based on XCom variable (Docs)
URL: https://github.com/apache/incubator-airflow/pull/4365#issuecomment-450022623

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4365?src=pr=h1) Report
> Merging [#4365](https://codecov.io/gh/apache/incubator-airflow/pull/4365?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/0e365665957d09c1aa612014b40994e3634ef70e?src=pr=desc) will **increase** coverage by `0.04%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/4365/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/4365?src=pr=tree)

```diff
@@            Coverage Diff             @@
##           master    #4365      +/-   ##
==========================================
+ Coverage   78.12%   78.17%   +0.04%
==========================================
  Files         202      202
  Lines       16483    16527      +44
==========================================
+ Hits        12878    12920      +42
- Misses       3605     3607       +2
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/4365?src=pr=tree) | Coverage Δ | |
|---|---|---|
| [airflow/models/\_\_init\_\_.py](https://codecov.io/gh/apache/incubator-airflow/pull/4365/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvX19pbml0X18ucHk=) | `92.76% <0%> (+0.08%)` | :arrow_up: |
| [airflow/utils/helpers.py](https://codecov.io/gh/apache/incubator-airflow/pull/4365/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9oZWxwZXJzLnB5) | `84.35% <0%> (+2.13%)` | :arrow_up: |

------

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4365?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4365?src=pr=footer). Last update [0e36566...a405454](https://codecov.io/gh/apache/incubator-airflow/pull/4365?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] Fokko closed pull request #4348: [AIRFLOW-XXX] Add section to Updating.md regarding timezones
Fokko closed pull request #4348: [AIRFLOW-XXX] Add section to Updating.md regarding timezones
URL: https://github.com/apache/incubator-airflow/pull/4348

This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance:

As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic):

diff --git a/UPDATING.md b/UPDATING.md
index 575fc0a3c5..851c299db1 100644
--- a/UPDATING.md
+++ b/UPDATING.md
@@ -202,6 +202,7 @@ There are five roles created for Airflow by default: Admin, User, Op, Viewer, an
 - All ModelViews in Flask-AppBuilder follow a different pattern from Flask-Admin. The `/admin` part of the URL path will no longer exist. For example: `/admin/connection` becomes `/connection/list`, `/admin/connection/new` becomes `/connection/add`, `/admin/connection/edit` becomes `/connection/edit`, etc.
 - Due to security concerns, the new webserver will no longer support the features in the `Data Profiling` menu of old UI, including `Ad Hoc Query`, `Charts`, and `Known Events`.
 - HiveServer2Hook.get_results() always returns a list of tuples, even when a single column is queried, as per Python API 2.
+- **UTC is now the default timezone**: Either reconfigure your workflows scheduling in UTC or set `default_timezone` as explained in https://airflow.apache.org/timezone.html#default-time-zone

 ### airflow.contrib.sensors.hdfs_sensors renamed to airflow.contrib.sensors.hdfs_sensor

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
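For reference, the `default_timezone` option the new UPDATING.md entry points to lives in the `[core]` section of `airflow.cfg`. A minimal sketch (the timezone name here is just an example; any IANA name works):

```ini
# airflow.cfg -- opt out of the new UTC default for naive schedule dates
[core]
default_timezone = Europe/Amsterdam
```

Workflows whose schedules were written assuming local time either need this setting or need their start dates expressed in UTC, as the linked timezone documentation explains.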
[jira] [Assigned] (AIRFLOW-3459) Refactor: Move DagPickle out of models.py
[ https://issues.apache.org/jira/browse/AIRFLOW-3459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong reassigned AIRFLOW-3459: - Assignee: Bas Harenslak > Refactor: Move DagPickle out of models.py > - > > Key: AIRFLOW-3459 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3459 > Project: Apache Airflow > Issue Type: Task > Components: models >Affects Versions: 1.10.1 >Reporter: Fokko Driesprong >Assignee: Bas Harenslak >Priority: Major > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Closed] (AIRFLOW-3459) Refactor: Move DagPickle out of models.py
[ https://issues.apache.org/jira/browse/AIRFLOW-3459?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong closed AIRFLOW-3459. - Resolution: Fixed > Refactor: Move DagPickle out of models.py > - > > Key: AIRFLOW-3459 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3459 > Project: Apache Airflow > Issue Type: Task > Components: models >Affects Versions: 1.10.1 >Reporter: Fokko Driesprong >Assignee: Bas Harenslak >Priority: Major > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] codecov-io edited a comment on issue #4353: [AIRFLOW-3480] Add Database Deploy/Update/Delete operators
codecov-io edited a comment on issue #4353: [AIRFLOW-3480] Add Database Deploy/Update/Delete operators
URL: https://github.com/apache/incubator-airflow/pull/4353#issuecomment-449454398

# [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4353?src=pr=h1) Report
> Merging [#4353](https://codecov.io/gh/apache/incubator-airflow/pull/4353?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/7a6acbf5b343e4a6895d1cc8af75ecc02b4fd0e8?src=pr=desc) will **increase** coverage by `0.01%`.
> The diff coverage is `n/a`.

[![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/4353/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/4353?src=pr=tree)

```diff
@@            Coverage Diff             @@
##           master    #4353      +/-   ##
==========================================
+ Coverage   78.12%   78.14%   +0.01%
==========================================
  Files         202      202
  Lines       16483    16486       +3
==========================================
+ Hits        12878    12883       +5
+ Misses       3605     3603       -2
```

| [Impacted Files](https://codecov.io/gh/apache/incubator-airflow/pull/4353?src=pr=tree) | Coverage Δ | |
|---|---|---|
| [airflow/models/\_\_init\_\_.py](https://codecov.io/gh/apache/incubator-airflow/pull/4353/diff?src=pr=tree#diff-YWlyZmxvdy9tb2RlbHMvX19pbml0X18ucHk=) | `92.76% <0%> (+0.08%)` | :arrow_up: |
| [airflow/utils/helpers.py](https://codecov.io/gh/apache/incubator-airflow/pull/4353/diff?src=pr=tree#diff-YWlyZmxvdy91dGlscy9oZWxwZXJzLnB5) | `82.6% <0%> (+0.38%)` | :arrow_up: |

------

[Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4353?src=pr=continue).
> **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta)
> `Δ = absolute (impact)`, `ø = not affected`, `? = missing data`
> Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4353?src=pr=footer). Last update [7a6acbf...f2eb903](https://codecov.io/gh/apache/incubator-airflow/pull/4353?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).

This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] Fokko closed pull request #4374: [AIRFLOW-3459] Move DagPickle to separate file
Fokko closed pull request #4374: [AIRFLOW-3459] Move DagPickle to separate file URL: https://github.com/apache/incubator-airflow/pull/4374 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/api/common/experimental/delete_dag.py b/airflow/api/common/experimental/delete_dag.py index e7d15772c9..d04355f940 100644 --- a/airflow/api/common/experimental/delete_dag.py +++ b/airflow/api/common/experimental/delete_dag.py @@ -47,7 +47,7 @@ def delete_dag(dag_id, keep_records_in_log=True): count = 0 # noinspection PyUnresolvedReferences,PyProtectedMember -for m in models.Base._decl_class_registry.values(): +for m in models.base.Base._decl_class_registry.values(): if hasattr(m, "dag_id"): if keep_records_in_log and m.__name__ == 'Log': continue diff --git a/airflow/bin/cli.py b/airflow/bin/cli.py index 143e2b34aa..c8840011e3 100644 --- a/airflow/bin/cli.py +++ b/airflow/bin/cli.py @@ -34,7 +34,6 @@ from builtins import input from collections import namedtuple -from airflow.models.connection import Connection from airflow.utils.timezone import parse as parsedate import json from tabulate import tabulate @@ -56,9 +55,9 @@ from airflow import configuration as conf from airflow.exceptions import AirflowException, AirflowWebServerTimeout from airflow.executors import GetDefaultExecutor -from airflow.models import (DagModel, DagBag, TaskInstance, -DagPickle, DagRun, Variable, DagStat, DAG) - +from airflow.models import DagModel, DagBag, TaskInstance, DagRun, Variable, DagStat, DAG +from airflow.models.connection import Connection +from airflow.models.dagpickle import DagPickle from airflow.ti_deps.dep_context import (DepContext, SCHEDULER_DEPS) from airflow.utils import cli as cli_utils from airflow.utils import db as db_utils diff --git 
a/airflow/jobs.py b/airflow/jobs.py index 8472ecd383..a82190fc9d 100644 --- a/airflow/jobs.py +++ b/airflow/jobs.py @@ -43,6 +43,7 @@ from airflow import executors, models, settings from airflow.exceptions import AirflowException from airflow.models import DAG, DagRun +from airflow.models.dagpickle import DagPickle from airflow.settings import Stats from airflow.task.task_runner import get_task_runner from airflow.ti_deps.dep_context import DepContext, QUEUE_DEPS, RUN_DEPS @@ -61,7 +62,7 @@ from airflow.utils.sqlalchemy import UtcDateTime from airflow.utils.state import State -Base = models.Base +Base = models.base.Base ID_LEN = models.ID_LEN @@ -2428,7 +2429,7 @@ def _execute(self, session=None): pickle_id = None if not self.donot_pickle and self.executor.__class__ not in ( executors.LocalExecutor, executors.SequentialExecutor): -pickle = models.DagPickle(self.dag) +pickle = DagPickle(self.dag) session.add(pickle) session.commit() pickle_id = pickle.id diff --git a/airflow/migrations/env.py b/airflow/migrations/env.py index 4e1977635a..76c2eb329b 100644 --- a/airflow/migrations/env.py +++ b/airflow/migrations/env.py @@ -36,7 +36,7 @@ # for 'autogenerate' support # from myapp import mymodel # target_metadata = mymodel.Base.metadata -target_metadata = models.Base.metadata +target_metadata = models.base.Base.metadata # other values from the config, defined by the needs of env.py, # can be acquired: diff --git a/airflow/models/__init__.py b/airflow/models/__init__.py index aa93f5cb88..fcd23fe1fb 100755 --- a/airflow/models/__init__.py +++ b/airflow/models/__init__.py @@ -28,6 +28,8 @@ from builtins import ImportError as BuiltinImportError, bytes, object, str from future.standard_library import install_aliases +from airflow.models.base import Base + try: # Fix Python > 3.7 deprecation from collections.abc import Hashable @@ -64,10 +66,10 @@ from sqlalchemy import ( Boolean, Column, DateTime, Float, ForeignKey, ForeignKeyConstraint, Index, -Integer, LargeBinary, 
PickleType, String, Text, UniqueConstraint, MetaData, -and_, asc, func, or_, true as sqltrue +Integer, LargeBinary, PickleType, String, Text, UniqueConstraint, and_, asc, +func, or_, true as sqltrue ) -from sqlalchemy.ext.declarative import declarative_base, declared_attr +from sqlalchemy.ext.declarative import declared_attr from sqlalchemy.orm import reconstructor, relationship, synonym from croniter import ( @@ -84,6 +86,7 @@ ) from airflow.dag.base_dag import BaseDag, BaseDagBag from airflow.lineage import apply_lineage, prepare_lineage +from airflow.models.dagpickle import DagPickle from airflow.ti_deps.deps.not_in_retry_period_dep import NotInRetryPeriodDep from airflow.ti_deps.deps.prev_dagrun_dep import PrevDagrunDep
[GitHub] Fokko commented on issue #4370: [AIRFLOW-3459] Move DagPickle to separate file
Fokko commented on issue #4370: [AIRFLOW-3459] Move DagPickle to separate file URL: https://github.com/apache/incubator-airflow/pull/4370#issuecomment-450018959 I didn't know about this restriction, happy to change it :-)
[GitHub] Fokko commented on issue #4298: [AIRFLOW-3478] Make sure that the session is closed
Fokko commented on issue #4298: [AIRFLOW-3478] Make sure that the session is closed URL: https://github.com/apache/incubator-airflow/pull/4298#issuecomment-450018792 Rebased :-)
[GitHub] BasPH commented on issue #4370: [AIRFLOW-3459] Move DagPickle to separate file
BasPH commented on issue #4370: [AIRFLOW-3459] Move DagPickle to separate file URL: https://github.com/apache/incubator-airflow/pull/4370#issuecomment-450018524 Spotted a mistake in my code, closed the issue to fix it first, then discovered I couldn't re-open because I had done a force push... when will the Airflow project change its contributing policy? ;)
[jira] [Closed] (AIRFLOW-2568) Implement a Azure Container Instances operator
[ https://issues.apache.org/jira/browse/AIRFLOW-2568?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong closed AIRFLOW-2568. - Resolution: Fixed Fix Version/s: 2.0.0 > Implement a Azure Container Instances operator > -- > > Key: AIRFLOW-2568 > URL: https://issues.apache.org/jira/browse/AIRFLOW-2568 > Project: Apache Airflow > Issue Type: New Feature >Reporter: Niels Zeilemaker >Assignee: Niels Zeilemaker >Priority: Major > Fix For: 2.0.0 > > -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] Fokko closed pull request #4121: [AIRFLOW-2568] Azure Container Instances operator
Fokko closed pull request #4121: [AIRFLOW-2568] Azure Container Instances operator URL: https://github.com/apache/incubator-airflow/pull/4121 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/contrib/example_dags/example_azure_container_instances_operator.py b/airflow/contrib/example_dags/example_azure_container_instances_operator.py new file mode 100644 index 00..181a30b50e --- /dev/null +++ b/airflow/contrib/example_dags/example_azure_container_instances_operator.py @@ -0,0 +1,54 @@ +# -*- coding: utf-8 -*- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. 
+ +from airflow import DAG +from airflow.contrib.operators.azure_container_instances_operator import AzureContainerInstancesOperator +from datetime import datetime, timedelta + +default_args = { +'owner': 'airflow', +'depends_on_past': False, +'start_date': datetime(2018, 11, 1), +'email': ['airf...@example.com'], +'email_on_failure': False, +'email_on_retry': False, +'retries': 1, +'retry_delay': timedelta(minutes=5), +} + +dag = DAG( +'aci_example', +default_args=default_args, +schedule_interval=timedelta(1) +) + +t1 = AzureContainerInstancesOperator( +ci_conn_id='azure_container_instances_default', +registry_conn_id=None, +resource_group='resource-group', +name='aci-test-{{ ds }}', +image='hello-world', +region='WestUS2', +environment_variables={}, +volumes=[], +memory_in_gb=4.0, +cpu=1.0, +task_id='start_container', +dag=dag +) diff --git a/airflow/contrib/hooks/azure_container_instance_hook.py b/airflow/contrib/hooks/azure_container_instance_hook.py new file mode 100644 index 00..5ad64de6d7 --- /dev/null +++ b/airflow/contrib/hooks/azure_container_instance_hook.py @@ -0,0 +1,167 @@ +# -*- coding: utf-8 -*- +# +# Licensed under the Apache License, Version 2.0 (the "License"); +# you may not use this file except in compliance with the License. +# You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, software +# distributed under the License is distributed on an "AS IS" BASIS, +# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. +# See the License for the specific language governing permissions and +# limitations under the License. 
+# + +import os + +from airflow.hooks.base_hook import BaseHook +from airflow.exceptions import AirflowException + +from azure.common.client_factory import get_client_from_auth_file +from azure.common.credentials import ServicePrincipalCredentials + +from azure.mgmt.containerinstance import ContainerInstanceManagementClient + + +class AzureContainerInstanceHook(BaseHook): +""" +A hook to communicate with Azure Container Instances. + +This hook requires a service principal in order to work. +After creating this service principal +(Azure Active Directory/App Registrations), you need to fill in the +client_id (Application ID) as login, the generated password as password, +and tenantId and subscriptionId in the extra's field as a json. + +:param conn_id: connection id of a service principal which will be used +to start the container instance +:type conn_id: str +""" + +def __init__(self, conn_id='azure_default'): +self.conn_id = conn_id +self.connection = self.get_conn() + +def get_conn(self): +conn = self.get_connection(self.conn_id) +key_path = conn.extra_dejson.get('key_path', False) +if key_path: +if key_path.endswith('.json'): +self.log.info('Getting connection using a JSON key file.') +return get_client_from_auth_file(ContainerInstanceManagementClient, + key_path) +else: +raise AirflowException('Unrecognised extension for key file.') + +if os.environ.get('AZURE_AUTH_LOCATION'): +
[jira] [Resolved] (AIRFLOW-1413) FTPSensor fails when 550 error message text differs from the expected
[ https://issues.apache.org/jira/browse/AIRFLOW-1413?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong resolved AIRFLOW-1413. --- Resolution: Fixed Fix Version/s: 2.0.0 > FTPSensor fails when 550 error message text differs from the expected > - > > Key: AIRFLOW-1413 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1413 > Project: Apache Airflow > Issue Type: Bug >Reporter: Michal Dziemianko >Assignee: Michal Dziemianko >Priority: Minor > Fix For: 2.0.0 > > Original Estimate: 24h > Remaining Estimate: 24h > > The FTPSensor relies on an error message returned by the FTPHook. The > expected message is "Can't check for file existence" and is used to check > for errors in the following way: > {code:none} > try: > hook.get_mod_time(self.path) > except ftplib.error_perm as e: > error = str(e).split(None, 1) > if error[1] != "Can't check for file existence": > raise e > {code} > However, on my system (Linux/Antregos) this message text is different, leading > to inevitable DAG termination. Moreover, the actual format of the exception > on my system is different, and the split returns '-' instead of the actual > message. > Testing for the error message text is not a good idea - this text is not > consistent across platforms and locales (I tried changing the locale and I can > get it to return localised messages). > Instead, as a quick fix, I suggest testing for the error code in the > following way: > {code:none} > error = str(e).split(None, 1) > if error[0] != "550": > raise e > {code} > This is more reliable, although still not perfect: per the FTP specification > the codes are indicatory rather than mandatory. Moreover, certain codes > (the 4xx series) are transient and arguably should not cause an exception if > recovery is possible within the time limit (for example, "host unavailable" > can be a temporary network issue). > I am going to provide a PR with a series of improvements. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] Fokko closed pull request #2450: [Airflow-1413] Fix FTPSensor failing on error message with unexpected text.
Fokko closed pull request #2450: [Airflow-1413] Fix FTPSensor failing on error message with unexpected text. URL: https://github.com/apache/incubator-airflow/pull/2450 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/airflow/contrib/sensors/ftp_sensor.py b/airflow/contrib/sensors/ftp_sensor.py index efdedd7a62..69900d5205 100644 --- a/airflow/contrib/sensors/ftp_sensor.py +++ b/airflow/contrib/sensors/ftp_sensor.py @@ -17,6 +17,7 @@ # specific language governing permissions and limitations # under the License. import ftplib +import re from airflow.contrib.hooks.ftp_hook import FTPHook, FTPSHook from airflow.sensors.base_sensor_operator import BaseSensorOperator @@ -26,33 +27,65 @@ class FTPSensor(BaseSensorOperator): """ Waits for a file or directory to be present on FTP. - -:param path: Remote file or directory path -:type path: str -:param ftp_conn_id: The connection to run the sensor against -:type ftp_conn_id: str """ + template_fields = ('path',) +"""Errors that are transient in nature, and where action can be retried""" +transient_errors = [421, 425, 426, 434, 450, 451, 452] + +error_code_pattern = re.compile("([\d]+)") + @apply_defaults -def __init__(self, path, ftp_conn_id='ftp_default', *args, **kwargs): +def __init__( +self, +path, +ftp_conn_id='ftp_default', +fail_on_transient_errors=True, +*args, +**kwargs): +""" +Create a new FTP sensor + +:param path: Remote file or directory path +:type path: str +:param fail_on_transient_errors: Fail on all errors, +including 4xx transient errors. Default True. 
+:type fail_on_transient_errors: bool +:param ftp_conn_id: The connection to run the sensor against +:type ftp_conn_id: str +""" + super(FTPSensor, self).__init__(*args, **kwargs) self.path = path self.ftp_conn_id = ftp_conn_id +self.fail_on_transient_errors = fail_on_transient_errors def _create_hook(self): """Return connection hook.""" return FTPHook(ftp_conn_id=self.ftp_conn_id) +def _get_error_code(self, e): +"""Extract error code from ftp exception""" +try: +matches = self.error_code_pattern.match(str(e)) +code = int(matches.group(0)) +return code +except ValueError: +return e + def poke(self, context): with self._create_hook() as hook: self.log.info('Poking for %s', self.path) try: hook.get_mod_time(self.path) except ftplib.error_perm as e: -error = str(e).split(None, 1) -if error[1] != "Can't check for file existence": +self.log.info('Ftp error encountered: %s', str(e)) +error_code = self._get_error_code(e) +if ((error_code != 550) and +(self.fail_on_transient_errors or +(error_code not in self.transient_errors))): raise e return False diff --git a/tests/contrib/sensors/test_ftp_sensor.py b/tests/contrib/sensors/test_ftp_sensor.py index cf0fdb4918..8996dc08e0 100644 --- a/tests/contrib/sensors/test_ftp_sensor.py +++ b/tests/contrib/sensors/test_ftp_sensor.py @@ -49,8 +49,13 @@ def test_poke(self): task_id="test_task") self.hook_mock.get_mod_time.side_effect = \ -[error_perm("550: Can't check for file existence"), None] +[error_perm("550: Can't check for file existence"), +error_perm("550: Directory or file does not exist"), +error_perm("550 - Directory or file does not exist"), +None] +self.assertFalse(op.poke(None)) +self.assertFalse(op.poke(None)) self.assertFalse(op.poke(None)) self.assertTrue(op.poke(None)) @@ -66,6 +71,28 @@ def test_poke_fails_due_error(self): self.assertTrue("530" in str(context.exception)) +def test_poke_fail_on_transient_error(self): +op = FTPSensor(path="foobar.json", ftp_conn_id="bob_ftp", + task_id="test_task") + 
+self.hook_mock.get_mod_time.side_effect = \ +error_perm("434: Host unavailable") + +with self.assertRaises(error_perm) as context: +op.execute(None) + +self.assertTrue("434" in str(context.exception)) + +def test_poke_ignore_transient_error(self): +op = FTPSensor(path="foobar.json", ftp_conn_id="bob_ftp", + task_id="test_task",
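Distilled from the diff above, the sensor's new error handling can be sketched as a pair of plain functions. This is an illustrative sketch only: `get_error_code` mirrors the PR's `_get_error_code`, while `should_reraise` is a hypothetical helper name invented here — in the actual sensor this decision is inlined in `poke()`.

```python
import re

# Reply codes the PR treats as transient (retryable) FTP errors.
TRANSIENT_ERRORS = [421, 425, 426, 434, 450, 451, 452]

# Matches the leading numeric reply code of an FTP error message.
ERROR_CODE_PATTERN = re.compile(r"(\d+)")


def get_error_code(message):
    """Extract the leading FTP reply code from an error message,
    e.g. 550 from "550 - Directory or file does not exist".
    Falls back to the raw message if no code can be parsed."""
    match = ERROR_CODE_PATTERN.match(message)
    if match:
        return int(match.group(0))
    return message


def should_reraise(message, fail_on_transient_errors=True):
    """Decide whether the sensor should re-raise an ftplib.error_perm.
    550 means "file absent" and is swallowed so poking continues;
    transient codes are swallowed only when the sensor is configured
    to tolerate them."""
    code = get_error_code(message)
    return (code != 550 and
            (fail_on_transient_errors or code not in TRANSIENT_ERRORS))
```

This keys the decision on the numeric code rather than the message text, which is exactly the locale-independence fix the JIRA issue asked for.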
[jira] [Closed] (AIRFLOW-1684) Branching based on XCOM variable
[ https://issues.apache.org/jira/browse/AIRFLOW-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Fokko Driesprong closed AIRFLOW-1684. - Resolution: Fixed Fix Version/s: 2.0.0 > Branching based on XCOM variable > > > Key: AIRFLOW-1684 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1684 > Project: Apache Airflow > Issue Type: New Feature > Components: xcom >Affects Versions: 1.7.0 > Environment: Centos 7, Airflow1.7 >Reporter: Virendhar Sivaraman >Assignee: Elad >Priority: Major > Fix For: 2.0.0 > > > I would like to branch my dag based on an XCOM variable. > Steps: > 1. Populate XCOM in bash > 2. Pull the XCOM variable in a BranchPythonOperator and branch based > on the XCOM variable > I've tried the documentation and researched on the internet, but haven't been > successful. > This feature will be helpful if it's not available yet. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] Fokko closed pull request #4365: [AIRFLOW-1684] - Branching based on XCom variable (Docs)
Fokko closed pull request #4365: [AIRFLOW-1684] - Branching based on XCom variable (Docs) URL: https://github.com/apache/incubator-airflow/pull/4365 This is a PR merged from a forked repository. As GitHub hides the original diff on merge, it is displayed below for the sake of provenance: As this is a foreign pull request (from a fork), the diff is supplied below (as it won't show otherwise due to GitHub magic): diff --git a/docs/concepts.rst b/docs/concepts.rst index eac7a8a7f1..766b69b783 100644 --- a/docs/concepts.rst +++ b/docs/concepts.rst @@ -522,6 +522,37 @@ Not like this, where the join task is skipped .. image:: img/branch_bad.png +The ``BranchPythonOperator`` can also be used with XComs allowing branching +context to dynamically decide what branch to follow based on previous tasks. +For example: + +.. code:: python + + def branch_func(**kwargs): + ti = kwargs['ti'] + xcom_value = int(ti.xcom_pull(task_ids='start_task')) + if xcom_value >= 5: + return 'continue_task' + else: + return 'stop_task' + + start_op = BashOperator( + task_id='start_task', + bash_command="echo 5", + xcom_push=True, + dag=dag) + + branch_op = BranchPythonOperator( + task_id='branch_task', + provide_context=True, + python_callable=branch_func, + dag=dag) + + continue_op = DummyOperator(task_id='continue_task', dag=dag) + stop_op = DummyOperator(task_id='stop_task', dag=dag) + + start_op >> branch_op >> [continue_op, stop_op] + SubDAGs === This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[jira] [Commented] (AIRFLOW-1684) Branching based on XCOM variable
[ https://issues.apache.org/jira/browse/AIRFLOW-1684?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729168#comment-16729168 ] ASF GitHub Bot commented on AIRFLOW-1684: - Fokko commented on pull request #4365: [AIRFLOW-1684] - Branching based on XCom variable (Docs) URL: https://github.com/apache/incubator-airflow/pull/4365 This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org > Branching based on XCOM variable > > > Key: AIRFLOW-1684 > URL: https://issues.apache.org/jira/browse/AIRFLOW-1684 > Project: Apache Airflow > Issue Type: New Feature > Components: xcom >Affects Versions: 1.7.0 > Environment: Centos 7, Airflow1.7 >Reporter: Virendhar Sivaraman >Assignee: Elad >Priority: Major > > I would like to branch my dag based on a XCOM variable. > Steps: > 1. Populate XCOM in bash > 2. pull the XCOM variable in a BranchPythonOperator and branch it out based > on the XCOM variable > I've tried the documentation and researched on the internet - haven't been > successful. > This feature will be helpful if its not available yet. -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[GitHub] Fokko commented on issue #3770: [AIRFLOW-3281] Fix Kubernetes operator with git-sync
Fokko commented on issue #3770: [AIRFLOW-3281] Fix Kubernetes operator with git-sync URL: https://github.com/apache/incubator-airflow/pull/3770#issuecomment-450016715 @dimberman final thoughts? I'm not into the k8s part right now.
[GitHub] Fokko commented on issue #4339: [AIRFLOW-3303] Deprecate old UI in favor of FAB
Fokko commented on issue #4339: [AIRFLOW-3303] Deprecate old UI in favor of FAB URL: https://github.com/apache/incubator-airflow/pull/4339#issuecomment-450016352 @verdan Can you rebase? Would be good to get this merged, since we're now doing duplicate work :-)
[GitHub] Fokko commented on a change in pull request #4368: [AIRFLOW-3561] Improve queries
Fokko commented on a change in pull request #4368: [AIRFLOW-3561] Improve queries URL: https://github.com/apache/incubator-airflow/pull/4368#discussion_r244041406 ## File path: airflow/models/__init__.py ## @@ -3249,6 +3259,17 @@ def __exit__(self, _type, _value, _tb): # /Context Manager -- +@property +def default_view(self): Review comment: Awesome, thanks!
[GitHub] Fokko commented on issue #4370: [AIRFLOW-3459] Move DagPickle to separate file
Fokko commented on issue #4370: [AIRFLOW-3459] Move DagPickle to separate file URL: https://github.com/apache/incubator-airflow/pull/4370#issuecomment-450015414 @BasPH Why did you close this? :-)
[GitHub] potiuk commented on a change in pull request #4353: [AIRFLOW-3480] Add Database Deploy/Update/Delete operators
potiuk commented on a change in pull request #4353: [AIRFLOW-3480] Add Database Deploy/Update/Delete operators URL: https://github.com/apache/incubator-airflow/pull/4353#discussion_r244038642 ## File path: airflow/contrib/hooks/gcp_spanner_hook.py ## @@ -168,39 +168,171 @@ def delete_instance(self, project_id, instance_id): """ Deletes an existing Cloud Spanner instance. -:param project_id: The ID of the project which owns the instances, tables and data. +:param project_id: The ID of the GCP project that owns the Cloud Spanner database. :type project_id: str -:param instance_id: The ID of the instance. +:param instance_id: The ID of the Cloud Spanner instance. :type instance_id: str """ -client = self.get_client(project_id) -instance = client.instance(instance_id) +instance = self.get_client(project_id).instance(instance_id) try: instance.delete() return True except GoogleAPICallError as e: -self.log.error('An error occurred: %s. Aborting.', e.message) +self.log.error('An error occurred: %s. Exiting.', e.message) +raise e + +def get_database(self, project_id, instance_id, database_id): +# type: (str, str, str) -> Optional[Database] +""" +Retrieves a database in Cloud Spanner. If the database does not exist +in the specified instance, it returns None. + +:param project_id: The ID of the GCP project that owns the Cloud Spanner database. +:type project_id: str +:param instance_id: The ID of the Cloud Spanner instance. +:type instance_id: str +:param database_id: The ID of the database in Cloud Spanner. +:type database_id: str +:return: Review comment: Fixed. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] potiuk commented on a change in pull request #4353: [AIRFLOW-3480] Add Database Deploy/Update/Delete operators
potiuk commented on a change in pull request #4353: [AIRFLOW-3480] Add Database Deploy/Update/Delete operators URL: https://github.com/apache/incubator-airflow/pull/4353#discussion_r244038274 ## File path: airflow/contrib/hooks/gcp_spanner_hook.py ## @@ -147,15 +149,13 @@ def _apply_to_instance(self, project_id, instance_id, configuration_name, node_c :param func: Method of the instance to be called. :type func: Callable """ -client = self.get_client(project_id) -instance = client.instance(instance_id, - configuration_name=configuration_name, - node_count=node_count, - display_name=display_name) +instance = self.get_client(project_id).instance( Review comment: I'd rather defer hook tests to subsequent PRs (and maybe we solve it in a bit different way). I think it is rather difficult to test the GCP hooks automatically using mocking/classic Airflow unit tests. Please let me know what you think about the approach we use? Maybe we are missing something and we can add some unit tests for hooks which could add value :)? We chose a slightly different strategy for testing GCP operators (and hooks): * First of all, we try to make hooks as straightforward as possible - basically a 1-1 mapping onto an existing library method, except made synchronous - waiting for the operation to complete. Then we put pretty much all logic into the operators. * What we realised is that all we can test with mocking and unit tests in this case is whether a particular library method has been called. Which is a bit redundant - that's why we do not have those hook tests. * Instead, we run automated system tests ("we" means the team from my company Polidea - 3 people who work on those operators). We use example DAGs (`example_gcp_spanner.py` in this case) to run them automatically against a real GCP_PROJECT. We even have a way to run them using (skipped by default) unit tests (see for example CloudSpannerExampleDagsTest).
We have a separate and quite sophisticated environment for that - we run it automatically in GCP Cloud Build on every push, and this way we test end-to-end whether the operators (and thus the hooks) work with real GCP. I am going to try to contribute it to the community very soon (it's already open-sourced and I am polishing it) so that others will be able to do the same with their own GCP projects. You can see it here: https://github.com/PolideaInternal/airflow-breeze and I will be happy to involve you for comments/review when we are ready to share it (I think in a couple of days). What do you think?
[jira] [Updated] (AIRFLOW-3570) Scheduler stalled when statsd is enabled
[ https://issues.apache.org/jira/browse/AIRFLOW-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinh Nguyen updated AIRFLOW-3570: -- Description: Web servers and workers appear to work fine with statsd_on = True and we can see metrics from them. But the scheduler gets stuck, and its python processes go into zombie mode. Unfortunately, we don't have root access (it's inside a Docker container), so we can't profile the processes easily. Is anyone else seeing a similar issue with the scheduler/statsd integration? was: Web servers and workers appear to work fine with statsd_on = True and we can see metrics from them. But the scheduler gets stuck, and its python processes go into zombie mode; unfortunately, since we don't have root access (it's inside a Docker container), we can't profile the processes easily. Is anyone else seeing a similar issue with the scheduler/statsd integration? > Scheduler stalled when statsd is enabled > > > Key: AIRFLOW-3570 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3570 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: 1.10.1 >Reporter: Chinh Nguyen >Priority: Minor > > Web servers and workers appear to work fine with statsd_on = True and we can > see metrics from them. > But the scheduler gets stuck, and its python processes go into zombie mode. > Unfortunately, we don't have root access (it's inside a Docker container), so we > can't profile the processes easily. Is anyone else seeing a similar issue with the > scheduler/statsd integration? -- This message was sent by Atlassian JIRA (v7.6.3#76005)
[jira] [Created] (AIRFLOW-3570) Scheduler stalled when statsd is enabled
Chinh Nguyen created AIRFLOW-3570: - Summary: Scheduler stalled when statsd is enabled Key: AIRFLOW-3570 URL: https://issues.apache.org/jira/browse/AIRFLOW-3570 Project: Apache Airflow Issue Type: Bug Components: scheduler Affects Versions: 1.10.1 Reporter: Chinh Nguyen Web servers and workers appear to work fine with statsd_on = True and we can see metrics from them. But the scheduler seems to get stuck, and its python processes go into zombie mode; unfortunately, since we don't have root access (it's inside a Docker container), we can't profile the processes easily. Is anyone else seeing a similar issue with the scheduler/statsd integration?
[jira] [Updated] (AIRFLOW-3570) Scheduler stalled when statsd is enabled
[ https://issues.apache.org/jira/browse/AIRFLOW-3570?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel ] Chinh Nguyen updated AIRFLOW-3570: -- Description: Web servers and workers appear to work fine with statsd_on = True and we can see metrics from them. But the scheduler gets stuck, and its python processes go into zombie mode; unfortunately, since we don't have root access (it's inside a Docker container), we can't profile the processes easily. Is anyone else seeing a similar issue with the scheduler/statsd integration? was: Web servers and workers appear to work fine with statsd_on = True and we can see metrics from them. But the scheduler seems to get stuck, and its python processes go into zombie mode; unfortunately, since we don't have root access (it's inside a Docker container), we can't profile the processes easily. Is anyone else seeing a similar issue with the scheduler/statsd integration? > Scheduler stalled when statsd is enabled > > > Key: AIRFLOW-3570 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3570 > Project: Apache Airflow > Issue Type: Bug > Components: scheduler >Affects Versions: 1.10.1 >Reporter: Chinh Nguyen >Priority: Minor > > Web servers and workers appear to work fine with statsd_on = True and we can > see metrics from them. > But the scheduler gets stuck, and its python processes go into zombie mode; > unfortunately, since we don't have root access (it's inside a Docker container), > we can't profile the processes easily. Is anyone else seeing a similar issue > with the scheduler/statsd integration?
[GitHub] potiuk commented on a change in pull request #4353: [AIRFLOW-3480] Add Database Deploy/Update/Delete operators
potiuk commented on a change in pull request #4353: [AIRFLOW-3480] Add Database Deploy/Update/Delete operators URL: https://github.com/apache/incubator-airflow/pull/4353#discussion_r244035912 ## File path: setup.py ## @@ -245,6 +245,7 @@ def write_version(filename=os.path.join(*['airflow', devel = [ 'click==6.7', 'freezegun', +'freezegun', Review comment: Indeed. Fixed.
[GitHub] jmcarp commented on a change in pull request #4353: [AIRFLOW-3480] Add Database Deploy/Update/Delete operators
jmcarp commented on a change in pull request #4353: [AIRFLOW-3480] Add Database Deploy/Update/Delete operators URL: https://github.com/apache/incubator-airflow/pull/4353#discussion_r244021232 ## File path: airflow/contrib/hooks/gcp_spanner_hook.py ## @@ -168,39 +168,171 @@ def delete_instance(self, project_id, instance_id): """ Deletes an existing Cloud Spanner instance. -:param project_id: The ID of the project which owns the instances, tables and data. +:param project_id: The ID of the GCP project that owns the Cloud Spanner database. :type project_id: str -:param instance_id: The ID of the instance. +:param instance_id: The ID of the Cloud Spanner instance. :type instance_id: str """ -client = self.get_client(project_id) -instance = client.instance(instance_id) +instance = self.get_client(project_id).instance(instance_id) try: instance.delete() return True except GoogleAPICallError as e: -self.log.error('An error occurred: %s. Aborting.', e.message) +self.log.error('An error occurred: %s. Exiting.', e.message) +raise e + +def get_database(self, project_id, instance_id, database_id): +# type: (str, str, str) -> Optional[Database] +""" +Retrieves a database in Cloud Spanner. If the database does not exist +in the specified instance, it returns None. + +:param project_id: The ID of the GCP project that owns the Cloud Spanner database. +:type project_id: str +:param instance_id: The ID of the Cloud Spanner instance. +:type instance_id: str +:param database_id: The ID of the database in Cloud Spanner. +:type database_id: str +:return: +""" + +instance = self.get_client(project_id=project_id).instance( +instance_id=instance_id) +if not instance.exists(): +raise AirflowException("The instance {} does not exist in project {} !". 
+ format(instance_id, project_id)) +database = instance.database(database_id=database_id) +if not database.exists(): +return None +else: +return database + +def create_database(self, project_id, instance_id, database_id, ddl_statements): +# type: (str, str, str, [str]) -> bool +""" +Creates a new database in Cloud Spanner. + +:param project_id: The ID of the GCP project that owns the Cloud Spanner database. +:type project_id: str +:param instance_id: The ID of the Cloud Spanner instance. +:type instance_id: str +:param database_id: The ID of the database to create in Cloud Spanner. +:type database_id: str +:param ddl_statements: The string list containing DDL for the new database. +:type ddl_statements: [str] +:return: +""" + +instance = self.get_client(project_id=project_id).instance( +instance_id=instance_id) +if not instance.exists(): +raise AirflowException("The instance {} does not exist in project {} !". + format(instance_id, project_id)) +database = instance.database(database_id=database_id, + ddl_statements=ddl_statements) +try: +operation = database.create() # type: Operation +except GoogleAPICallError as e: +self.log.error('An error occurred: %s. Exiting.', e.message) +raise e + +if operation: +result = operation.result() +self.log.info(result) +return True Review comment: Is this return value useful? It looks like this method either returns `True` or raises an exception, so I'm not sure when it would be useful to check the return value. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
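Behind jmcarp's question is the synchronous-wrapper convention these hooks follow: `Operation.result()` blocks until completion and raises on failure, so a boolean return adds little. An illustrative sketch with a hypothetical `FakeOperation` (not the google-cloud API itself):

```python
# Illustrative only: a generic synchronous wrapper over a long-running
# operation object. In the real client library, result() blocks until the
# operation completes and raises on failure, so callers learn nothing
# extra from a True/False flag.
class FakeOperation:
    """Hypothetical stand-in for a long-running operation handle."""

    def __init__(self, value):
        self._value = value

    def result(self):
        # Blocks in the real API; returns immediately here.
        return self._value


def run_and_wait(operation):
    # Either returns the operation's result or lets an exception propagate;
    # no separate success flag is needed.
    return operation.result()


print(run_and_wait(FakeOperation('done')))  # done
```

Under this pattern, the caller's control flow is "success or exception", which is jmcarp's point about the `return True` being redundant.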
[GitHub] jmcarp commented on a change in pull request #4353: [AIRFLOW-3480] Add Database Deploy/Update/Delete operators
jmcarp commented on a change in pull request #4353: [AIRFLOW-3480] Add Database Deploy/Update/Delete operators URL: https://github.com/apache/incubator-airflow/pull/4353#discussion_r244021125 ## File path: airflow/contrib/hooks/gcp_spanner_hook.py ## @@ -168,39 +168,171 @@ def delete_instance(self, project_id, instance_id): """ Deletes an existing Cloud Spanner instance. -:param project_id: The ID of the project which owns the instances, tables and data. +:param project_id: The ID of the GCP project that owns the Cloud Spanner database. :type project_id: str -:param instance_id: The ID of the instance. +:param instance_id: The ID of the Cloud Spanner instance. :type instance_id: str """ -client = self.get_client(project_id) -instance = client.instance(instance_id) +instance = self.get_client(project_id).instance(instance_id) try: instance.delete() return True except GoogleAPICallError as e: -self.log.error('An error occurred: %s. Aborting.', e.message) +self.log.error('An error occurred: %s. Exiting.', e.message) +raise e + +def get_database(self, project_id, instance_id, database_id): +# type: (str, str, str) -> Optional[Database] +""" +Retrieves a database in Cloud Spanner. If the database does not exist +in the specified instance, it returns None. + +:param project_id: The ID of the GCP project that owns the Cloud Spanner database. +:type project_id: str +:param instance_id: The ID of the Cloud Spanner instance. +:type instance_id: str +:param database_id: The ID of the database in Cloud Spanner. +:type database_id: str +:return: Review comment: Can you fill out the empty `:return:` values? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] jmcarp commented on a change in pull request #4353: [AIRFLOW-3480] Add Database Deploy/Update/Delete operators
jmcarp commented on a change in pull request #4353: [AIRFLOW-3480] Add Database Deploy/Update/Delete operators URL: https://github.com/apache/incubator-airflow/pull/4353#discussion_r244021035 ## File path: airflow/contrib/hooks/gcp_spanner_hook.py ## @@ -147,15 +149,13 @@ def _apply_to_instance(self, project_id, instance_id, configuration_name, node_c :param func: Method of the instance to be called. :type func: Callable """ -client = self.get_client(project_id) -instance = client.instance(instance_id, - configuration_name=configuration_name, - node_count=node_count, - display_name=display_name) +instance = self.get_client(project_id).instance( Review comment: I was going to ask for tests of changes to the hook, but I see that we don't test this hook at all at the moment. Do you have time to add tests in this PR, or should we add them in a separate PR? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] jmcarp commented on a change in pull request #4354: [AIRFLOW-3446] Add Google Cloud BigTable operators
jmcarp commented on a change in pull request #4354: [AIRFLOW-3446] Add Google Cloud BigTable operators URL: https://github.com/apache/incubator-airflow/pull/4354#discussion_r244018595 ## File path: setup.py ## @@ -189,6 +189,7 @@ def write_version(filename=os.path.join(*['airflow', 'google-auth>=1.0.0, <2.0.0dev', 'google-auth-httplib2>=0.0.1', 'google-cloud-container>=0.1.1', +'google-cloud-bigtable==0.31.0', Review comment: Can we use a version range instead of pinning to an exact version?
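As context for the range-vs-pin question, a hypothetical shape the setup.py entry could take (the `<0.32.0` upper bound is illustrative only, not from the PR):

```python
# Hypothetical sketch: a version range for the bigtable client instead of
# an exact pin. The upper bound shown here is illustrative, not the value
# chosen in the PR.
gcp_api = [
    'google-cloud-container>=0.1.1',
    'google-cloud-bigtable>=0.31.0,<0.32.0',  # range instead of ==0.31.0
]

# The requirement string still names the same distribution:
name = gcp_api[1].split('>=')[0]
print(name)  # google-cloud-bigtable
```

A range lets pip resolve compatible bugfix releases without a new Airflow release, which is the usual motivation for the reviewer's request.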
[GitHub] jmcarp commented on a change in pull request #4353: [AIRFLOW-3480] Add Database Deploy/Update/Delete operators
jmcarp commented on a change in pull request #4353: [AIRFLOW-3480] Add Database Deploy/Update/Delete operators URL: https://github.com/apache/incubator-airflow/pull/4353#discussion_r244018499 ## File path: setup.py ## @@ -245,6 +245,7 @@ def write_version(filename=os.path.join(*['airflow', devel = [ 'click==6.7', 'freezegun', +'freezegun', Review comment: Looks like a typo.
[GitHub] codecov-io edited a comment on issue #4363: [AIRFLOW-3560] Add WeekEnd & DayOfWeek Sensors
codecov-io edited a comment on issue #4363: [AIRFLOW-3560] Add WeekEnd & DayOfWeek Sensors URL: https://github.com/apache/incubator-airflow/pull/4363#issuecomment-449730551 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4363?src=pr=h1) Report > Merging [#4363](https://codecov.io/gh/apache/incubator-airflow/pull/4363?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/c014324be237523d23ceb7a6250218d28befddfd?src=pr=desc) will **not change** coverage. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/4363/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/4363?src=pr=tree)
```diff
@@           Coverage Diff           @@
##           master    #4363   +/-   ##
=======================================
  Coverage   78.14%   78.14%
=======================================
  Files         202      202
  Lines       16486    16486
=======================================
  Hits        12883    12883
  Misses       3603     3603
```
-- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4363?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4363?src=pr=footer). Last update [c014324...ecb62bf](https://codecov.io/gh/apache/incubator-airflow/pull/4363?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[jira] [Commented] (AIRFLOW-3459) Refactor: Move DagPickle out of models.py
[ https://issues.apache.org/jira/browse/AIRFLOW-3459?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel=16729084#comment-16729084 ] ASF GitHub Bot commented on AIRFLOW-3459: - BasPH commented on pull request #4374: [AIRFLOW-3459] Move DagPickle to separate file URL: https://github.com/apache/incubator-airflow/pull/4374 Make sure you have checked _all_ steps below. ### Jira - [x] My PR addresses the following [Airflow Jira](https://issues.apache.org/jira/browse/AIRFLOW/) issues and references them in the PR title. For example, "\[AIRFLOW-XXX\] My Airflow PR" - https://issues.apache.org/jira/browse/AIRFLOW-3459 - In case you are fixing a typo in the documentation you can prepend your commit with \[AIRFLOW-XXX\], code changes always need a Jira issue. ### Description - [x] Here are some details about my PR, including screenshots of any UI changes: Move the DagPickle class to a separate file as part of [the big models.py refactor](https://issues.apache.org/jira/issues/?jql=text%20~%20%22Refactor%3A%20Move%20out%20of%20models.py%22%20AND%20reporter%20in%20(Fokko)) (/airflow/models/dagpickle.py). As a result of the refactoring, I also split off models.Base into a separate file. ### Tests - [x] My PR adds the following unit tests __OR__ does not need testing for this extremely good reason: Only moved code around, nothing new. ### Commits - [x] My commits all reference Jira issues in their subject lines, and I have squashed multiple commits if they address the same issue. In addition, my commits follow the guidelines from "[How to write a good git commit message](http://chris.beams.io/posts/git-commit/)": 1. Subject is separated from body by a blank line 1. Subject is limited to 50 characters (not including Jira issue reference) 1. Subject does not end with a period 1. Subject uses the imperative mood ("add", not "adding") 1. Body wraps at 72 characters 1. 
Body explains "what" and "why", not "how" ### Documentation - [x] In case of new functionality, my PR adds documentation that describes how to use it. - When adding new operators/hooks/sensors, the autoclass documentation generation needs to be added. - All the public functions and the classes in the PR contain docstrings that explain what it does ### Code Quality - [x] Passes `flake8` > Refactor: Move DagPickle out of models.py > - > > Key: AIRFLOW-3459 > URL: https://issues.apache.org/jira/browse/AIRFLOW-3459 > Project: Apache Airflow > Issue Type: Task > Components: models >Affects Versions: 1.10.1 >Reporter: Fokko Driesprong >Priority: Major > Fix For: 2.0.0 > >
[GitHub] jmcarp commented on a change in pull request #4354: [AIRFLOW-3446] Add Google Cloud BigTable operators
jmcarp commented on a change in pull request #4354: [AIRFLOW-3446] Add Google Cloud BigTable operators URL: https://github.com/apache/incubator-airflow/pull/4354#discussion_r244013373 ## File path: airflow/contrib/hooks/gcp_bigtable_hook.py ## @@ -0,0 +1,232 @@ +# -*- coding: utf-8 -*- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +from google.cloud.bigtable import Client +from google.cloud.bigtable.cluster import Cluster +from google.cloud.bigtable.instance import Instance +from google.cloud.bigtable.table import Table +from google.cloud.bigtable_admin_v2 import enums +from airflow.contrib.hooks.gcp_api_base_hook import GoogleCloudBaseHook + + +# noinspection PyAbstractClass +class BigtableHook(GoogleCloudBaseHook): +""" +Hook for Google Cloud Bigtable APIs. 
+""" + +_client = None + +def __init__(self, + gcp_conn_id='google_cloud_default', + delegate_to=None): +super(BigtableHook, self).__init__(gcp_conn_id, delegate_to) + +def get_client(self, project_id): +if not self._client: +self._client = Client(project=project_id, credentials=self._get_credentials(), admin=True) +return self._client + +def get_instance(self, project_id, instance_id): Review comment: Since this hook subclasses `GoogleCloudBaseHook`, you can get the project id optionally set on the connection with `self.project_id`. Can you make `project_id` optional and default to the connection field, like in `GoogleCloudStorageHook.create_bucket`? This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] kaxil commented on a change in pull request #4363: [AIRFLOW-3560] Add WeekEnd & DayOfWeek Sensors
kaxil commented on a change in pull request #4363: [AIRFLOW-3560] Add WeekEnd & DayOfWeek Sensors URL: https://github.com/apache/incubator-airflow/pull/4363#discussion_r244010554 ## File path: airflow/contrib/sensors/weekday_sensor.py ## @@ -0,0 +1,87 @@ +# -*- coding: utf-8 -*- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +import calendar +from airflow.sensors.base_sensor_operator import BaseSensorOperator +from airflow.utils import timezone +from airflow.utils.decorators import apply_defaults + + +class DayOfWeekSensor(BaseSensorOperator): +""" +Waits until the first specified day of the week. For example, if the execution +day of the task is '2018-12-22' (Saturday) and you pass '5', the task will wait +until next Friday (Week Day: 5) + +:param week_day_number: Day of the week as an integer, where Monday is 1 and +Sunday is 7 (ISO Week Numbering) +:type week_day_number: int +:param use_task_execution_day: If ``True``, uses task's execution day to compare +with week_day_number. Execution Date is Useful for backfilling. +If ``False``, uses system's day of the week. Useful when you +don't want to run anything on weekdays on the system. 
+:type use_task_execution_day: bool +""" + +@apply_defaults +def __init__(self, week_day_number, Review comment: Updated
[GitHub] kaxil commented on a change in pull request #4363: [AIRFLOW-3560] Add WeekEnd & DayOfWeek Sensors
kaxil commented on a change in pull request #4363: [AIRFLOW-3560] Add WeekEnd & DayOfWeek Sensors URL: https://github.com/apache/incubator-airflow/pull/4363#discussion_r243996293 ## File path: airflow/contrib/sensors/weekday_sensor.py ## @@ -0,0 +1,87 @@ +# -*- coding: utf-8 -*- +# +# Licensed to the Apache Software Foundation (ASF) under one +# or more contributor license agreements. See the NOTICE file +# distributed with this work for additional information +# regarding copyright ownership. The ASF licenses this file +# to you under the Apache License, Version 2.0 (the +# "License"); you may not use this file except in compliance +# with the License. You may obtain a copy of the License at +# +# http://www.apache.org/licenses/LICENSE-2.0 +# +# Unless required by applicable law or agreed to in writing, +# software distributed under the License is distributed on an +# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY +# KIND, either express or implied. See the License for the +# specific language governing permissions and limitations +# under the License. + +import calendar +from airflow.sensors.base_sensor_operator import BaseSensorOperator +from airflow.utils import timezone +from airflow.utils.decorators import apply_defaults + + +class DayOfWeekSensor(BaseSensorOperator): +""" +Waits until the first specified day of the week. For example, if the execution +day of the task is '2018-12-22' (Saturday) and you pass '5', the task will wait +until next Friday (Week Day: 5) + +:param week_day_number: Day of the week as an integer, where Monday is 1 and +Sunday is 7 (ISO Week Numbering) +:type week_day_number: int +:param use_task_execution_day: If ``True``, uses task's execution day to compare +with week_day_number. Execution Date is Useful for backfilling. +If ``False``, uses system's day of the week. Useful when you +don't want to run anything on weekdays on the system. 
+:type use_task_execution_day: bool +""" + +@apply_defaults +def __init__(self, week_day_number, + use_task_execution_day=False, + *args, **kwargs): +super(DayOfWeekSensor, self).__init__(*args, **kwargs) +self.week_day_number = week_day_number +self.use_task_execution_day = use_task_execution_day + +def poke(self, context): +if self.week_day_number > 7: +raise ValueError( +'Invalid value ({}) for week_day_number! ' +'Valid value: 1 <= week_day_number <= 7'.format(self.week_day_number)) +self.log.info('Poking until weekday is %s, Today is %s', + calendar.day_name[self.week_day_number - 1], + calendar.day_name[timezone.utcnow().isoweekday() - 1]) +if self.use_task_execution_day: +return context['execution_date'].isoweekday() == self.week_day_number +else: +return timezone.utcnow().isoweekday() == self.week_day_number + + +class WeekEndSensor(BaseSensorOperator): Review comment: My reasoning behind keeping them in the same file is because they are related. But I can move it into a separate file if you think that would be better. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
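The `poke()` logic shown in this diff hinges on ISO weekday numbering. A quick self-contained check of that convention, using the dates from the sensor's docstring example (not Airflow code):

```python
# The sensor compares datetime.isoweekday() values, where Monday is 1 and
# Sunday is 7 (ISO 8601). Checking the docstring's example dates:
from datetime import date

assert date(2018, 12, 22).isoweekday() == 6   # Saturday, the execution day
assert date(2018, 12, 28).isoweekday() == 5   # the following Friday

print(date(2018, 12, 22).isoweekday())  # 6
```

Note that `isoweekday()` differs from `weekday()` (Monday is 0 there), which is a common source of off-by-one bugs in code like this.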
[GitHub] codecov-io edited a comment on issue #4373: [AIRFLOW-3569] Add "Trigger DAG" button in DAG page
codecov-io edited a comment on issue #4373: [AIRFLOW-3569] Add "Trigger DAG" button in DAG page URL: https://github.com/apache/incubator-airflow/pull/4373#issuecomment-449962906 # [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4373?src=pr=h1) Report > Merging [#4373](https://codecov.io/gh/apache/incubator-airflow/pull/4373?src=pr=desc) into [master](https://codecov.io/gh/apache/incubator-airflow/commit/c014324be237523d23ceb7a6250218d28befddfd?src=pr=desc) will **not change** coverage. > The diff coverage is `n/a`. [![Impacted file tree graph](https://codecov.io/gh/apache/incubator-airflow/pull/4373/graphs/tree.svg?width=650=WdLKlKHOAU=150=pr)](https://codecov.io/gh/apache/incubator-airflow/pull/4373?src=pr=tree)
```diff
@@           Coverage Diff           @@
##           master    #4373   +/-   ##
=======================================
  Coverage   78.14%   78.14%
=======================================
  Files         202      202
  Lines       16486    16486
=======================================
  Hits        12883    12883
  Misses       3603     3603
```
-- [Continue to review full report at Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4373?src=pr=continue). > **Legend** - [Click here to learn more](https://docs.codecov.io/docs/codecov-delta) > `Δ = absolute (impact)`, `ø = not affected`, `? = missing data` > Powered by [Codecov](https://codecov.io/gh/apache/incubator-airflow/pull/4373?src=pr=footer). Last update [c014324...53806ee](https://codecov.io/gh/apache/incubator-airflow/pull/4373?src=pr=lastupdated). Read the [comment docs](https://docs.codecov.io/docs/pull-request-comments).
[GitHub] XD-DENG commented on a change in pull request #4372: AIRFLOW-3567 Show an error in case logs can't be fetched from S3
XD-DENG commented on a change in pull request #4372: AIRFLOW-3567 Show an error in case logs can't be fetched from S3 URL: https://github.com/apache/incubator-airflow/pull/4372#discussion_r243989664 ## File path: airflow/utils/log/s3_task_handler.py ## @@ -124,7 +124,8 @@ def s3_log_exists(self, remote_log_location): try: return self.hook.get_key(remote_log_location) is not None except Exception: -pass +msg = 'Could not list logs at {}'.format(remote_log_location) +self.log.exception(msg) Review comment: Maybe not necessary to have this intermediary variable `msg` here?
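XD-DENG's suggestion amounts to inlining the message; going one step further, logging's lazy %-style formatting can replace the `format()` call entirely. A hedged sketch (the module-level function and `BrokenHook` are illustrative, not the actual s3_task_handler code):

```python
# Sketch of the reviewer's suggestion: drop the temporary variable and let
# the logging call do lazy %-formatting, so the string is only built when
# the record is actually emitted. Returning False on failure is part of
# this sketch, not necessarily the PR's behaviour.
import logging

log = logging.getLogger(__name__)


def s3_log_exists(hook, remote_log_location):
    try:
        return hook.get_key(remote_log_location) is not None
    except Exception:
        # log.exception records the message plus the current traceback.
        log.exception('Could not list logs at %s', remote_log_location)
        return False


class BrokenHook:
    """Hypothetical hook that always fails, to exercise the except path."""

    def get_key(self, location):
        raise IOError('boom')


print(s3_log_exists(BrokenHook(), 's3://bucket/key'))  # False
```

Passing the location as a logging argument rather than pre-formatting is the idiom most linters (e.g. pylint's logging-format-interpolation) recommend.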