[GitHub] [airflow] ashb commented on a change in pull request #7162: [AIRFLOW-6557] Add test for newly added fields in BaseOperator
ashb commented on a change in pull request #7162: [AIRFLOW-6557] Add test for newly added fields in BaseOperator
URL: https://github.com/apache/airflow/pull/7162#discussion_r367384009

## File path: tests/serialization/test_dag_serialization.py ##
@@ -543,6 +543,68 @@ def test_dag_serialized_fields_with_schema(self):
         dag_params: set = set(dag_schema.keys()) - ignored_keys
         self.assertEqual(set(DAG.get_serialized_fields()), dag_params)
+    def test_no_new_fields_added_to_base_operator(self):
+        """
+        This test verifies that there are no new fields added to BaseOperator. And reminds that
+        tests should be added for it.
+        """
+        base_operator = BaseOperator(task_id="10")
+        fields = base_operator.__dict__
+        self.assertEqual({'_dag': None,
+                          '_downstream_task_ids': set(),
+                          '_inlets': [],
+                          '_log': base_operator.log,
+                          '_outlets': [],
+                          '_upstream_task_ids': set(),
+                          'depends_on_past': False,
+                          'do_xcom_push': True,
+                          'email': None,
+                          'email_on_failure': True,
+                          'email_on_retry': True,
+                          'end_date': None,
+                          'execution_timeout': None,
+                          'executor_config': {},
+                          'inlets': [],
+                          'max_retry_delay': None,
+                          'on_execute_callback': None,
+                          'on_failure_callback': None,
+                          'on_retry_callback': None,
+                          'on_success_callback': None,
+                          'outlets': [],
+                          'owner': 'airflow',
+                          'params': {},
+                          'pool': 'default_pool',
+                          'priority_weight': 1,
+                          'queue': 'default',
+                          'resources': None,
+                          'retries': 0,
+                          'retry_delay': timedelta(0, 300),
+                          'retry_exponential_backoff': False,
+                          'run_as_user': None,
+                          'sla': None,
+                          'start_date': None,
+                          'subdag': None,
+                          'task_concurrency': None,
+                          'task_id': '10',
+                          'trigger_rule': 'all_success',
+                          'wait_for_downstream': False,
+                          'weight_rule': 'downstream'}, fields,
+                         """
+!!!
+
+ ACTION NEEDED! PLEASE READ THIS CAREFULLY AND CORRECT TESTS CAREFULLY
+
+ Some fields were added to the BaseOperator! Please add them to the list above and make sure that
+ you add support for DAG serialization - you should add the field to
+ `airflow/serialization/schema.json` - they should have correct type defined there.
+
+ Note that we do not support versioning yet so you should only add optional fields. We do not support
+ versioning yet so you should make sure all fields added to the BaseOperator should be optional.

Review comment:
You've duplicated(ish) the message here.

This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment.

For queries about this service, please contact Infrastructure at: us...@infra.apache.org

With regards,
Apache Git Services
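For reference, the check the quoted test performs (snapshot an instance's attributes and fail when new ones appear) can also be written so that the failure message names the offending fields once instead of repeating the instructions. This is a minimal, self-contained sketch, not the code from the PR: `FakeOperator`, `KNOWN_FIELDS`, `unexpected_fields`, and `check_no_new_fields` are hypothetical names standing in for `BaseOperator` and the expected-fields dict.

```python
class FakeOperator:
    """Hypothetical stand-in for BaseOperator; not Airflow code."""

    def __init__(self, task_id, owner="airflow", retries=0):
        self.task_id = task_id
        self.owner = owner
        self.retries = retries  # pretend this field was just added


# Hypothetical snapshot of the field names the test already knows about.
KNOWN_FIELDS = frozenset({"task_id", "owner"})


def unexpected_fields(instance, known=KNOWN_FIELDS):
    """Return instance attribute names that are missing from the snapshot."""
    return set(vars(instance)) - set(known)


def check_no_new_fields(instance, known=KNOWN_FIELDS):
    """Fail with a single message that names every new field."""
    new = unexpected_fields(instance, known)
    assert not new, (
        f"New fields added to the operator: {sorted(new)}. Add them to the known "
        "fields and to the serialization schema as optional, typed fields."
    )
```

With these definitions, `unexpected_fields(FakeOperator(task_id="10"))` returns `{'retries'}`. Comparing names only means a changed default value no longer fails the check, whereas the quoted `assertEqual` on the full `__dict__` catches that too; which behaviour is preferable is a judgement call.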
[GitHub] [airflow] ashb commented on a change in pull request #7162: [AIRFLOW-6557] Add test for newly added fields in BaseOperator
ashb commented on a change in pull request #7162: [AIRFLOW-6557] Add test for newly added fields in BaseOperator
URL: https://github.com/apache/airflow/pull/7162#discussion_r367051937

## File path: tests/serialization/test_dag_serialization.py ##
@@ -543,6 +543,66 @@ def test_dag_serialized_fields_with_schema(self):
         dag_params: set = set(dag_schema.keys()) - ignored_keys
         self.assertEqual(set(DAG.get_serialized_fields()), dag_params)
+    def test_no_new_fields_added_to_base_operator(self):
+        """
+        This test verifies that there are no new fields added to BaseOperator. And reminds that
+        tests should be added for it.
+        """
+        base_operator = BaseOperator(task_id="10")
+        fields = base_operator.__dict__
+        self.assertEqual({'_dag': None,
+                          '_downstream_task_ids': set(),
+                          '_inlets': [],
+                          '_log': base_operator.log,
+                          '_outlets': [],
+                          '_upstream_task_ids': set(),
+                          'depends_on_past': False,
+                          'do_xcom_push': True,
+                          'email': None,
+                          'email_on_failure': True,
+                          'email_on_retry': True,
+                          'end_date': None,
+                          'execution_timeout': None,
+                          'executor_config': {},
+                          'inlets': [],
+                          'max_retry_delay': None,
+                          'on_execute_callback': None,
+                          'on_failure_callback': None,
+                          'on_retry_callback': None,
+                          'on_success_callback': None,
+                          'outlets': [],
+                          'owner': 'airflow',
+                          'params': {},
+                          'pool': 'default_pool',
+                          'priority_weight': 1,
+                          'queue': 'default',
+                          'resources': None,
+                          'retries': 0,
+                          'retry_delay': timedelta(0, 300),
+                          'retry_exponential_backoff': False,
+                          'run_as_user': None,
+                          'sla': None,
+                          'start_date': None,
+                          'subdag': None,
+                          'task_concurrency': None,
+                          'task_id': '10',
+                          'trigger_rule': 'all_success',
+                          'wait_for_downstream': False,
+                          'weight_rule': 'downstream'}, fields,
+                         """
+!!!
+
+ ACTION NEEDED! PLEASE READ THIS CAREFULLY AND CORRECT TESTS CAREFULLY
+
+ Some fields were added to the BaseOperator! Please add them to the list above and make sure that
+ you add support for DAG serialization - you should add the field to
+ `airflow/serialization/schema.json` and add it in `serialized_simple_dag_ground_truth` above

Review comment:
So the goal of the ground truth test is to ensure that DAGs that are _currently_ in people's databases get correctly handled as the model changes. The important bit, I think, is that this data can't change, as it's fixed in a DB.

_But_ some changes, such as adding an optional field to Operators/Tasks, are allowed, as the JSON Schema allows unknown fields at the task level. But by default `BaseOperator.get_serialized_fields` will include extra fields.

So I guess the only check I would like here is that new fields from BaseOperator get specified with a type in the JSON schema, and that they are optional (if a field is not optional, then we would have to bump the version field in our schema, but we haven't worked out how to handle versioning of schemas and blobs yet!), which is sadly harder to test for.
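The check described in that comment (every field `BaseOperator` serializes has a type in the JSON schema and is not required) could be sketched roughly as below. The schema layout used here (`definitions` → `operator` → `properties`/`required`) is an assumption for illustration and may not match the real `airflow/serialization/schema.json`; `check_operator_fields` and `EXAMPLE_SCHEMA` are hypothetical, and in a real test the field names would come from `BaseOperator.get_serialized_fields()`.

```python
def check_operator_fields(schema, serialized_fields, known_required=frozenset()):
    """Return (untyped, newly_required) sets of field names.

    untyped: fields with no "type" or "$ref" entry in the operator schema.
    newly_required: fields listed as required that were not required before;
    existing DB rows would lack them, so they would need a schema version bump.
    """
    operator_def = schema["definitions"]["operator"]  # assumed layout
    properties = operator_def.get("properties", {})
    required_in_schema = set(operator_def.get("required", []))

    untyped = {
        name
        for name in serialized_fields
        if not ({"type", "$ref"} & set(properties.get(name, {})))
    }
    newly_required = (set(serialized_fields) & required_in_schema) - set(known_required)
    return untyped, newly_required


# Toy schema, purely illustrative.
EXAMPLE_SCHEMA = {
    "definitions": {
        "operator": {
            "type": "object",
            "properties": {
                "task_id": {"type": "string"},
                "owner": {"type": "string"},
                "retries": {"type": "number"},
            },
            "required": ["task_id"],
        }
    }
}
```

Passing `["task_id", "owner", "retries", "new_field"]` with `known_required={"task_id"}` reports `new_field` as untyped and nothing as newly required. This covers the "typed and optional" part mechanically; whether a given type is the *correct* one still needs a human reviewer.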
[GitHub] [airflow] ashb commented on a change in pull request #7162: [AIRFLOW-6557] Add test for newly added fields in BaseOperator
ashb commented on a change in pull request #7162: [AIRFLOW-6557] Add test for newly added fields in BaseOperator
URL: https://github.com/apache/airflow/pull/7162#discussion_r366358285

## File path: tests/serialization/test_dag_serialization.py ##
@@ -543,6 +543,66 @@ def test_dag_serialized_fields_with_schema(self):
         dag_params: set = set(dag_schema.keys()) - ignored_keys
         self.assertEqual(set(DAG.get_serialized_fields()), dag_params)
+    def test_no_new_fields_added_to_base_operator(self):
+        """
+        This test verifies that there are no new fields added to BaseOperator. And reminds that
+        tests should be added for it.
+        """
+        base_operator = BaseOperator(task_id="10")
+        fields = base_operator.__dict__
+        self.assertEqual({'_dag': None,
+                          '_downstream_task_ids': set(),
+                          '_inlets': [],
+                          '_log': base_operator.log,
+                          '_outlets': [],
+                          '_upstream_task_ids': set(),
+                          'depends_on_past': False,
+                          'do_xcom_push': True,
+                          'email': None,
+                          'email_on_failure': True,
+                          'email_on_retry': True,
+                          'end_date': None,
+                          'execution_timeout': None,
+                          'executor_config': {},
+                          'inlets': [],
+                          'max_retry_delay': None,
+                          'on_execute_callback': None,
+                          'on_failure_callback': None,
+                          'on_retry_callback': None,
+                          'on_success_callback': None,
+                          'outlets': [],
+                          'owner': 'airflow',
+                          'params': {},
+                          'pool': 'default_pool',
+                          'priority_weight': 1,
+                          'queue': 'default',
+                          'resources': None,
+                          'retries': 0,
+                          'retry_delay': timedelta(0, 300),
+                          'retry_exponential_backoff': False,
+                          'run_as_user': None,
+                          'sla': None,
+                          'start_date': None,
+                          'subdag': None,
+                          'task_concurrency': None,
+                          'task_id': '10',
+                          'trigger_rule': 'all_success',
+                          'wait_for_downstream': False,
+                          'weight_rule': 'downstream'}, fields,
+                         """
+!!!
+
+ ACTION NEEDED! PLEASE READ THIS CAREFULLY AND CORRECT TESTS CAREFULLY
+
+ Some fields were added to the BaseOperator! Please add them to the list above and make sure that
+ you add support for DAG serialization - you should add the field to
+ `airflow/serialization/schema.json` and add it in `serialized_simple_dag_ground_truth` above

Review comment:
Having said that, I'm now not sure this last part is true. Since DAG serialization schema version 1.0 has now been released with 1.10.7, we might have to start thinking about versioning this schema and how to migrate/update it.

But the important bit is that a row that exists in the DB right now must continue to work, and that is what the "ground truth" (not the best name) is meant to represent.
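To make the "ground truth" requirement concrete: a blob written by an earlier release must still load after optional fields are added, while adding a required field would reject rows already in the DB (which is why that would need a schema version bump). The sketch below uses only the standard library; the blob contents, `OPTIONAL_DEFAULTS`, and `load_stored_task` are hypothetical and are not Airflow's actual deserialization code.

```python
import json

# Hypothetical task blob, as an older release might have stored it in the DB.
STORED_TASK_JSON = '{"task_id": "10", "owner": "airflow"}'

# Hypothetical defaults for optional fields added after the blob was written.
OPTIONAL_DEFAULTS = {"retries": 0, "new_optional_field": None}


def load_stored_task(blob, required=("task_id",), defaults=OPTIONAL_DEFAULTS):
    """Decode a stored task, filling in defaults for newer optional fields."""
    data = json.loads(blob)
    missing = [name for name in required if name not in data]
    if missing:
        # An existing row can never satisfy a field that became required later.
        raise ValueError(f"stored task is missing required fields: {missing}")
    # Values present in the stored row take precedence over the defaults.
    return {**defaults, **data}
```

Loading `STORED_TASK_JSON` succeeds and yields the stored values plus the two defaults; calling it with `required=("task_id", "new_required_field")` raises `ValueError`, which is the situation a fixed "ground truth" blob in the tests is meant to catch.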