[GitHub] XD-DENG commented on a change in pull request #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file()
XD-DENG commented on a change in pull request #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file() URL: https://github.com/apache/incubator-airflow/pull/3698#discussion_r208416023 ## File path: tests/models.py ## @@ -56,7 +56,7 @@ from airflow.utils.trigger_rule import TriggerRule from mock import patch, ANY from parameterized import parameterized -from tempfile import NamedTemporaryFile +from tempfile import NamedTemporaryFile, mkdtemp Review comment: Updated as suggested. Thanks. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] XD-DENG commented on a change in pull request #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file()
XD-DENG commented on a change in pull request #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file() URL: https://github.com/apache/incubator-airflow/pull/3698#discussion_r208080167 ## File path: tests/models.py ## @@ -1038,6 +1038,23 @@ def test_zip(self): dagbag.process_file(os.path.join(TEST_DAGS_FOLDER, "test_zip.zip")) self.assertTrue(dagbag.get_dag("test_zip_dag")) +def test_process_file_cron_validity_check(self): +""" +test if an invalid cron expression +as schedule interval can be identified +""" +invalid_dag_files = ["test_invalid_cron.py", "test_zip_invalid_cron.zip"] +dagbag = models.DagBag() + +for d in invalid_dag_files: +dagbag.process_file(os.path.join(TEST_DAGS_FOLDER, d)) + +files_with_cron_error = [os.path.split(k)[1] + for k, v in dagbag.import_errors.items() + if "Invalid Cron expression" in v] +for d in invalid_dag_files: +self.assertTrue(d in files_with_cron_error) Review comment: :-| Crazy now LOL Thanks again! And look forward to your reviewing in any of my future commits! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] XD-DENG commented on a change in pull request #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file()
XD-DENG commented on a change in pull request #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file() URL: https://github.com/apache/incubator-airflow/pull/3698#discussion_r207796998 ## File path: tests/models.py ## @@ -1038,6 +1038,21 @@ def test_zip(self): dagbag.process_file(os.path.join(TEST_DAGS_FOLDER, "test_zip.zip")) self.assertTrue(dagbag.get_dag("test_zip_dag")) +def test_process_file_cron_validity_check(self): +""" +test if an invalid cron expression +as schedule interval can be identified +""" +invalid_dag_files = ["test_invalid_cron.py", "test_zip_invalid_cron.zip"] +dagbag = models.DagBag() + +for d in invalid_dag_files: +dagbag.process_file(os.path.join(TEST_DAGS_FOLDER, d)) + +for d in invalid_dag_files: +self.assertTrue(any([d in k and "Invalid Cron expression" in v Review comment: Hi @yrqls21 , great points! - For your **1st point**, actually you're right. It's not really necessary to re-parse these two files, i.e. the code below can be moved. ``` for d in invalid_dag_files: dagbag.process_file(os.path.join(TEST_DAGS_FOLDER, d)) ``` But I purposely keep them here to make it clear what I'm testing here. - Your **2nd point** is great. I agree it makes no practical difference here, given `n` and `m` would never be too big here. But I think I will change this part as well. Always good to keep performance in mind. Thanks for your reviewing! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] XD-DENG commented on a change in pull request #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file()
XD-DENG commented on a change in pull request #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file() URL: https://github.com/apache/incubator-airflow/pull/3698#discussion_r207796998 ## File path: tests/models.py ## @@ -1038,6 +1038,21 @@ def test_zip(self): dagbag.process_file(os.path.join(TEST_DAGS_FOLDER, "test_zip.zip")) self.assertTrue(dagbag.get_dag("test_zip_dag")) +def test_process_file_cron_validity_check(self): +""" +test if an invalid cron expression +as schedule interval can be identified +""" +invalid_dag_files = ["test_invalid_cron.py", "test_zip_invalid_cron.zip"] +dagbag = models.DagBag() + +for d in invalid_dag_files: +dagbag.process_file(os.path.join(TEST_DAGS_FOLDER, d)) + +for d in invalid_dag_files: +self.assertTrue(any([d in k and "Invalid Cron expression" in v Review comment: Hi @yrqls21 , great points! - For your **1st point**, actually you're right. It's not really necessary to re-parse these two files, i.e. the code below can be moved. ``` for d in invalid_dag_files: dagbag.process_file(os.path.join(TEST_DAGS_FOLDER, d)) ``` But I purposely keep them here to make it clear to readers what I'm testing here. - Your **2nd point** is great. I agree it makes no practical difference here, given `n` and `m` would never be too big here. But I think I will change this part as well. Always good to keep performance in mind. Thanks for your reviewing! This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] XD-DENG commented on a change in pull request #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file()
XD-DENG commented on a change in pull request #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file() URL: https://github.com/apache/incubator-airflow/pull/3698#discussion_r207791340 ## File path: tests/models.py ## @@ -1038,6 +1038,21 @@ def test_zip(self): dagbag.process_file(os.path.join(TEST_DAGS_FOLDER, "test_zip.zip")) self.assertTrue(dagbag.get_dag("test_zip_dag")) +def test_process_file_cron_validity_check(self): +""" +test if an invalid cron expression +as schedule interval can be identified +""" +invalid_dag_files = ["test_invalid_cron.py", "test_zip_invalid_cron.zip"] +dagbag = models.DagBag() + +for d in invalid_dag_files: +dagbag.process_file(os.path.join(TEST_DAGS_FOLDER, d)) + +for d in invalid_dag_files: +self.assertTrue(any([d in k and "Invalid Cron expression" in v Review comment: `dagbag.import_errors` is a Dict whose Key are file path and Value are exception details. Here what I did is to check "File is the one I'm intending to check" AND "exception is the exception that I'm looking into". The Length of this Boolean value List is the same as the number of entries in metadata `import_error`, in which there may be other errors other than those I'm checking here (here I only try to identify Cron invalidity errors.). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] XD-DENG commented on a change in pull request #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file()
XD-DENG commented on a change in pull request #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file() URL: https://github.com/apache/incubator-airflow/pull/3698#discussion_r207788680 ## File path: tests/models.py ## @@ -1038,6 +1038,21 @@ def test_zip(self): dagbag.process_file(os.path.join(TEST_DAGS_FOLDER, "test_zip.zip")) self.assertTrue(dagbag.get_dag("test_zip_dag")) +def test_process_file_cron_validity_check(self): +""" +test if an invalid cron expression +as schedule interval can be identified +""" +invalid_dag_files = ["test_invalid_cron.py", "test_zip_invalid_cron.zip"] +dagbag = models.DagBag() + +for d in invalid_dag_files: +dagbag.process_file(os.path.join(TEST_DAGS_FOLDER, d)) + +for d in invalid_dag_files: +self.assertTrue(any([d in k and "Invalid Cron expression" in v Review comment: The statement inside `any()` is returning a List of Boolean values. `Any()` helps check if there is at least one `True` element. This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services
[GitHub] XD-DENG commented on a change in pull request #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file()
XD-DENG commented on a change in pull request #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file() URL: https://github.com/apache/incubator-airflow/pull/3698#discussion_r207788097 ## File path: airflow/models.py ## @@ -414,6 +427,16 @@ def process_file(self, filepath, only_if_updated=True, safe_mode=True): self.bag_dag(dag, parent_dag=dag, root_dag=dag) found_dags.append(dag) found_dags += dag.subdags +if isinstance(dag._schedule_interval, six.string_types): +croniter(dag._schedule_interval) +except (CroniterBadCronError, +CroniterBadDateError, +CroniterNotAlphaError) as cron_e: +self.log.exception("Failed to bag_dag: %s", dag.full_filepath) +self.import_errors[dag.full_filepath] = \ +"Invalid Cron expression: " + str(cron_e) +self.file_last_changed[dag.full_filepath] = \ +file_last_changed_on_disk Review comment: One is for `.py` DAG file, and the another is for package DAG file (`.zip` file). This is an automated message from the Apache Git Service. To respond to the message, please log on GitHub and use the URL above to go to the specific comment. For queries about this service, please contact Infrastructure at: us...@infra.apache.org With regards, Apache Git Services