[GitHub] XD-DENG commented on a change in pull request #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file()

2018-08-07 Thread GitBox
XD-DENG commented on a change in pull request #3698: [AIRFLOW-2855] Check Cron 
Expression Validity in DagBag.process_file()
URL: https://github.com/apache/incubator-airflow/pull/3698#discussion_r208416023
 
 

 ##
 File path: tests/models.py
 ##
 @@ -56,7 +56,7 @@
 from airflow.utils.trigger_rule import TriggerRule
 from mock import patch, ANY
 from parameterized import parameterized
-from tempfile import NamedTemporaryFile
+from tempfile import NamedTemporaryFile, mkdtemp
 
 Review comment:
   Updated as suggested. Thanks.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] XD-DENG commented on a change in pull request #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file()

2018-08-06 Thread GitBox
XD-DENG commented on a change in pull request #3698: [AIRFLOW-2855] Check Cron 
Expression Validity in DagBag.process_file()
URL: https://github.com/apache/incubator-airflow/pull/3698#discussion_r208080167
 
 

 ##
 File path: tests/models.py
 ##
 @@ -1038,6 +1038,23 @@ def test_zip(self):
 dagbag.process_file(os.path.join(TEST_DAGS_FOLDER, "test_zip.zip"))
 self.assertTrue(dagbag.get_dag("test_zip_dag"))
 
+def test_process_file_cron_validity_check(self):
+"""
+test if an invalid cron expression
+as schedule interval can be identified
+"""
+invalid_dag_files = ["test_invalid_cron.py", 
"test_zip_invalid_cron.zip"]
+dagbag = models.DagBag()
+
+for d in invalid_dag_files:
+dagbag.process_file(os.path.join(TEST_DAGS_FOLDER, d))
+
+files_with_cron_error = [os.path.split(k)[1]
+ for k, v in dagbag.import_errors.items()
+ if "Invalid Cron expression" in v]
+for d in invalid_dag_files:
+self.assertTrue(d in files_with_cron_error)
 
 Review comment:
   :-|   Crazy now LOL
   
   Thanks again! And look forward to your reviewing in any of my future 
commits! 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] XD-DENG commented on a change in pull request #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file()

2018-08-06 Thread GitBox
XD-DENG commented on a change in pull request #3698: [AIRFLOW-2855] Check Cron 
Expression Validity in DagBag.process_file()
URL: https://github.com/apache/incubator-airflow/pull/3698#discussion_r207796998
 
 

 ##
 File path: tests/models.py
 ##
 @@ -1038,6 +1038,21 @@ def test_zip(self):
 dagbag.process_file(os.path.join(TEST_DAGS_FOLDER, "test_zip.zip"))
 self.assertTrue(dagbag.get_dag("test_zip_dag"))
 
+def test_process_file_cron_validity_check(self):
+"""
+test if an invalid cron expression
+as schedule interval can be identified
+"""
+invalid_dag_files = ["test_invalid_cron.py", 
"test_zip_invalid_cron.zip"]
+dagbag = models.DagBag()
+
+for d in invalid_dag_files:
+dagbag.process_file(os.path.join(TEST_DAGS_FOLDER, d))
+
+for d in invalid_dag_files:
+self.assertTrue(any([d in k and "Invalid Cron expression" in v
 
 Review comment:
   Hi @yrqls21 , great points!
   
   - For your **1st point**, actually you're right. It's not really necessary 
to re-parse these two files, i.e. the code below can be moved.
   
   ```
   for d in invalid_dag_files:
   dagbag.process_file(os.path.join(TEST_DAGS_FOLDER, d))
   ``` 
   But I purposely keep them here to make it clear what I'm testing here.
   
   - Your **2nd point** is great. I agree it makes no practical difference 
here, given `n` and `m` would never be too big here. But I think I will change 
this part as well. Always good to keep performance in mind.
   
   Thanks for your reviewing! 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] XD-DENG commented on a change in pull request #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file()

2018-08-06 Thread GitBox
XD-DENG commented on a change in pull request #3698: [AIRFLOW-2855] Check Cron 
Expression Validity in DagBag.process_file()
URL: https://github.com/apache/incubator-airflow/pull/3698#discussion_r207796998
 
 

 ##
 File path: tests/models.py
 ##
 @@ -1038,6 +1038,21 @@ def test_zip(self):
 dagbag.process_file(os.path.join(TEST_DAGS_FOLDER, "test_zip.zip"))
 self.assertTrue(dagbag.get_dag("test_zip_dag"))
 
+def test_process_file_cron_validity_check(self):
+"""
+test if an invalid cron expression
+as schedule interval can be identified
+"""
+invalid_dag_files = ["test_invalid_cron.py", 
"test_zip_invalid_cron.zip"]
+dagbag = models.DagBag()
+
+for d in invalid_dag_files:
+dagbag.process_file(os.path.join(TEST_DAGS_FOLDER, d))
+
+for d in invalid_dag_files:
+self.assertTrue(any([d in k and "Invalid Cron expression" in v
 
 Review comment:
   Hi @yrqls21 , great points!
   
   - For your **1st point**, actually you're right. It's not really necessary 
to re-parse these two files, i.e. the code below can be moved.
   
   ```
   for d in invalid_dag_files:
   dagbag.process_file(os.path.join(TEST_DAGS_FOLDER, d))
   ``` 
   But I purposely keep them here to make it clear to readers what I'm testing 
here.
   
   - Your **2nd point** is great. I agree it makes no practical difference 
here, given `n` and `m` would never be too big here. But I think I will change 
this part as well. Always good to keep performance in mind.
   
   Thanks for your reviewing! 


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] XD-DENG commented on a change in pull request #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file()

2018-08-06 Thread GitBox
XD-DENG commented on a change in pull request #3698: [AIRFLOW-2855] Check Cron 
Expression Validity in DagBag.process_file()
URL: https://github.com/apache/incubator-airflow/pull/3698#discussion_r207791340
 
 

 ##
 File path: tests/models.py
 ##
 @@ -1038,6 +1038,21 @@ def test_zip(self):
 dagbag.process_file(os.path.join(TEST_DAGS_FOLDER, "test_zip.zip"))
 self.assertTrue(dagbag.get_dag("test_zip_dag"))
 
+def test_process_file_cron_validity_check(self):
+"""
+test if an invalid cron expression
+as schedule interval can be identified
+"""
+invalid_dag_files = ["test_invalid_cron.py", 
"test_zip_invalid_cron.zip"]
+dagbag = models.DagBag()
+
+for d in invalid_dag_files:
+dagbag.process_file(os.path.join(TEST_DAGS_FOLDER, d))
+
+for d in invalid_dag_files:
+self.assertTrue(any([d in k and "Invalid Cron expression" in v
 
 Review comment:
   `dagbag.import_errors` is a Dict whose Key are file path and Value are 
exception details.
   
   Here what I did is to check "File is the one I'm intending to check" AND 
"exception is the exception that I'm looking into". The Length of this Boolean 
value List is the same as the number of entries in metadata `import_error`, in 
which there may be other errors other than those I'm checking here (here I only 
try to identify Cron invalidity errors.).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] XD-DENG commented on a change in pull request #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file()

2018-08-06 Thread GitBox
XD-DENG commented on a change in pull request #3698: [AIRFLOW-2855] Check Cron 
Expression Validity in DagBag.process_file()
URL: https://github.com/apache/incubator-airflow/pull/3698#discussion_r207788680
 
 

 ##
 File path: tests/models.py
 ##
 @@ -1038,6 +1038,21 @@ def test_zip(self):
 dagbag.process_file(os.path.join(TEST_DAGS_FOLDER, "test_zip.zip"))
 self.assertTrue(dagbag.get_dag("test_zip_dag"))
 
+def test_process_file_cron_validity_check(self):
+"""
+test if an invalid cron expression
+as schedule interval can be identified
+"""
+invalid_dag_files = ["test_invalid_cron.py", 
"test_zip_invalid_cron.zip"]
+dagbag = models.DagBag()
+
+for d in invalid_dag_files:
+dagbag.process_file(os.path.join(TEST_DAGS_FOLDER, d))
+
+for d in invalid_dag_files:
+self.assertTrue(any([d in k and "Invalid Cron expression" in v
 
 Review comment:
   The statement inside `any()` is returning a List of Boolean values.
   `Any()` helps check if there is at least one `True` element.


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services


[GitHub] XD-DENG commented on a change in pull request #3698: [AIRFLOW-2855] Check Cron Expression Validity in DagBag.process_file()

2018-08-06 Thread GitBox
XD-DENG commented on a change in pull request #3698: [AIRFLOW-2855] Check Cron 
Expression Validity in DagBag.process_file()
URL: https://github.com/apache/incubator-airflow/pull/3698#discussion_r207788097
 
 

 ##
 File path: airflow/models.py
 ##
 @@ -414,6 +427,16 @@ def process_file(self, filepath, only_if_updated=True, 
safe_mode=True):
 self.bag_dag(dag, parent_dag=dag, root_dag=dag)
 found_dags.append(dag)
 found_dags += dag.subdags
+if isinstance(dag._schedule_interval, 
six.string_types):
+croniter(dag._schedule_interval)
+except (CroniterBadCronError,
+CroniterBadDateError,
+CroniterNotAlphaError) as cron_e:
+self.log.exception("Failed to bag_dag: %s", 
dag.full_filepath)
+self.import_errors[dag.full_filepath] = \
+"Invalid Cron expression: " + str(cron_e)
+self.file_last_changed[dag.full_filepath] = \
+file_last_changed_on_disk
 
 Review comment:
   One is for `.py` DAG file, and the another is for package DAG file (`.zip` 
file).


This is an automated message from the Apache Git Service.
To respond to the message, please log on GitHub and use the
URL above to go to the specific comment.
 
For queries about this service, please contact Infrastructure at:
us...@infra.apache.org


With regards,
Apache Git Services