[GitHub] [airflow] BasPH commented on a diff in pull request #28907: add some statsd metrics for dataset

2023-01-14 Thread GitBox


BasPH commented on code in PR #28907:
URL: https://github.com/apache/airflow/pull/28907#discussion_r1070241161


##
docs/apache-airflow/administration-and-deployment/logging-monitoring/metrics.rst:
##
@@ -121,6 +121,10 @@ Name   
Descripti
fully 
asynchronous)
 ``triggers.failed``Number 
of triggers that errored before they could fire an event
 ``triggers.succeeded`` Number 
of triggers that have fired at least one event
+``dataset.updates``Number 
of updated datasets
+``dataset.orphaned``   Number 
of datasets marked as orphans because they are no longer referenced in DAG
+   
schedule parameters or task outlets
+``dataset.triggered_runs`` Number 
of runs triggered by a dataset update

Review Comment:
   ```suggestion
   ``dataset.triggered_dagruns`` 
Number of DAG runs triggered by a dataset update
   ```



##
airflow/jobs/scheduler_job.py:
##
@@ -1198,6 +1198,7 @@ def _create_dag_runs_dataset_triggered(
 dag_hash=dag_hash,
 creating_job_id=self.id,
 )
+Stats.incr("dataset.triggered_runs")

Review Comment:
   ```suggestion
   Stats.incr("dataset.triggered_dagruns")
   ```
   
   Would rename for clarity



##
airflow/jobs/scheduler_job.py:
##
@@ -1633,6 +1634,9 @@ def _orphan_unreferenced_datasets(self, session: Session 
= NEW_SESSION) -> None:
 )
 )
 )
+orphaned_datasets_count = 0
 for dataset in orphaned_dataset_query:
 self.log.info("Orphaning unreferenced dataset '%s'", dataset.uri)
 dataset.is_orphaned = expression.true()
+orphaned_datasets_count += 1
+Stats.incr("dataset.orphaned", count=orphaned_datasets_count)

Review Comment:
   I think 
   ```suggestion
   for dataset in orphaned_dataset_query:
   self.log.info("Orphaning unreferenced dataset '%s'", dataset.uri)
   dataset.is_orphaned = expression.true()
   Stats.gauge("dataset.orphaned", len(orphaned_dataset_query))
   ```
   
   This should be a gauge instead of a counter, because datasets can also be 
"un-orphaned".



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org



[GitHub] [airflow] BasPH commented on a diff in pull request #28907: add some statsd metrics for dataset

2023-01-13 Thread GitBox


BasPH commented on code in PR #28907:
URL: https://github.com/apache/airflow/pull/28907#discussion_r1070102683


##
airflow/datasets/manager.py:
##
@@ -66,6 +67,7 @@ def register_dataset_change(
 )
 )
 session.flush()
+Stats.incr("dataset.changed_datasets")

Review Comment:
   ```suggestion
   Stats.incr("dataset.updates")
   ```
   
   I like this change! Metrics feel a bit like an afterthought in many changes. 
I do however suggest a different name, because:
   
   - The docs speak about "updates" instead of changes: 
https://airflow.apache.org/docs/apache-airflow/stable/concepts/datasets.html, 
would stick to that terminology for consistency
   - Remove "datasets" because it's already in the name



##
airflow/jobs/scheduler_job.py:
##
@@ -1633,6 +1634,9 @@ def _orphan_unreferenced_datasets(self, session: Session 
= NEW_SESSION) -> None:
 )
 )
 )
+orphaned_datasets_count = 0
 for dataset in orphaned_dataset_query:
 self.log.info("Orphaning unreferenced dataset '%s'", dataset.uri)
 dataset.is_orphaned = expression.true()
+orphaned_datasets_count += 1
+Stats.incr("dataset.orphaned_datasets", count=orphaned_datasets_count)

Review Comment:
   ```suggestion
   Stats.incr("dataset.orphaned", count=orphaned_datasets_count)
   ```
   
   Same comment here, "dataset" is already in the metric name so would 
deduplicate.



##
docs/apache-airflow/administration-and-deployment/logging-monitoring/metrics.rst:
##
@@ -121,6 +121,10 @@ Name   
Descripti
fully 
asynchronous)
 ``triggers.failed``Number 
of triggers that errored before they could fire an event
 ``triggers.succeeded`` Number 
of triggers that have fired at least one event
+``dataset.changed_datasets``   Number 
of changed datasets
+``dataset.orphaned_datasets``  Number 
of datasets marked as orphans because they are no longer referenced in DAG
+   
schedule parameters or task outlets
+``dataset.triggered_runs`` Number 
of runs triggered by a dataset change

Review Comment:
   ```suggestion
   ``dataset.updates``   Number of 
updated datasets
   ``dataset.orphaned``  Number of 
datasets marked as orphans because they are no longer referenced in DAG
  
schedule parameters or task outlets
   ``dataset.triggered_runs`` 
Number of runs triggered by a dataset update
   ```



-- 
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.

To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org

For queries about this service, please contact Infrastructure at:
us...@infra.apache.org