[GitHub] [airflow] BasPH commented on a diff in pull request #28907: add some statsd metrics for dataset
BasPH commented on code in PR #28907: URL: https://github.com/apache/airflow/pull/28907#discussion_r1070241161 ## docs/apache-airflow/administration-and-deployment/logging-monitoring/metrics.rst: ## @@ -121,6 +121,10 @@ Name Descripti fully asynchronous) ``triggers.failed``Number of triggers that errored before they could fire an event ``triggers.succeeded`` Number of triggers that have fired at least one event +``dataset.updates``Number of updated datasets +``dataset.orphaned`` Number of datasets marked as orphans because they are no longer referenced in DAG + schedule parameters or task outlets +``dataset.triggered_runs`` Number of runs triggered by a dataset update Review Comment: ```suggestion ``dataset.triggered_dagruns`` Number of DAG runs triggered by a dataset update ``` ## airflow/jobs/scheduler_job.py: ## @@ -1198,6 +1198,7 @@ def _create_dag_runs_dataset_triggered( dag_hash=dag_hash, creating_job_id=self.id, ) +Stats.incr("dataset.triggered_runs") Review Comment: ```suggestion Stats.incr("dataset.triggered_dagruns") ``` Would rename for clarity ## airflow/jobs/scheduler_job.py: ## @@ -1633,6 +1634,9 @@ def _orphan_unreferenced_datasets(self, session: Session = NEW_SESSION) -> None: ) ) ) +orphaned_datasets_count = 0 for dataset in orphaned_dataset_query: self.log.info("Orphaning unreferenced dataset '%s'", dataset.uri) dataset.is_orphaned = expression.true() +orphaned_datasets_count += 1 +Stats.incr("dataset.orphaned", count=orphaned_datasets_count) Review Comment: I think ```suggestion for dataset in orphaned_dataset_query: self.log.info("Orphaning unreferenced dataset '%s'", dataset.uri) dataset.is_orphaned = expression.true() Stats.gauge("dataset.orphaned", len(orphaned_dataset_query)) ``` This should be a gauge instead of a counter, because datasets can also be "un-orphaned". -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org
[GitHub] [airflow] BasPH commented on a diff in pull request #28907: add some statsd metrics for dataset
BasPH commented on code in PR #28907: URL: https://github.com/apache/airflow/pull/28907#discussion_r1070102683 ## airflow/datasets/manager.py: ## @@ -66,6 +67,7 @@ def register_dataset_change( ) ) session.flush() +Stats.incr("dataset.changed_datasets") Review Comment: ```suggestion Stats.incr("dataset.updates") ``` I like this change! Metrics feel a bit like an afterthought in many changes. I do however suggest a different name, because: - The docs speak about "updates" instead of changes: https://airflow.apache.org/docs/apache-airflow/stable/concepts/datasets.html, would stick to that terminology for consistency - Remove "datasets" because it's already in the name ## airflow/jobs/scheduler_job.py: ## @@ -1633,6 +1634,9 @@ def _orphan_unreferenced_datasets(self, session: Session = NEW_SESSION) -> None: ) ) ) +orphaned_datasets_count = 0 for dataset in orphaned_dataset_query: self.log.info("Orphaning unreferenced dataset '%s'", dataset.uri) dataset.is_orphaned = expression.true() +orphaned_datasets_count += 1 +Stats.incr("dataset.orphaned_datasets", count=orphaned_datasets_count) Review Comment: ```suggestion Stats.incr("dataset.orphaned", count=orphaned_datasets_count) ``` Same comment here, "dataset" is already in the metric name so would deduplicate. ## docs/apache-airflow/administration-and-deployment/logging-monitoring/metrics.rst: ## @@ -121,6 +121,10 @@ Name Descripti fully asynchronous) ``triggers.failed``Number of triggers that errored before they could fire an event ``triggers.succeeded`` Number of triggers that have fired at least one event +``dataset.changed_datasets`` Number of changed datasets +``dataset.orphaned_datasets`` Number of datasets marked as orphans because they are no longer referenced in DAG + schedule parameters or task outlets +``dataset.triggered_runs`` Number of runs triggered by a dataset change Review Comment: ```suggestion ``dataset.updates`` Number of updated datasets ``dataset.orphaned`` Number of datasets marked as orphans because they are no longer referenced in DAG schedule parameters or task outlets ``dataset.triggered_runs`` Number of runs triggered by a dataset update ``` -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: commits-unsubscr...@airflow.apache.org For queries about this service, please contact Infrastructure at: us...@infra.apache.org