Re: [PR] Fix flaky OTEL connection in integration tests [airflow]
andreahlert commented on PR #62811: URL: https://github.com/apache/airflow/pull/62811#issuecomment-4041112877 Thanks everyone for the reviews and feedback! #61242 already addressed the root cause (DNS flakiness when waiting for the OTel collector) by adding wait_for_otel_collector, and #56150 later did a broader rewrite of the test file. There's nothing left here that isn't already covered, so I'm closing this. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] Fix flaky OTEL connection in integration tests [airflow]
andreahlert closed pull request #62811: Fix flaky OTEL connection in integration tests URL: https://github.com/apache/airflow/pull/62811 -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] Fix flaky OTEL connection in integration tests [airflow]
potiuk commented on PR #62811: URL: https://github.com/apache/airflow/pull/62811#issuecomment-4016255744 The OTEL integration tests have been completely rewritten now, so likely this PR is not needed any more. Converting it to Draft - as we observe stability - we might resume it if we find that the changes did not help. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] Fix flaky OTEL connection in integration tests [airflow]
Copilot commented on code in PR #62811:
URL: https://github.com/apache/airflow/pull/62811#discussion_r2895568683
##
airflow-core/src/airflow/ui/tests/e2e/specs/backfill.spec.ts:
##
@@ -218,8 +218,8 @@ test.describe("Backfill pause, resume, and cancel
controls", () => {
await expect(async () => {
await backfillPage.page.reload();
await expect(backfillPage.triggerButton).toBeVisible({ timeout: 10_000
});
- await expect(backfillPage.pauseButton).toBeVisible({ timeout: 5000 });
-}).toPass({ timeout: 60_000 });
+ await expect(backfillPage.pauseButton).toBeVisible({ timeout: 15_000 });
+}).toPass({ timeout: 120_000 });
Review Comment:
The change to `backfill.spec.ts` (increasing the `pauseButton` visibility
timeout from 5 000 ms to 15 000 ms and the outer `toPass` timeout from 60 000
ms to 120 000 ms) is completely unrelated to the stated PR goal of fixing flaky
OTEL connection issues. The PR description mentions only OTEL integration test
fixes and an OTEL Collector healthcheck, but makes no mention of these
Playwright backfill E2E test timeout changes. This appears to be an unrelated
change that was bundled into this PR.
##
airflow-core/tests/integration/otel/test_otel.py:
##
@@ -212,6 +212,10 @@ def setup_class(cls):
os.environ["AIRFLOW__TRACES__OTEL_ON"] = "True"
os.environ["OTEL_EXPORTER_OTLP_PROTOCOL"] = "http/protobuf"
os.environ["OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"] =
"http://breeze-otel-collector:4318/v1/traces";
+if cls.use_otel != "true":
+os.environ["OTEL_TRACES_EXPORTER"] = "console"
Review Comment:
Setting `OTEL_TRACES_EXPORTER=console` when `use_otel != "true"` will cause
`test_dag_execution_succeeds` (lines 441–524) to fail, because that test always
queries the Jaeger API for span data (line 497:
`requests.get(f"http://{host}:16686/api/traces?service={service_name}";)`). With
`OTEL_TRACES_EXPORTER=console`, no spans are sent to Jaeger (or via the
otel-collector), so the test will either receive an empty trace list (causing
an `IndexError` on `data["data"][-1]`) or assert against stale spans from a
prior test run.
Either this test needs a `use_otel == "true"` guard around its Jaeger
query-and-assert block, or this env var should not be set to `console` for the
span exporter when the Jaeger test expects OTLP data.
Note that the `AIRFLOW__METRICS__OTEL_ON` and `OTEL_METRICS_EXPORTER`
changes on lines 217–218 are fine, as those affect only metrics, not spans.
```suggestion
```
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] Fix flaky OTEL connection in integration tests [airflow]
jason810496 commented on code in PR #62811:
URL: https://github.com/apache/airflow/pull/62811#discussion_r2895548910
##
airflow-core/tests/integration/otel/test_otel.py:
##
@@ -212,6 +212,10 @@ def setup_class(cls):
os.environ["AIRFLOW__TRACES__OTEL_ON"] = "True"
os.environ["OTEL_EXPORTER_OTLP_PROTOCOL"] = "http/protobuf"
os.environ["OTEL_EXPORTER_OTLP_TRACES_ENDPOINT"] =
"http://breeze-otel-collector:4318/v1/traces";
+if cls.use_otel != "true":
+os.environ["OTEL_TRACES_EXPORTER"] = "console"
+os.environ["OTEL_METRICS_EXPORTER"] = "none"
Review Comment:
Should we set `OTEL_METRICS_EXPORTER` as `"console"`?
https://github.com/apache/airflow/blob/63ea296047d7e31b5c929043b11929c1484163d6/airflow-core/tests/integration/otel/test_otel.py#L830-L831
##
airflow-core/src/airflow/ui/tests/e2e/specs/backfill.spec.ts:
##
@@ -218,8 +218,8 @@ test.describe("Backfill pause, resume, and cancel
controls", () => {
await expect(async () => {
await backfillPage.page.reload();
await expect(backfillPage.triggerButton).toBeVisible({ timeout: 10_000
});
- await expect(backfillPage.pauseButton).toBeVisible({ timeout: 5000 });
-}).toPass({ timeout: 60_000 });
+ await expect(backfillPage.pauseButton).toBeVisible({ timeout: 15_000 });
+}).toPass({ timeout: 120_000 });
Review Comment:
This is UI only e2e test instead of the integration test mentioning in the
issue.
--
This is an automated message from the Apache Git Service.
To respond to the message, please log on to GitHub and use the
URL above to go to the specific comment.
To unsubscribe, e-mail: [email protected]
For queries about this service, please contact Infrastructure at:
[email protected]
Re: [PR] Fix flaky OTEL connection in integration tests [airflow]
andreahlert commented on PR #62811: URL: https://github.com/apache/airflow/pull/62811#issuecomment-3993482749 All CI failures are unrelated to this PR — they are caused by transient GitHub infrastructure errors (HTTP 500 during \ctions/checkout\) and a missing \helm gpg\ command in the CI runner. Could a maintainer please re-run the failed jobs? Thank you. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] Fix flaky OTEL connection in integration tests [airflow]
andreahlert commented on code in PR #62811: URL: https://github.com/apache/airflow/pull/62811#discussion_r2880093560 ## airflow-core/newsfragments/62769.misc.rst: ## @@ -0,0 +1 @@ +Fix flaky OTEL connection in integration tests. Disable OTLP metric export for span-only tests and add OTEL collector healthcheck so Airflow waits for the collector to be ready. Review Comment: Sorry, my fault. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
Re: [PR] Fix flaky OTEL connection in integration tests [airflow]
potiuk commented on code in PR #62811: URL: https://github.com/apache/airflow/pull/62811#discussion_r2880017165 ## airflow-core/newsfragments/62769.bugfix.rst: ## @@ -0,0 +1 @@ +Fix flaky OTEL connection in integration tests. Disable OTLP metric export for span-only tests and add OTEL collector healthcheck so Airflow waits for the collector to be ready. Review Comment: This is absolutely not needed. We only need newsfragments for serious **user** facing changes. -- This is an automated message from the Apache Git Service. To respond to the message, please log on to GitHub and use the URL above to go to the specific comment. To unsubscribe, e-mail: [email protected] For queries about this service, please contact Infrastructure at: [email protected]
