jiwen624 opened a new pull request, #46604: URL: https://github.com/apache/spark/pull/46604
### What changes were proposed in this pull request?

Working on it...

### Why are the changes needed?

As mentioned in the Jira ticket: https://issues.apache.org/jira/browse/SPARK-48298

Currently, the StatsdSink in Spark supports UDP mode only, which is the default mode of StatsD. However, in real production environments, metrics often need more reliable transmission to avoid metric loss in high-traffic systems.

TCP mode is already supported by:

- StatsD: https://github.com/statsd/statsd/blob/master/docs/server.md
- Prometheus' statsd_exporter: https://github.com/prometheus/statsd_exporter
- many other StatsD-based metrics proxies/receivers

### Does this PR introduce _any_ user-facing change?

Yes. The following new config options are added to `conf/metrics.properties.template`:

- `*.sink.statsd.protocol`
- `*.sink.statsd.connTimeoutMs`

A new error condition is defined in `error-conditions.json` for protocol configuration errors.

### How was this patch tested?

Unit tests, plus manual tests with metric configurations sending metrics to a Netcat TCP/UDP server.

### Was this patch authored or co-authored using generative AI tooling?

No.
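
For illustration only, a metrics configuration that uses the two new options `*.sink.statsd.protocol` and `*.sink.statsd.connTimeoutMs` might look like the sketch below. The value `tcp`, the timeout of 500 ms, and the host and port are assumptions, since the accepted values and defaults are not stated in this description. The remaining keys are the existing StatsdSink options.

```properties
# Enable the existing StatsD sink for all instances.
*.sink.statsd.class=org.apache.spark.metrics.sink.StatsdSink
*.sink.statsd.host=127.0.0.1
*.sink.statsd.port=8125
*.sink.statsd.period=10
*.sink.statsd.unit=seconds

# New options proposed in this PR (values below are assumed, not confirmed).
*.sink.statsd.protocol=tcp
*.sink.statsd.connTimeoutMs=500
```

A local listener such as Netcat, as used in the manual tests, would be one way to check that metrics arrive over the chosen protocol.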
