Hello,
  We have a Flink job that reports metrics to a Prometheus Pushgateway. After
the job has been running for some time and a large amount of metric data has
been pushed, requests to the Pushgateway start failing with a socket read
timeout, and many of the metric pushes fail. The exception is:


2023-12-12 04:13:07,812 WARN org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter [] - Failed to push metrics to PushGateway with jobName 00034937_20231211200917_54ede15602bb8704c3a98ec481bea96, groupingKey{}.
java.net.SocketTimeoutException: Read timed out
    at java.net.SocketInputStream.socketRead(Native Method) ~[?:1.8.0_281]
    at java.net.SocketInputStream.socketRead(SocketInputStream.java:116) ~[?:1.8.0_281]
    at java.net.SocketInputStream.read(SocketInputStream.java:171) ~[?:1.8.0_281]
    at java.net.SocketInputStream.read(SocketInputStream.java:141) ~[?:1.8.0_281]
    at java.io.BufferedInputStream.fill(BufferedInputStream.java:246) ~[?:1.8.0_281]
    at java.io.BufferedInputStream.read1(BufferedInputStream.java:286) ~[?:1.8.0_281]
    at java.io.BufferedInputStream.read(BufferedInputStream.java:345) ~[?:1.8.0_281]
    at sun.net.www.http.HttpClient.parseHTTPHeader(HttpClient.java:735) ~[?:1.8.0_281]
    at sun.net.www.http.HttpClient.parseHTTP(HttpClient.java:678) ~[?:1.8.0_281]
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream0(HttpURLConnection.java:1593) ~[?:1.8.0_281]
    at sun.net.www.protocol.http.HttpURLConnection.getInputStream(HttpURLConnection.java:1498) ~[?:1.8.0_281]
    at java.net.HttpURLConnection.getResponseCode(HttpURLConnection.java:480) ~[?:1.8.0_281]
    at io.prometheus.client.exporter.PushGateway.doRequest(PushGateway.java:315) ~[flink-metrics-prometheus-1.13.5.jar:1.13.5]
    at io.prometheus.client.exporter.PushGateway.push(PushGateway.java:138) ~[flink-metrics-prometheus-1.13.5.jar:1.13.5]
    at org.apache.flink.metrics.prometheus.PrometheusPushGatewayReporter.report(PrometheusPushGatewayReporter.java:63) [flink-metrics-prometheus-1.13.5.jar:1.13.5]
    at org.apache.flink.runtime.metrics.MetricRegistryImpl$ReporterTask.run(MetricRegistryImpl.java:494) [flink-dist_2.11-1.13.5.jar:1.13.5]

From our testing, the timeouts are caused by the large amount of data reported
to the Pushgateway. Restarting the Pushgateway server made the exception
disappear, but it reappeared after several hours.

How can we configure Flink or the Pushgateway to avoid this exception?

Best regards,
leilinee 
