Original Message
From: [email protected]
To: Fanbin [email protected]
Sent: Friday, June 26, 2020 23:36
Subject: Re: datadog failed to send report
Hi,

I'm sorry for not explaining it clearly; I misread the exception.

log4j.logger.org.apache.flink.metrics.datadog.DatadogHttpClient=ERROR

log4j.logger.org.apache.flink.runtime.metrics will not affect org.apache.flink.metrics; it only affects org.apache.flink.runtime.metrics. Also, there are several log configuration files in the conf/ folder, and modifying the right one controls the log output. If changing it has no effect, log4j.properties may not be the file actually in use; you can read this article for answers [1]. If you're still not sure, change all of them. A more granular configuration is recommended.

I'm not familiar with Datadog (I use InfluxDB to collect metrics), but if it can collect metrics and the network is not a problem, the bottleneck may be request processing, though I'm not sure. A SocketTimeoutException can occur in several situations:

1. The network is down — but you believe the network is OK.
2. Server-side processing is slow — Datadog may be handling many requests and cannot answer quickly. Check the CPU usage of the Datadog machine. It can also depend on the program, e.g. whether it handles all requests on a single thread (this is something I don't know about Datadog). If CPU usage is high, that may be the reason; if not, you need to learn more about Datadog.
3. Network transmission is slow — check whether the network link is saturated or the machine is physically far away. You can also look for a way to increase the timeout.
4. Your job frequently triggers full GC — check the GC log. This requires editing flink-conf.yaml, with something like: env.java.opts.taskmanager: -Xloggc:LOG_DIR/taskmanager-gc.log

Best wishes to you.

[1] https://ci.apache.org/projects/flink/flink-docs-release-1.10/monitoring/logging.html

Original Message
From: Fanbin [email protected]
To: [email protected]
Sent: Friday, June 26, 2020 05:38
Subject: Re: datadog failed to send report

This does not help:

log4j.logger.org.apache.flink.runtime.metrics=ERROR

I believe all machines can telnet the Datadog port, since other metrics are reported correctly. How do I check the request-processing capacity?

On Tue, Jun 23, 2020 at 11:32 PM seeksst [email protected] wrote:

Hi,

If you don't care about losing some metrics, you can edit log4j.properties to ignore it:

log4j.logger.org.apache.flink.runtime.metrics=ERROR

BTW, can all machines telnet the Datadog port? Does the number of requests exceed Datadog's processing capacity?

Original Message
From: Fanbin [email protected]
To: [email protected]
Sent: Wednesday, June 24, 2020 12:05
Subject: datadog failed to send report

Hi,

Does anyone have any idea about the following error message? (It flooded my task manager log.) I do have Datadog metrics present, so this probably only happens for some metrics.

2020-06-24 03:27:15,362 WARN org.apache.flink.metrics.datadog.DatadogHttpClient - Failed sending request to Datadog
java.net.SocketTimeoutException: timeout
    at org.apache.flink.shaded.okio.Okio$4.newTimeoutException(Okio.java:227)
    at org.apache.flink.shaded.okio.AsyncTimeout.exit(AsyncTimeout.java:284)
    at org.apache.flink.shaded.okio.AsyncTimeout$2.read(AsyncTimeout.java:240)
    at org.apache.flink.shaded.okio.RealBufferedSource.indexOf(RealBufferedSource.java:344)
    at org.apache.flink.shaded.okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:216)
    at org.apache.flink.shaded.okio.RealBufferedSource.readUtf8LineStrict(RealBufferedSource.java:210)
    at org.apache.flink.shaded.okhttp3.internal.http1.Http1Codec.readResponseHeaders(Http1Codec.java:189)
    at org.apache.flink.shaded.okhttp3.internal.http.CallServerInterceptor.intercept(CallServerInterceptor.java:75)
    at org.apache.flink.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at org.apache.flink.shaded.okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:45)
    at org.apache.flink.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at org.apache.flink.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at org.apache.flink.shaded.okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
    at org.apache.flink.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at org.apache.flink.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at org.apache.flink.shaded.okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
    at org.apache.flink.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at org.apache.flink.shaded.okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:120)
    at org.apache.flink.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:92)
    at org.apache.flink.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:67)
    at org.apache.flink.shaded.okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:185)
    at org.apache.flink.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:135)
    at org.apache.flink.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
    at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
    at java.lang.Thread.run(Thread.java:748)
Caused by: java.net.SocketException: Socket closed
    at java.net.SocketInputStream.read(SocketInputStream.java:204)
    at java.net.SocketInputStream.read(SocketInputStream.java:141)
    at sun.security.ssl.InputRecord.readFully(InputRecord.java:465)
    at sun.security.ssl.InputRecord.read(InputRecord.java:503)
    at sun.security.ssl.SSLSocketImpl.readRecord(SSLSocketImpl.java:975)
    at sun.security.ssl.SSLSocketImpl.readDataRecord(SSLSocketImpl.java:933)
    at sun.security.ssl.AppInputStream.read(AppInputStream.java:105)
    at org.apache.flink.shaded.okio.Okio$2.read(Okio.java:138)
    at org.apache.flink.shaded.okio.AsyncTimeout$2.read(AsyncTimeout.java:236)
    ... 23 more
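[Editor's note] As a concrete sketch of the log4j advice in the thread: the logger name below matches the DatadogHttpClient class shown in the stack trace, so it suppresses only the reporter's WARN flood rather than all of org.apache.flink.runtime.metrics. Whether log4j.properties is the configuration file actually in use depends on the deployment (see [1] in the thread).

```properties
# log4j.properties -- targeted override, keeps the rest of the metrics logging intact.
# Raising this logger to ERROR hides the "Failed sending request to Datadog" WARNs.
log4j.logger.org.apache.flink.metrics.datadog.DatadogHttpClient=ERROR
```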
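[Editor's note] The failure mode in the trace — the connection succeeds but the read of the response times out — can be reproduced in miniature with plain sockets. This is an illustrative sketch, not Flink or Datadog code: the local ServerSocket stands in for an endpoint that accepts connections but is too slow to answer, which is why "the machines can telnet the port" does not rule out a SocketTimeoutException.

```java
import java.io.InputStream;
import java.net.InetAddress;
import java.net.ServerSocket;
import java.net.Socket;
import java.net.SocketTimeoutException;

public class TimeoutDemo {
    /** Connects to a server that never responds and reports how the read ends. */
    static String probe() throws Exception {
        try (ServerSocket server = new ServerSocket(0, 1, InetAddress.getLoopbackAddress());
             Socket client = new Socket(InetAddress.getLoopbackAddress(), server.getLocalPort());
             Socket accepted = server.accept()) {           // connect succeeds: "network is ok"
            client.setSoTimeout(500);                       // read timeout, in milliseconds
            InputStream in = client.getInputStream();
            try {
                in.read();                                  // blocks: the "server" never writes a byte
                return "got data";
            } catch (SocketTimeoutException e) {
                return "timeout";                           // same symptom as the reporter's WARN
            }
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println(probe());                        // prints "timeout"
    }
}
```

The same distinction applies to the trace's cause chain: a refused or unreachable host would fail at connect time, whereas a slow or overloaded responder fails inside the read, exactly where okio's AsyncTimeout fires.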
