dcausse created this task.
dcausse added projects: Wikidata-Query-Service, serviceops.
Restricted Application added a subscriber: Aklapper.
TASK DESCRIPTION
Seen on k8s staging when the jobmanager tries to look up its leader
election config maps:
{"@timestamp":"2021-07-26T14:44:10,262","log.level":"INFO","message":"Create KubernetesLeaderElector rdf-streaming-updater-staging-flink-cluster-restserver-leader with lock identity a5258638-2ef8-4bf2-a9e8-f07d28efe7db.","error.stack_trace":"","process.thread.name":"main","log.logger":"org.apache.flink.kubernetes.kubeclient.resources.KubernetesLeaderElector","ecs.version":"1.7.0"}
{"@timestamp":"2021-07-26T14:44:20,440","log.level":"WARN","message":"Exec Failure","error.stack_trace":" java.net.SocketTimeoutException: connect timed out
at java.base/java.net.PlainSocketImpl.socketConnect(Native Method)
at java.base/java.net.AbstractPlainSocketImpl.doConnect(AbstractPlainSocketImpl.java:399)
at java.base/java.net.AbstractPlainSocketImpl.connectToAddress(AbstractPlainSocketImpl.java:242)
at java.base/java.net.AbstractPlainSocketImpl.connect(AbstractPlainSocketImpl.java:224)
at java.base/java.net.SocksSocketImpl.connect(SocksSocketImpl.java:403)
at java.base/java.net.Socket.connect(Socket.java:609)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.platform.Platform.connectSocket(Platform.java:129)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.RealConnection.connectSocket(RealConnection.java:246)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.RealConnection.connect(RealConnection.java:166)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.StreamAllocation.findConnection(StreamAllocation.java:257)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.StreamAllocation.findHealthyConnection(StreamAllocation.java:135)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.StreamAllocation.newStream(StreamAllocation.java:114)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.connection.ConnectInterceptor.intercept(ConnectInterceptor.java:42)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.cache.CacheInterceptor.intercept(CacheInterceptor.java:93)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.BridgeInterceptor.intercept(BridgeInterceptor.java:93)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RetryAndFollowUpInterceptor.intercept(RetryAndFollowUpInterceptor.java:126)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at io.fabric8.kubernetes.client.utils.BackwardsCompatibilityInterceptor.intercept(BackwardsCompatibilityInterceptor.java:134)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at io.fabric8.kubernetes.client.utils.ImpersonatorInterceptor.intercept(ImpersonatorInterceptor.java:68)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at io.fabric8.kubernetes.client.utils.HttpClientUtils.lambda$createHttpClient$3(HttpClientUtils.java:109)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:147)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.http.RealInterceptorChain.proceed(RealInterceptorChain.java:121)
at org.apache.flink.kubernetes.shaded.okhttp3.RealCall.getResponseWithInterceptorChain(RealCall.java:254)
at org.apache.flink.kubernetes.shaded.okhttp3.RealCall$AsyncCall.execute(RealCall.java:200)
at org.apache.flink.kubernetes.shaded.okhttp3.internal.NamedRunnable.run(NamedRunnable.java:32)
at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
at java.base/java.lang.Thread.run(Thread.java:834)
","process.thread.name":"OkHttp https://10.64.76.1/...","log.logger":"io.fabric8.kubernetes.client.dsl.internal.WatchConnectionManager","ecs.version":"1.7.0"}
Unfortunately, the error message does not specify the endpoint it tries to
connect to, but reading the k8s client library that Flink uses
<https://github.com/fabric8io/kubernetes-client/blob/v4.9.2/kubernetes-client/src/main/java/io/fabric8/kubernetes/client/Config.java#L125>
we can see that it uses https://kubernetes.default.svc.
Test pod is `flink-session-cluster-main-jobmanager-69d4f56989-k5429` in the
`rdf-streaming-updater` namespace of the staging cluster.
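To narrow down whether this is plain network reachability, the connect attempt
can be reproduced outside of Flink. The following is only a minimal sketch,
assuming a JDK is available in the pod: the class name `ApiServerProbe` and its
helper methods are hypothetical, and the 10 second timeout is an assumption
picked to match the gap between the two log timestamps above. It opens a bare
TCP connection (no TLS, no authentication) to the host and port derived from
the URL, so it only shows whether the socket connect itself times out.

```java
import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;
import java.net.SocketTimeoutException;
import java.net.URI;

public class ApiServerProbe {

    /** Explicit port if the URL has one, otherwise 443 for https and 80 for anything else. */
    static int portOf(URI uri) {
        if (uri.getPort() != -1) {
            return uri.getPort();
        }
        return "https".equalsIgnoreCase(uri.getScheme()) ? 443 : 80;
    }

    /** Attempts a plain TCP connect and returns a short description of the outcome. */
    static String probe(String host, int port, int timeoutMillis) {
        try (Socket socket = new Socket()) {
            socket.connect(new InetSocketAddress(host, port), timeoutMillis);
            return "connected";
        } catch (SocketTimeoutException e) {
            // Same exception type as in the stack trace above.
            return "timeout";
        } catch (IOException e) {
            // DNS failure, connection refused, no route to host, ...
            return "failed: " + e;
        }
    }

    public static void main(String[] args) {
        // Defaults to the URL mentioned above; pass another URL as first argument to override.
        URI master = URI.create(args.length > 0 ? args[0] : "https://kubernetes.default.svc");
        int port = portOf(master);
        String outcome = probe(master.getHost(), port, 10_000); // assumed 10 s timeout
        System.out.println(master.getHost() + ":" + port + " -> " + outcome);
    }
}
```

Run from the test pod, once with no argument and once with the IP-based URL
from the thread name in the log, this would show whether either address
accepts a TCP connection at all.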
TASK DETAIL
https://phabricator.wikimedia.org/T287443
To: dcausse
Cc: JMeybohm, dcausse, Aklapper, MPhamWMF, wkandek, CBogen, Namenlos314,
jijiki, Gq86, Lucas_Werkmeister_WMDE, EBjune, merbst, Jonas, Xmlizer, jkroll,
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Dzahn