dcausse created this task. dcausse added a project: Wikidata-Query-Service. Restricted Application added a subscriber: Aklapper.
TASK DESCRIPTION While deploying a new version of the streaming-updater (0.3.103) flink failed with: java.io.EOFException at java.base/java.io.DataInputStream.readFully(DataInputStream.java:202) at java.base/java.io.DataInputStream.readFully(DataInputStream.java:170) at org.apache.flink.api.common.typeutils.base.array.BytePrimitiveArraySerializer.deserialize(BytePrimitiveArraySerializer.java:82) at org.apache.flink.contrib.streaming.state.restore.RocksDBFullRestoreOperation.restoreKVStateData(RocksDBFullRestoreOperation.java:229) at org.apache.flink.contrib.streaming.state.restore.RocksDBFullRestoreOperation.restoreKeyGroupsInStateHandle(RocksDBFullRestoreOperation.java:158) at org.apache.flink.contrib.streaming.state.restore.RocksDBFullRestoreOperation.restore(RocksDBFullRestoreOperation.java:142) at org.apache.flink.contrib.streaming.state.RocksDBKeyedStateBackendBuilder.build(RocksDBKeyedStateBackendBuilder.java:284) at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:587) at org.apache.flink.contrib.streaming.state.RocksDBStateBackend.createKeyedStateBackend(RocksDBStateBackend.java:93) at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.lambda$keyedStatedBackend$1(StreamTaskStateInitializerImpl.java:328) at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.attemptCreateAndRestore(BackendRestorerProcedure.java:168) at org.apache.flink.streaming.api.operators.BackendRestorerProcedure.createAndRestore(BackendRestorerProcedure.java:135) at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.keyedStatedBackend(StreamTaskStateInitializerImpl.java:345) at org.apache.flink.streaming.api.operators.StreamTaskStateInitializerImpl.streamOperatorStateContext(StreamTaskStateInitializerImpl.java:163) at org.apache.flink.streaming.api.operators.AbstractStreamOperator.initializeState(AbstractStreamOperator.java:272) at org.apache.flink.streaming.runtime.tasks.OperatorChain.initializeStateAndOpenOperators(OperatorChain.java:425) at org.apache.flink.streaming.runtime.tasks.StreamTask.lambda$beforeInvoke$2(StreamTask.java:535) at org.apache.flink.streaming.runtime.tasks.StreamTaskActionExecutor$1.runThrowing(StreamTaskActionExecutor.java:50) at org.apache.flink.streaming.runtime.tasks.StreamTask.beforeInvoke(StreamTask.java:525) at org.apache.flink.streaming.runtime.tasks.StreamTask.invoke(StreamTask.java:565) at org.apache.flink.runtime.taskmanager.Task.doRun(Task.java:755) at org.apache.flink.runtime.taskmanager.Task.run(Task.java:570) at java.base/java.lang.Thread.run(Thread.java:834) notes: - the problem occurs when using two different savepoints: - (thanos) `rdf-streaming-updater-codfw/commons/savepoints/deploy_0_3_103/savepoint-c4b021-9c4cd6541ec7/` - (thanos) `rdf-streaming-updater-codfw/commons/savepoints/savepoint-c4b021-818dd669a47e/` - the exception is only visible on a taskmanager POD running on `kubernetes2003` (logs <https://logstash.wikimedia.org/app/discover#/?_g=(filters:!(),query:(language:lucene,query:kubernetes2003),refreshInterval:(pause:!t,value:0),time:(from:'2022-02-22T16:00:00.000Z',to:'2022-02-22T19:30:00.000Z'))&_a=(columns:!(log,host,message,kubernetes.pod_id),filters:!(),index:'logstash-*',interval:auto,query:(language:lucene,query:'kubernetes.master_url:%22https:%2F%2Fkubemaster.svc.codfw.wmnet%22%20AND%20kubernetes.namespace_name:%22rdf-streaming-updater%22%20AND%20java.io.EOFException%20AND%20kubernetes.labels.component:taskmanager'),sort:!())>) - the problem occurs when loading the savepoints using the previous version 0.3.99 - the deploy worked fine on staging for both wdqs and wcqs, fine as well on wcqs at codfw - the system was able to resume using 0.3.99 and a previous checkpoint `rdf-streaming-updater-codfw/wikidata/checkpoints/e245dd1e76d56d9ded351b27cf2d4c2a/chk-415014`. AC: - understand the root cause of the failure TASK DETAIL https://phabricator.wikimedia.org/T302396 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: dcausse Cc: Aklapper, dcausse, MPhamWMF, CBogen, Namenlos314, Gq86, Lucas_Werkmeister_WMDE, EBjune, merbst, Jonas, Xmlizer, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles
_______________________________________________ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org