dcausse moved this task from In Progress to Needs Reporting on the Discovery-Search (Current work) board.
dcausse closed this task as "Declined".
dcausse added a comment.
Analyzed the large _metadata file: it has 3 operators with very large states, especially `max-part-counter`, which is owned by StreamingFileSink. This state is cleared when `org.apache.flink.streaming.api.checkpoint.CheckpointedFunction#snapshotState` is called, which happens for a SinkFunction only when it implements that interface.

The only file sinks we used were for the side outputs, back when we stored them in HDFS. Prior to https://gerrit.wikimedia.org/r/c/wikidata/query/rdf/+/695295 we wrapped the file sink and the Kafka sink in a generic SinkFunction to unify the serialization, which caused code relying on `instanceof CheckpointedFunction` to not work properly. It is very likely that this broken `_metadata` was generated because of this.

I propose to decline this and clean up the code to remove all the file sinks, since we do not use them anymore (even if I think they no longer cause issues).

TASK DETAIL
https://phabricator.wikimedia.org/T286890

WORKBOARD
https://phabricator.wikimedia.org/project/board/1227/
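For illustration, here is a minimal sketch of how a generic wrapper can hide the checkpoint hook from an `instanceof CheckpointedFunction` check. The interfaces below are simplified stand-ins written for this example, not Flink's real `SinkFunction` and `CheckpointedFunction` (whose methods take additional arguments). `FileSink`, `GenericWrapper`, `CheckpointAwareWrapper` and `checkpoint` are hypothetical names and do not come from the rdf repository.

```java
import java.util.ArrayList;
import java.util.List;

public class WrappedSinkDemo {
    // Simplified stand-ins for the Flink interfaces (assumption: real signatures differ).
    interface SinkFunction<T> { void invoke(T value); }
    interface CheckpointedFunction { void snapshotState(); }

    // Stand-in for a file sink whose accumulated state is only cleared in snapshotState().
    static class FileSink implements SinkFunction<String>, CheckpointedFunction {
        final List<String> pendingState = new ArrayList<>();
        @Override public void invoke(String value) { pendingState.add(value); }
        @Override public void snapshotState() { pendingState.clear(); }
    }

    // Generic wrapper that implements only SinkFunction, so the inner hook is hidden.
    static class GenericWrapper<T> implements SinkFunction<T> {
        final SinkFunction<T> inner;
        GenericWrapper(SinkFunction<T> inner) { this.inner = inner; }
        @Override public void invoke(T value) { inner.invoke(value); }
    }

    // Wrapper that also implements the checkpoint interface and forwards the call.
    static class CheckpointAwareWrapper<T> extends GenericWrapper<T> implements CheckpointedFunction {
        CheckpointAwareWrapper(SinkFunction<T> inner) { super(inner); }
        @Override public void snapshotState() {
            if (inner instanceof CheckpointedFunction) {
                ((CheckpointedFunction) inner).snapshotState();
            }
        }
    }

    // Mimics a runtime that calls snapshotState() only on functions marked with the interface.
    static boolean checkpoint(SinkFunction<?> sink) {
        if (sink instanceof CheckpointedFunction) {
            ((CheckpointedFunction) sink).snapshotState();
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        FileSink hidden = new FileSink();
        SinkFunction<String> plain = new GenericWrapper<>(hidden);
        plain.invoke("part-0");
        System.out.println("plain wrapper checkpointed: " + checkpoint(plain)
                + ", entries left: " + hidden.pendingState.size());

        FileSink forwarded = new FileSink();
        SinkFunction<String> aware = new CheckpointAwareWrapper<>(forwarded);
        aware.invoke("part-0");
        System.out.println("aware wrapper checkpointed: " + checkpoint(aware)
                + ", entries left: " + forwarded.pendingState.size());
    }
}
```

With the plain wrapper the hook is never reached, so the inner sink's state keeps growing across checkpoints. That would be consistent with the oversized state described in the comment, although this sketch does not reproduce the actual job.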
