dcausse moved this task from In Progress to Needs Reporting on the 
Discovery-Search (Current work) board.
dcausse closed this task as "Declined".
dcausse added a comment.


  I analyzed the large `_metadata` file: it has 3 operators with very large 
states, especially `max-part-counter`, which is owned by StreamingFileSink. 
This state is cleared when calling 
`org.apache.flink.streaming.api.checkpoint.CheckpointedFunction#snapshotState`, 
which is invoked on a SinkFunction only if it implements that interface. The 
only file sinks we used were for the side outputs, back when we stored them in 
HDFS. Prior to https://gerrit.wikimedia.org/r/c/wikidata/query/rdf/+/695295 we 
wrapped the file sink and the kafka sink with a generic SinkFunction to unify 
the serialization. This caused code relying on `instanceof CheckpointedFunction` 
to not work properly. It is very likely that this broken `_metadata` was 
generated because of this.
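
  To illustrate the failure mode described above, here is a minimal sketch in 
plain Java. It does not use Flink: the `SinkFunction` and `CheckpointedFunction` 
interfaces below are simplified stand-ins, and `StatefulFileSink`, 
`SerializingSink` and `checkpoint` are hypothetical names that are not taken 
from the actual code base. It only shows how a wrapper that implements just 
`SinkFunction` hides the wrapped sink from an `instanceof CheckpointedFunction` 
check, so the inner state is never cleared.

```java
import java.util.function.Function;

public class WrappedSinkDemo {
    // Simplified stand-ins for the Flink interfaces (assumption: the real ones
    // have richer signatures, only the type relationship matters here).
    interface SinkFunction<T> { void invoke(T value); }
    interface CheckpointedFunction { void snapshotState(); }

    // Hypothetical file-like sink: accumulates state that is only cleared
    // when snapshotState() is called.
    static class StatefulFileSink implements SinkFunction<String>, CheckpointedFunction {
        int pendingState = 0;
        @Override public void invoke(String value) { pendingState++; }
        @Override public void snapshotState() { pendingState = 0; }
    }

    // Hypothetical generic wrapper used to unify serialization. It implements
    // SinkFunction only, so it is NOT a CheckpointedFunction.
    static class SerializingSink<I, O> implements SinkFunction<I> {
        private final Function<I, O> serializer;
        private final SinkFunction<O> delegate;
        SerializingSink(Function<I, O> serializer, SinkFunction<O> delegate) {
            this.serializer = serializer;
            this.delegate = delegate;
        }
        @Override public void invoke(I value) { delegate.invoke(serializer.apply(value)); }
    }

    // Mimics a runtime that snapshots a sink only when it is marked with the
    // interface. Returns true if snapshotState() was actually called.
    static boolean checkpoint(SinkFunction<?> sink) {
        if (sink instanceof CheckpointedFunction) {
            ((CheckpointedFunction) sink).snapshotState();
            return true;
        }
        return false;
    }

    public static void main(String[] args) {
        StatefulFileSink inner = new StatefulFileSink();
        SinkFunction<Integer> wrapped = new SerializingSink<>(String::valueOf, inner);
        wrapped.invoke(1);
        wrapped.invoke(2);
        // The wrapper hides the inner sink, so the snapshot is skipped and the
        // inner state keeps growing across checkpoints.
        System.out.println("snapshot called: " + checkpoint(wrapped));
        System.out.println("state left in inner sink: " + inner.pendingState);
    }
}
```

  One way to avoid this in such a sketch would be for the wrapper to implement 
`CheckpointedFunction` itself and forward `snapshotState()` to its delegate. 
Whether the change linked above took that route is not stated here.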
  
  I propose to decline this and clean up the code to remove all the file sinks, 
since we do not need them anymore (even though I think they no longer cause 
issues).

TASK DETAIL
  https://phabricator.wikimedia.org/T286890

WORKBOARD
  https://phabricator.wikimedia.org/project/board/1227/

To: dcausse
Cc: dcausse, Aklapper, Suran38, Biggs657, Invadibot, Lalamarie69, MPhamWMF, 
maantietaja, Juan90264, Alter-paule, Beast1978, CBogen, Un1tY, Akuckartz, 
Hook696, Kent7301, joker88john, CucyNoiD, Nandana, Namenlos314, Gaboe420, 
Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, Lucas_Werkmeister_WMDE, 
GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, Lewizho99, Maathavan, 
_jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Wikidata-bugs, 
Jdouglas, aude, Tobias1984, Manybubbles, Mbch331