dcausse added a comment.

  In T244590#5893018 <https://phabricator.wikimedia.org/T244590#5893018>, 
@Ottomata wrote:
  
  > COOL! :)
  >
  >> it's important to note that the state of step 3 is tightly coupled with 
its dump and thus we will have to instantiate a new stream per imported dump. 
In other words a wdqs system imported using dump Y will have to consume the RDF 
stream generated from an initial state based on this same dump. This means that 
the RDF stream will be named against a particular dump instance.
  >
  > Hm.  Would it be possible instead to lambda architecture this part?  
Instead of having to reload from a full dump and then recreate a new stream, 
could accomplish the same cleanups by backfilling from a batch job in Hadoop?  
I'm not sure I fully understand the 'cleanups' here.  Are they not do-able with 
the stream because events representing some of the state changes don't exist 
(yet)?
  
  I hope so: once the stream has been stabilized, reloading the system should 
become less necessary, and a fresh and consistent dump could be reconstructed 
(daily?) from the stream itself.
  Reloading from the dump generated by MW is something we need anyway in order 
to bootstrap the system, and at the beginning it will also be needed to handle:
  
  - bugs (cases where the data is simply lost)
  - lost events (undetected failures or bugs in MW)
  - cleanup
  
  The cleanup operation mentioned here is a sort of "garbage collection". To 
simplify: we need to detect unused resources (subgraphs) in the graph, and the 
stream itself cannot know about them unless we keep another large state doing 
reference counting.
  The solution proposed here is simply to spawn a new system from time to time 
(the dump generated by MW is clean) so that we do the cleanup and fix lost 
events at the same time. I agree with you that this is not ideal, and 
leveraging more batch jobs and/or more state in the stream will help minimize 
the need to do a full reload.
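
  To make the "garbage collection" idea a bit more concrete, here is a rough 
sketch of what a batch cleanup could look like, written as a plain 
reachability check rather than reference counting. It is only an illustration: 
the function name, the triple layout and the sample identifiers are all made 
up for the example and are not taken from the actual updater code.

```python
from collections import defaultdict, deque


def find_unused_resources(triples, roots):
    """Return subjects that cannot be reached from any root entity.

    `triples` is an iterable of (subject, predicate, object) tuples and
    `roots` is the collection of entities known to be live. Both shapes
    are assumptions made for this sketch.
    """
    # Build a subject -> objects adjacency map from the triples.
    edges = defaultdict(set)
    for subject, _predicate, obj in triples:
        edges[subject].add(obj)

    # Breadth-first traversal starting from the live entities (mark phase).
    reachable = set(roots)
    queue = deque(roots)
    while queue:
        node = queue.popleft()
        for target in edges.get(node, ()):
            if target not in reachable:
                reachable.add(target)
                queue.append(target)

    # Any subject never visited is an orphaned subgraph (sweep candidates).
    return {subject for subject in edges if subject not in reachable}


# Hypothetical sample data: one reference is still used, one is orphaned.
triples = [
    ("wd:Q1", "p:P31", "s:Q1-stmt-a"),
    ("s:Q1-stmt-a", "prov:wasDerivedFrom", "ref:abc"),
    ("ref:abc", "pr:P248", "wd:Q2"),
    ("ref:orphan", "pr:P248", "wd:Q3"),
]
unused = find_unused_resources(triples, roots=["wd:Q1", "wd:Q2"])
print(unused)
```

  A traversal like this needs the whole graph at hand, which is why it fits a 
batch job or a fresh reload better than the stream: the stream only sees 
individual changes and would need the extra reference-counting state to reach 
the same conclusion.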

TASK DETAIL
  https://phabricator.wikimedia.org/T244590

To: dcausse
Cc: Ottomata, JAllemandou, Aklapper, Zbyszko, Gehel, dcausse, darthmon_wmde, 
Nandana, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, 
EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, 
jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Dinoguy1000, 
Manybubbles, Mbch331