[Wikidata-bugs] [Maniphest] [Commented On] T238002: WDQS Munger should be multi threaded

2019-12-12 Thread dcausse
dcausse added a comment.


  Separating
  
  - parsing
  - munging
  - writing
  
  into multiple threads nearly doubled the speed of the munger (about 1.9x by wall-clock time):
  
old:
real    1371m34.618s
user    1854m48.672s
sys     24m44.480s

new:
real    731m20.495s
user    1798m42.176s
sys     30m7.888s
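
  A minimal sketch of this kind of three-stage pipeline, not the code from the Gerrit change: each stage runs in its own thread and hands records to the next stage through a bounded queue. The class name `MungePipelineSketch`, the `run` method, the queue size, and the use of plain strings as records are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;
import java.util.function.Function;

final class MungePipelineSketch {
    // End-of-stream marker ("poison pill"); a distinct object compared by identity.
    private static final String EOF = new String("EOF");

    /** Runs parse -> munge -> write in three threads; returns what the writer collected. */
    static List<String> run(List<String> input, Function<String, String> munge)
            throws InterruptedException {
        // Bounded queues keep a fast stage from running far ahead of a slow one.
        BlockingQueue<String> parsed = new ArrayBlockingQueue<>(1024);
        BlockingQueue<String> munged = new ArrayBlockingQueue<>(1024);
        List<String> output = new ArrayList<>();

        // Stage 1: stands in for the RDF parser; here it only forwards records.
        Thread parser = new Thread(() -> {
            try {
                for (String record : input) {
                    parsed.put(record);
                }
                parsed.put(EOF);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Stage 2: applies the (hypothetical) munge function to each record.
        Thread munger = new Thread(() -> {
            try {
                for (String r = parsed.take(); r != EOF; r = parsed.take()) {
                    munged.put(munge.apply(r));
                }
                munged.put(EOF);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        // Stage 3: stands in for the writer; here it collects records in a list.
        Thread writer = new Thread(() -> {
            try {
                for (String r = munged.take(); r != EOF; r = munged.take()) {
                    output.add(r);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });

        parser.start();
        munger.start();
        writer.start();
        parser.join();
        munger.join();
        writer.join();
        return output;
    }
}
```

  Calling `MungePipelineSketch.run(lines, String::toUpperCase)` returns the transformed records in input order. With one thread per stage, overall throughput is bounded by the slowest stage.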
  
  I should have linked 
https://gerrit.wikimedia.org/r/c/wikidata/query/rdf/+/553758 to this task.
  Since the RDF parser is the limiting factor, I think we will have to do the 
entity delimitation without an RDF parser if we want to further improve the 
speed of this step.
  We could also consider switching to the `nt` (N-Triples) format, which I'm 
sure will be a lot faster to parse, if the size overhead is acceptable.
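
  To illustrate why a line-oriented format helps: in N-Triples every triple sits on one line, so the subject can be read with plain string operations and entity boundaries can be found without an RDF parser. The sketch assumes (this is not confirmed in this thread) that each entity's triples are contiguous in the dump and begin with triples whose subject is that entity's `Special:EntityData` IRI. `NtEntitySplitterSketch`, `subjectOf`, and `split` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

final class NtEntitySplitterSketch {
    // ASSUMPTION: a subject with this prefix marks the start of a new entity's block.
    static final String ENTITY_DATA_PREFIX =
            "<https://www.wikidata.org/wiki/Special:EntityData/";

    /** Returns the subject token of an N-Triples line, or null for blank and comment lines. */
    static String subjectOf(String line) {
        String t = line.trim();
        if (t.isEmpty() || t.charAt(0) == '#') {
            return null;
        }
        if (t.charAt(0) == '<') {
            // An IRI subject ends at the first '>' ('>' cannot appear unescaped inside an IRI).
            int end = t.indexOf('>');
            return end < 0 ? null : t.substring(0, end + 1);
        }
        // Otherwise treat it as a blank node label, which ends at the first whitespace.
        int i = 0;
        while (i < t.length() && !Character.isWhitespace(t.charAt(i))) {
            i++;
        }
        return t.substring(0, i);
    }

    /** Groups lines into chunks, opening a new chunk at each new EntityData subject. */
    static List<List<String>> split(Iterable<String> lines) {
        List<List<String>> chunks = new ArrayList<>();
        List<String> current = null;
        String currentMarker = null;
        for (String line : lines) {
            String subject = subjectOf(line);
            if (subject == null) {
                continue; // skip blank lines and comments
            }
            boolean isMarker = subject.startsWith(ENTITY_DATA_PREFIX);
            if (current == null || (isMarker && !subject.equals(currentMarker))) {
                current = new ArrayList<>();
                chunks.add(current);
            }
            if (isMarker) {
                currentMarker = subject;
            }
            current.add(line);
        }
        return chunks;
    }
}
```

  Each chunk keeps the raw lines, so the full RDF parsing of a chunk could then happen in a later stage instead of in the single reader thread.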

TASK DETAIL
  https://phabricator.wikimedia.org/T238002

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: dcausse
Cc: dcausse, Smalyshev, Gehel, Aklapper, darthmon_wmde, DannyS712, Nandana, 
Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Commented On] T238002: WDQS Munger should be multi threaded

2019-11-11 Thread Smalyshev
Smalyshev added a comment.


  Per-item data are mostly independent, so different items can easily be 
processed in parallel. However, that would require splitting the incoming data 
per item (note that item data do not necessarily have the item URI as subject: 
there are statements, references, values, sitelinks, etc.).
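
  Once the input has been split into per-item chunks, the independent chunks can be handed to a thread pool. A rough sketch, assuming the splitting has already been done: `ParallelItemMungeSketch`, `mungeAll`, and the `mungeItem` function are hypothetical and stand in for the real munging step.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.function.Function;

final class ParallelItemMungeSketch {
    /** Applies mungeItem to every item on a fixed-size pool; results keep input order. */
    static <T, R> List<R> mungeAll(List<T> items, Function<T, R> mungeItem, int threads)
            throws InterruptedException, ExecutionException {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            // Submit one task per item; items are assumed to be independent of each other.
            List<Future<R>> futures = new ArrayList<>();
            for (T item : items) {
                Callable<R> task = () -> mungeItem.apply(item);
                futures.add(pool.submit(task));
            }
            // Collect in submission order so the output order matches the input order.
            List<R> results = new ArrayList<>();
            for (Future<R> future : futures) {
                results.add(future.get());
            }
            return results;
        } finally {
            pool.shutdown();
        }
    }
}
```

  Here an "item" would be the whole group of triples belonging to one entity, including its statement, reference, value, and sitelink nodes, which is why the per-item split has to happen first.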

To: Smalyshev
Cc: Smalyshev, Gehel, Aklapper, darthmon_wmde, DannyS712, Nandana, Lahi, Gq86, 
Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, 
LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, 
Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331