[Wikidata-bugs] [Maniphest] [Updated] T189747: Toolforge node for constraint reports updating bot

2019-03-24 Thread GTirloni
GTirloni added a project: cloud-services-team (Kanban).

TASK DETAIL
  https://phabricator.wikimedia.org/T189747

EMAIL PREFERENCES
  https://phabricator.wikimedia.org/settings/panel/emailpreferences/

To: GTirloni
Cc: Ayack, Edgars2007, Lucas_Werkmeister_WMDE, bd808, zhuyifei1999, Multichill, 
Lydia_Pintscher, #wikibase-quality-constraints, Ivan_A_Krestinin, Aklapper, 
alaa_wmde, Nandana, AndyTan, sietec, Zylc, Bstorm, 1978Gage2001, Lahi, 
aborrero, Gq86, GoranSMilovanovic, Chicocvenancio, Allthingsgo, QZanden, 
Tbscho, Freddy2001, LawExplorer, JJMC89, _jensen, rosalieper, Agabi10, 
srodlund, Luke081515, Wikidata-bugs, aude, Gryllida, jayvdb, scfc, coren, 
Mbch331, Krenair, chasemp
___
Wikidata-bugs mailing list
Wikidata-bugs@lists.wikimedia.org
https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs


[Wikidata-bugs] [Maniphest] [Updated] T189747: Toolforge node for constraint reports updating bot

2018-03-17 Thread Ivan_A_Krestinin
Ivan_A_Krestinin added a comment.
The bot parses the full Wikidata dump (844 GB of XML files) and loads items and their properties into memory. This in-memory data is used to generate the reports. Wikidata now has ~4700 items, so my code uses ~144 bytes per item. I cannot load only part of the data because parsing the dumps is a long, sequential process (~5 hours using 4 threads).

Another approach is to restore the dumps into an SQL database or a similar engine and generate the reports by querying it. As far as I know, a bot on Toolforge can access an existing database, so the dump-restoring step could be skipped. But it is hard for me to predict the performance of such an approach. For example, database engines usually have poor support for regular expressions, so for the Format constraint check I would need to select all values of a property. That is ~1.5 GB of data for property P2093. This is a single property, but most properties have constraints. A single bot run will touch 90% of all information stored in the database, so the bot would use the database heavily for a long time every day.
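
To make the Format constraint point concrete, here is a minimal sketch of the check done on the client side, after the property values have already been fetched. It is not the bot's actual code. The function name, the sample item IDs and values, and the regular expression are all hypothetical, and it assumes a value satisfies the constraint only when the whole string matches the pattern.

```python
import re
from typing import Iterable, Iterator, Tuple


def format_violations(
    values: Iterable[Tuple[str, str]], pattern: str
) -> Iterator[Tuple[str, str]]:
    """Yield (item_id, value) pairs whose value does not fully match pattern.

    `values` can be any iterable, e.g. rows streamed from a database cursor,
    so the whole result set does not have to sit in memory at once.
    """
    regex = re.compile(pattern)  # compile once, reuse for every value
    for item_id, value in values:
        if regex.fullmatch(value) is None:
            yield item_id, value


# Hypothetical sample data and pattern, for illustration only.
sample = [("Q1", "1234-5678"), ("Q2", "12345678"), ("Q3", "0000-000X")]
violations = list(format_violations(sample, r"\d{4}-\d{3}[\dX]"))
print(violations)
```

Because the matching happens outside the database, every value of the property still has to be transferred to the bot, which is the cost described above.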


[Wikidata-bugs] [Maniphest] [Updated] T189747: Toolforge node for constraint reports updating bot

2018-03-14 Thread Multichill
Multichill added a project: Toolforge.