[Wikidata-bugs] [Maniphest] T281267: various weekly and daily dumps run from systemd timers are broken
ArielGlenn added a comment. @fgiunchedi I notice that in some cases phab tasks are autocreated when systemd units fail. Is that true for systemd jobs on snapshot hosts? Could we get tagged on those (Dumps-Generation) or could we get emails from those (ops-dumps@wm.o)? TASK DETAIL https://phabricator.wikimedia.org/T281267 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Addshore, Tonina_Zhelyazkova_WMDE, WMDE-leszek, JAllemandou, fgiunchedi, jbond, hoo, dcausse, ArielGlenn, Protsack.stephan, Busfault, Astuthiodit_1, Atieno, karapayneWMDE, joanna_borun, Invadibot, Devnull, maantietaja, lmata, Muchiri124, jannee_e, ItamarWMDE, Akuckartz, holger.knust, Legado_Shulgin, ReaperDawn, Nandana, Davinaclare77, Techguru.pc, Lahi, Gq86, herron, GoranSMilovanovic, Chicocvenancio, Lunewa, Hfbn0, QZanden, LawExplorer, Zppix, Volans, _jensen, rosalieper, Scott_WUaS, Wong128hk, gnosygnu, Wikidata-bugs, aude, faidon, Mbch331, Jay8g, Hokwelum ___ Wikidata-bugs mailing list -- wikidata-bugs@lists.wikimedia.org To unsubscribe send an email to wikidata-bugs-le...@lists.wikimedia.org
[Wikidata-bugs] [Maniphest] T68108: [Epic] Store media information for files on Wikimedia Commons as structured data
ArielGlenn closed subtask T226093: Capacity planning for Commons Structured Data as "Resolved". TASK DETAIL https://phabricator.wikimedia.org/T68108 To: ArielGlenn Cc: Mholloway, Ladsgroup, MarkTraceur, WMDE-leszek, jcrespo, Marostegui, AfroThundr3007730, Stashbot, _jensen, SandraF_WMF, Ramsey-WMF, CCicalese_WMF, PokestarFan, Saerdnaer, Juandev, Wesalius, Zppix, NMaia, Mattias_Ostmar-WMSE, Sadads, Poyekhali, -jem-, Deskana, Tfinc, Smalyshev, Jheald, LikeLifer, Yann, intracer, Spinster, Orofarne, Filceolaire, MZMcBride, bzimport, TheDJ, zhuyifei1999, DixonD, Bugreporter, RP88, Aklapper, Matanya, waldyrious, El_Grafo, Daniel_Mietchen, Jdforrester-WMF, GPHemsley, Bene, Legoktm, Nemo_bis, Lokal_Profil, Tobi_WMDE_SW, He7d3r, Petrb, jayvdb, Kelson, Steinsplitter, JeroenDeDauw, iecetcwcpggwqpgciazwvzpfjpwomjxn, revi, JanZerebecki, JeanFred, Ricordisamoa, Snowolf, Keegan, Rillke, Bawolff, Fabrice_Florin, Multichill, Liuxinyu970226, Ainali, Tgr, Lydia_Pintscher, jeremyb, Stryn, Ltrlg, daniel, Dereckson, JohnLewis, Udehb-WMF, Astuthiodit_1, BeautifulBold, Suran38, karapayneWMDE, Invadibot, GFontenelle_WMF, maantietaja, Y.ssk, FRomeo_WMF, Zblace, Peteosx1x, Muchiri124, NavinRizwi, CBogen, ItamarWMDE, Nintendofan885, Akuckartz, Nandana, JKSTNK, Lahi, Gq86, E1presidente, Cparle, GoranSMilovanovic, QZanden, Tramullas, Acer, V4switch, LawExplorer, Salgo60, Silverfish, rosalieper, Taiwania_Justo, Scott_WUaS, Susannaanas, Ixocactus, Wong128hk, Fuzheado, Jane023, Wikidata-bugs, Base, matthiasmullie, aude, Dinoguy1000, Raymond, Mbch331
[Wikidata-bugs] [Maniphest] T226093: Capacity planning for Commons Structured Data
ArielGlenn closed this task as "Resolved". ArielGlenn claimed this task. ArielGlenn added a comment. There's no point in having this open for a once a year check in, so I'll go ahead and close it. When capacity planning needs to be done for dbs in the regular course of things, this can be discussed. TASK DETAIL https://phabricator.wikimedia.org/T226093 To: ArielGlenn Cc: LSobanski, Nintendofan885, Ladsgroup, Abit, matthiasmullie, Marostegui, Addshore, Ramsey-WMF, jcrespo, Yann, MarkTraceur, ArielGlenn, Aklapper, Busfault, Astuthiodit_1, Atieno, karapayneWMDE, Invadibot, GFontenelle_WMF, maantietaja, FRomeo_WMF, jannee_e, CBogen, ItamarWMDE, Akuckartz, holger.knust, Nandana, JKSTNK, Lahi, Gq86, E1presidente, Cparle, SandraF_WMF, GoranSMilovanovic, Lunewa, QZanden, Tramullas, Acer, LawExplorer, Salgo60, Silverfish, _jensen, rosalieper, Scott_WUaS, Susannaanas, gnosygnu, Fuzheado, Jane023, Wikidata-bugs, Base, aude, Daniel_Mietchen, Ricordisamoa, Wesalius, Lydia_Pintscher, Raymond, Steinsplitter, Mbch331, Hokwelum
[Wikidata-bugs] [Maniphest] T226093: Capacity planning for Commons Structured Data
ArielGlenn added a comment. In T226093#8512308 <https://phabricator.wikimedia.org/T226093#8512308>, @LSobanski wrote: > The task's original intent was to cover planning "over the next 3 years" starting in 2019. @ArielGlenn is the task still relevant, can it be closed, do we need a new one? It depends on whether any tables are expected to grow a fair amount in the next three years. @Ladsgroup will have a better handle on that now. TASK DETAIL https://phabricator.wikimedia.org/T226093 To: ArielGlenn Cc: LSobanski, Nintendofan885, Ladsgroup, Abit, matthiasmullie, Marostegui, Addshore, Ramsey-WMF, jcrespo, Yann, MarkTraceur, ArielGlenn, Aklapper, Busfault, Astuthiodit_1, Atieno, karapayneWMDE, joanna_borun, Invadibot, GFontenelle_WMF, Devnull, maantietaja, FRomeo_WMF, Muchiri124, jannee_e, CBogen, ItamarWMDE, Akuckartz, holger.knust, Legado_Shulgin, ReaperDawn, Nandana, JKSTNK, Davinaclare77, Techguru.pc, Lahi, Gq86, E1presidente, Cparle, SandraF_WMF, GoranSMilovanovic, Lunewa, Hfbn0, QZanden, Tramullas, Acer, LawExplorer, Salgo60, Zppix, Silverfish, _jensen, rosalieper, Scott_WUaS, Susannaanas, Wong128hk, gnosygnu, Fuzheado, Jane023, Wikidata-bugs, Base, aude, Daniel_Mietchen, Ricordisamoa, Wesalius, Lydia_Pintscher, Raymond, faidon, Steinsplitter, Mbch331, Jay8g, fgiunchedi, Hokwelum
[Wikidata-bugs] [Maniphest] T138208: Connections to all db servers for wikidata as wikiadmin from snapshot, terbium
ArielGlenn added a comment. In T138208#7844298 <https://phabricator.wikimedia.org/T138208#7844298>, @Ladsgroup wrote: > It's a bit hard to measure but it's probably fixed. That would be wonderful if true. Let's leave this open for a while yet just in case... TASK DETAIL https://phabricator.wikimedia.org/T138208 To: ArielGlenn Cc: Kormat, LSobanski, Ladsgroup, Marostegui, Addshore, Lydia_Pintscher, daniel, hoo, ArielGlenn, jcrespo, Zppix, Busfault, Astuthiodit_1, karapayneWMDE, Invadibot, maantietaja, jannee_e, ItamarWMDE, Akuckartz, holger.knust, RhinosF1, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Mbch331, Hokwelum
[Wikidata-bugs] [Maniphest] T300240: Missing Wikidata RDF (ttl and nt) dumps for 20220117
ArielGlenn added a comment. Hey, just a note that we saw another failure: Output of systemd timer for '/usr/local/bin/dumpwikibaserdf.sh -p wikidata -d truthy -f nt' SYSTEMDTIMER noreply@snapshot1008.eqiad.wmnet via wikimedia.org ERROR 2013 (HY000): Lost connection to MySQL server at 'reading authorization packet', system error: 104 Failed. Couldn't get MAX(page_id) from db. Not sure who can/should undertake to make the script more resilient but there it is. TASK DETAIL https://phabricator.wikimedia.org/T300240 To: ArielGlenn Cc: ArielGlenn, Aklapper, JAllemandou, AKhatun_WMF, dcausse, karapayneWMDE, Invadibot, maantietaja, jannee_e, Akuckartz, holger.knust, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Addshore, Mbch331
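One way to make a job like this more tolerant of a dropped connection is to retry the failing query with a growing delay before giving up. The following is only a minimal sketch in Python, not the approach used by dumpwikibaserdf.sh (which is a shell script); `TransientDbError` and `with_retries` are hypothetical names introduced here, and the caller is assumed to map "lost connection" errors such as MySQL error 2013 onto `TransientDbError`.

```python
import time


class TransientDbError(Exception):
    """Hypothetical stand-in for a dropped connection (e.g. MySQL error 2013)."""


def with_retries(query, attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call query() until it succeeds, doubling the wait after each transient failure."""
    for attempt in range(1, attempts + 1):
        try:
            return query()
        except TransientDbError:
            if attempt == attempts:
                raise  # out of attempts: let the job fail loudly
            sleep(base_delay * 2 ** (attempt - 1))
```

A query such as the MAX(page_id) lookup would be wrapped as `with_retries(get_max_page_id)`, where `get_max_page_id` is whatever function performs the lookup. Errors that are not transient are deliberately left to propagate on the first attempt.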
[Wikidata-bugs] [Maniphest] T138208: Connections to all db servers for wikidata as wikiadmin from snapshot, terbium
ArielGlenn added a comment. I am aware of and following this discussion, but right now my responsiveness on this task will be slow; most of my time needs to go to getting my teammate who will be dumps co-maintainer up to speed. Please bear with us. TASK DETAIL https://phabricator.wikimedia.org/T138208 To: ArielGlenn Cc: LSobanski, Ladsgroup, Marostegui, Addshore, Lydia_Pintscher, daniel, hoo, ArielGlenn, jcrespo, Zppix, karapayneWMDE, Invadibot, maantietaja, jannee_e, Akuckartz, holger.knust, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Mbch331
[Wikidata-bugs] [Maniphest] T300240: Missing Wikidata RDF (ttl and nt) dumps for 20220117
ArielGlenn added a comment. Hm I wonder who we should add that would take on restarting these jobs if they deem it useful. Uh. Deferring for now since I have no bright ideas, and noting that here. Thanks again! TASK DETAIL https://phabricator.wikimedia.org/T300240 To: ArielGlenn Cc: ArielGlenn, Aklapper, JAllemandou, AKhatun_WMF, dcausse, Invadibot, maantietaja, jannee_e, Akuckartz, holger.knust, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Addshore, Mbch331
[Wikidata-bugs] [Maniphest] T300240: Missing Wikidata RDF (ttl and nt) dumps for 20220117
ArielGlenn added a comment. Uh @dcausse Do you want to add someone to the ops-dumps alias so that you can be informed in these instances and perhaps schedule a restart of the job(s)? It would be easy enough. Sorry to ask after the task is closed! TASK DETAIL https://phabricator.wikimedia.org/T300240 To: ArielGlenn Cc: ArielGlenn, Aklapper, JAllemandou, AKhatun_WMF, dcausse, Invadibot, maantietaja, jannee_e, Akuckartz, holger.knust, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Addshore, Mbch331
[Wikidata-bugs] [Maniphest] T300240: Missing Wikidata RDF (ttl and nt) dumps for 20220117
ArielGlenn added a comment. I saw an error from the cron job, it was sent to ops-dumps, which someone from WMDE should be on as well I think. The error looked to me like it had to do with a db server being depooled or otherwise unavailable: ERROR 2013 (HY000): Lost connection to MySQL server at 'reading authorization packet', system error: 104 So, transient indeed. Feel free to close if this info is sufficient followup for you, @dcausse :-) TASK DETAIL https://phabricator.wikimedia.org/T300240 To: ArielGlenn Cc: ArielGlenn, Aklapper, JAllemandou, AKhatun_WMF, dcausse, Invadibot, maantietaja, jannee_e, Akuckartz, holger.knust, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Addshore, Mbch331
[Wikidata-bugs] [Maniphest] T138208: Connections to all db servers for wikidata as wikiadmin from snapshot, terbium
ArielGlenn added a comment. Thanks. I was pretty careful with my testing for the last fix, making sure that in production the patch redirected to a vslow/dump server. But I may have overlooked something. :-( TASK DETAIL https://phabricator.wikimedia.org/T138208 To: ArielGlenn Cc: LSobanski, Ladsgroup, Marostegui, Addshore, Lydia_Pintscher, daniel, hoo, ArielGlenn, jcrespo, Zppix, Invadibot, maantietaja, jannee_e, Akuckartz, holger.knust, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Mbch331
[Wikidata-bugs] [Maniphest] T138208: Connections to all db servers for wikidata as wikiadmin from snapshot, terbium
ArielGlenn added a comment. I hate to ask but can we capture any queries? TASK DETAIL https://phabricator.wikimedia.org/T138208 To: ArielGlenn Cc: LSobanski, Ladsgroup, Marostegui, Addshore, Lydia_Pintscher, daniel, hoo, ArielGlenn, jcrespo, Zppix, Invadibot, maantietaja, jannee_e, Akuckartz, holger.knust, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Mbch331
[Wikidata-bugs] [Maniphest] T238972: switch xml/sql (and adds-changes) dumps to use 0.11 schema with content from multiple slots
Restricted Application added a project: wdwb-tech. TASK DETAIL https://phabricator.wikimedia.org/T238972 WORKBOARD https://phabricator.wikimedia.org/project/board/1519/ To: ArielGlenn Cc: Christian75, Schnark, binbot, Johan, Lucas_Werkmeister_WMDE, RhinosF1, Benjavalero, hoo, leila, ArielGlenn, Invadibot, R4356th, Bebiezaza, EhsanKhandowa, maantietaja, jannee_e, Akuckartz, PatsagornY, holger.knust, Viztor, Nandana, Amorymeltzer, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, JJMC89, _jensen, rosalieper, Scott_WUaS, Luke081515, gnosygnu, Wikidata-bugs, aude, TheDJ, Addshore, Mbch331, Jay8g, valerio.bozzolan
[Wikidata-bugs] [Maniphest] T138208: Connections to all db servers for wikidata as wikiadmin from snapshot, terbium
ArielGlenn added a comment. The above patch was deployed with the train everywhere, so the specific set of queries should no longer be directed to non-vslow/dump db servers. If that's the case, we are now back to the harder issue of what to do when a db server is depooled, and I think that discussion is happening elsewhere. TASK DETAIL https://phabricator.wikimedia.org/T138208 To: ArielGlenn Cc: LSobanski, Ladsgroup, Marostegui, Addshore, Lydia_Pintscher, daniel, hoo, ArielGlenn, jcrespo, Zppix, Invadibot, maantietaja, jannee_e, Akuckartz, holger.knust, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Mbch331
[Wikidata-bugs] [Maniphest] T297470: torrent file for Wikidata dumps
ArielGlenn closed this task as "Declined". ArielGlenn added a comment. I'm going to go ahead and close this as declined. Feel free to re-open if things change in the future. TASK DETAIL https://phabricator.wikimedia.org/T297470 To: ArielGlenn Cc: ArielGlenn, Andrawaag, Invadibot, maantietaja, jannee_e, Biaoo, Philoserf, Nintendofan885, Akuckartz, Ironie, holger.knust, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, mys_721tx, Wikidata-bugs, Hydriz, aude, Nemo_bis, Addshore, Mbch331, Jay8g
[Wikidata-bugs] [Maniphest] T138208: Connections to all db servers for wikidata as wikiadmin from snapshot, terbium
ArielGlenn added a comment. The patch at https://gerrit.wikimedia.org/r/c/mediawiki/core/+/747455/ is tested and ready to go, and in line with the way existing dumps scripts work. So I'd like to go ahead with it. TASK DETAIL https://phabricator.wikimedia.org/T138208 To: ArielGlenn Cc: LSobanski, Ladsgroup, Marostegui, Addshore, Lydia_Pintscher, daniel, hoo, ArielGlenn, jcrespo, Zppix, 786, Suran38, Biggs657, Invadibot, Lalamarie69, maantietaja, Juan90264, Alter-paule, jannee_e, Beast1978, Un1tY, Akuckartz, Hook696, Kent7301, holger.knust, joker88john, CucyNoiD, Nandana, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Mbch331
[Wikidata-bugs] [Maniphest] T138208: Connections to all db servers for wikidata as wikiadmin from snapshot, terbium
ArielGlenn added a comment. There is a complicated set of python scripts that coordinate the dump jobs for each wiki during the two monthly runs. https://wikitech.wikimedia.org/wiki/Dumps/Current_Architecture gives an overview. https://www.mediawiki.org/wiki/SQL/XML_Dumps#Becoming_a_dumps_co-maintainer gives rather a lot more. In general for testing you will run the python worker.py script, supplying it with the config file, the job name, the run date and the wiki; we test in deployment-prep, although I am working on a docker container testbed. TASK DETAIL https://phabricator.wikimedia.org/T138208 To: ArielGlenn Cc: LSobanski, Ladsgroup, Marostegui, Addshore, Lydia_Pintscher, daniel, hoo, ArielGlenn, jcrespo, Zppix, 786, Suran38, Biggs657, Invadibot, Lalamarie69, maantietaja, Juan90264, Alter-paule, jannee_e, Beast1978, Un1tY, Akuckartz, Hook696, Kent7301, holger.knust, joker88john, CucyNoiD, Nandana, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Mbch331
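For anyone scripting such a test run, the four inputs named above (config file, job name, run date, wiki) can be assembled into a command line along these lines. This is a rough sketch only: the flag names `--configfile`, `--job` and `--date`, and the wiki as a trailing positional argument, are assumptions to verify against the script's own help output, and every path and value in the usage note is a placeholder.

```python
def worker_command(config_file, job, run_date, wiki, script="worker.py"):
    """Build the argv for one dump job on one wiki.

    The flag names below are assumptions, not confirmed by this thread.
    """
    return [
        "python3", script,
        "--configfile", config_file,  # assumed flag: dumps config file
        "--job", job,                 # assumed flag: name of the job to run
        "--date", run_date,           # assumed flag: run date as YYYYMMDD
        wiki,                         # assumed positional: wiki database name
    ]
```

For example, `worker_command("/path/to/wikidump.conf", "somejob", "20220101", "testwiki")` returns a list that can be handed to `subprocess.run` in a test environment such as deployment-prep.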
[Wikidata-bugs] [Maniphest] T138208: Connections to all db servers for wikidata as wikiadmin from snapshot, terbium
ArielGlenn added a comment. In T138208#7611718 <https://phabricator.wikimedia.org/T138208#7611718>, @Ladsgroup wrote: > In T138208#7611712 <https://phabricator.wikimedia.org/T138208#7611712>, @ArielGlenn wrote: > >> Not yet; I need to talk with someone more knowledgeable than me about whether this approach is reasonable, before moving forward. I'll bring it up at our next meeting (tomorrow). > > Can I know how dumpers work? Any link to documentation would be appreciated. I need it to understand this patch and also finding a way for T298485 <https://phabricator.wikimedia.org/T298485> I don't know of any documentation specifically for the MW maintenance scripts for dumps or the modules used for import/export. There are general Manual pages for importing and exporting (maintained by volunteers I think) but I don't think they have the level of detail you are looking for. I have plenty of documentation for the python scripts, the formats, the content, and the various servers and how they are set up. But I guess that won't be so helpful here. Should we meet? Should I try to write something? If so, how in depth does it need to be? TASK DETAIL https://phabricator.wikimedia.org/T138208 To: ArielGlenn Cc: LSobanski, Ladsgroup, Marostegui, Addshore, Lydia_Pintscher, daniel, hoo, ArielGlenn, jcrespo, Zppix, 786, Suran38, Biggs657, Invadibot, Lalamarie69, maantietaja, Juan90264, Alter-paule, jannee_e, Beast1978, Un1tY, Akuckartz, Hook696, Kent7301, holger.knust, joker88john, CucyNoiD, Nandana, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Mbch331
[Wikidata-bugs] [Maniphest] T138208: Connections to all db servers for wikidata as wikiadmin from snapshot, terbium
ArielGlenn added a comment. In T138208#7611708 <https://phabricator.wikimedia.org/T138208#7611708>, @Marostegui wrote: > In T138208#7571559 <https://phabricator.wikimedia.org/T138208#7571559>, @gerritbot wrote: > >> Change 747455 had a related patch set uploaded (by ArielGlenn; author: ArielGlenn): >> >> [mediawiki/core@master] try to use 'dump' group for db connections for dumps of page content >> >> https://gerrit.wikimedia.org/r/747455 > > Any ETA on when this will be merged? Thanks! Not yet; I need to talk with someone more knowledgeable than me about whether this approach is reasonable, before moving forward. I'll bring it up at our next meeting (tomorrow). TASK DETAIL https://phabricator.wikimedia.org/T138208 To: ArielGlenn Cc: LSobanski, Ladsgroup, Marostegui, Addshore, Lydia_Pintscher, daniel, hoo, ArielGlenn, jcrespo, Zppix, 786, Suran38, Biggs657, Invadibot, Lalamarie69, maantietaja, Juan90264, Alter-paule, jannee_e, Beast1978, Un1tY, Akuckartz, Hook696, Kent7301, holger.knust, joker88john, CucyNoiD, Nandana, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Mbch331
[Wikidata-bugs] [Maniphest] T222349: Do not rate limit dumps from internal network
ArielGlenn added a comment. Note that the checksum files for those dumps are available for download as well, since they are provided along with the main dump output files to all mirrors. Someone from WMCS will probably need to look at this (again) if the discussion is being re-opened. They should have insight into the impact on existing services from any change. TASK DETAIL https://phabricator.wikimedia.org/T222349 To: Gehel, ArielGlenn Cc: Volans, ayounsi, cmooney, EBernhardson, Bstorm, ArielGlenn, Gehel, Aklapper, joanna_borun, Ramtin2021, Invadibot, MPhamWMF, dcaro, Devnull, Slst2020, GeminiAgaloos, maantietaja, nskaggs, lmata, Muchiri124, Raymond_Ndibe, CBogen, Nintendofan885, Akuckartz, Phamhi, RhinosF1, Legado_Shulgin, ReaperDawn, Nandana, Namenlos314, skpuneethumar, sietec, Zylc, Giuliamocci, Davinaclare77, 1978Gage2001, Techguru.pc, Lahi, Operator873, Gq86, Bsandipan, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Chicocvenancio, Allthingsgo, Hfbn0, QZanden, EBjune, Tbscho, merbst, LawExplorer, Zppix, JJMC89, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, Wong128hk, mys_721tx, jkroll, Wikidata-bugs, Jdouglas, Jitrixis, aude, Tobias1984, Manybubbles, Gryllida, faidon, scfc, Addshore, Mbch331, Jay8g, bd808, Krenair, fgiunchedi
[Wikidata-bugs] [Maniphest] T138208: Connections to all db servers for wikidata as wikiadmin from snapshot, terbium
ArielGlenn added a comment. Thanks for this thought, Daniel. I think it's better if I can pass the dbgroupdefault parameter to the maintenance script itself, instead of hacking something into getBlob(). But I do need to check if that's going to work ok. The longer term fix you mentioned, is there a task for that, so I can follow along? TASK DETAIL https://phabricator.wikimedia.org/T138208 To: ArielGlenn Cc: LSobanski, Ladsgroup, Marostegui, Addshore, Lydia_Pintscher, daniel, hoo, ArielGlenn, jcrespo, Zppix, Invadibot, maantietaja, jannee_e, Akuckartz, holger.knust, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Mbch331
[Wikidata-bugs] [Maniphest] T138208: Connections to all db servers for wikidata as wikiadmin from snapshot, terbium
ArielGlenn added a comment. As I feared, fetchText.php calls MediaWikiServices::getInstance()->getBlobStore()->getBlob() which gets a db replica connection on its own, with no opportunity for us to ask that it be in the vslow/dump group. We might be able to use the --dbgroupdefault dump option to this script; I will have to do some testing to see if that has any effect and what happens when that group is suddenly not available. TASK DETAIL https://phabricator.wikimedia.org/T138208 To: ArielGlenn Cc: LSobanski, Ladsgroup, Marostegui, Addshore, Lydia_Pintscher, daniel, hoo, ArielGlenn, jcrespo, Zppix, Invadibot, maantietaja, jannee_e, Akuckartz, holger.knust, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Mbch331
[Wikidata-bugs] [Maniphest] T138208: Connections to all db servers for wikidata as wikiadmin from snapshot, terbium
ArielGlenn added a comment. The above is happening from pages-meta-history dumps, and I will look into it later today. The snapshot1008 (wikidata entity) dumps will be harder. TASK DETAIL https://phabricator.wikimedia.org/T138208 To: ArielGlenn Cc: LSobanski, Ladsgroup, Marostegui, Addshore, Lydia_Pintscher, daniel, hoo, ArielGlenn, jcrespo, Zppix, Invadibot, maantietaja, jannee_e, Akuckartz, holger.knust, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Mbch331
[Wikidata-bugs] [Maniphest] T138208: Connections to all db servers for wikidata as wikiadmin from snapshot, terbium
ArielGlenn added a comment. The reason only those two snapshot hosts are involved is undoubtedly because dumps on the others have finished for this run. TASK DETAIL https://phabricator.wikimedia.org/T138208 To: ArielGlenn Cc: LSobanski, Ladsgroup, Marostegui, Addshore, Lydia_Pintscher, daniel, hoo, ArielGlenn, jcrespo, Zppix, Invadibot, maantietaja, jannee_e, Akuckartz, holger.knust, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Mbch331
[Wikidata-bugs] [Maniphest] T297470: torrent file for Wikidata dumps
ArielGlenn added a comment. We don't provide torrent files from here because this is something that can be done by members of the community. I would get in touch with one of the people maintaining any of the torrents listed here: https://meta.wikimedia.org/wiki/Data_dump_torrents and see if they are willing to add Wikidata to the list. There also used to be a toolforge project for torrents, https://admin.toolforge.org/tool/dump-torrents but I'm not sure if it is still running. Finally, if the speed of the download is the main issue for you, you might try one of the mirror sites for downloading, https://meta.wikimedia.org/wiki/Mirroring_Wikimedia_project_XML_dumps#Current_mirrors and see if you get faster downloads that way. TASK DETAIL https://phabricator.wikimedia.org/T297470 To: ArielGlenn Cc: ArielGlenn, Andrawaag, Invadibot, maantietaja, jannee_e, Biaoo, Philoserf, Nintendofan885, Akuckartz, Ironie, holger.knust, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, mys_721tx, Wikidata-bugs, Hydriz, aude, Nemo_bis, Addshore, Mbch331, Jay8g
[Wikidata-bugs] [Maniphest] T222985: Provide wikidata JSON dumps compressed with zstd
ArielGlenn added a comment. In T222985#7164049 <https://phabricator.wikimedia.org/T222985#7164049>, @Mitar wrote: > Are you saying that existing wikidata json dumps can be decompressed in parallel if using lbzip2, but not pbzip2? lbzip2 is format-compatible with bzip2 and can read bzip2 or lbzip2 compressed files and use multiple cores to decompress, indeed. pbzip2 should also work for that matter. TASK DETAIL https://phabricator.wikimedia.org/T222985 To: ArielGlenn Cc: Mitar, ImreSamu, hoo, Smalyshev, ArielGlenn, Liuxinyu970226, bennofs, Invadibot, maantietaja, jannee_e, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Addshore, Mbch331
[Wikidata-bugs] [Maniphest] T222985: Provide wikidata JSON dumps compressed with zstd
ArielGlenn added a comment. lbzip2 decompresses in parallel as well. We use that for compression of the SQL/XML dumps. TASK DETAIL https://phabricator.wikimedia.org/T222985 To: ArielGlenn Cc: Mitar, ImreSamu, hoo, Smalyshev, ArielGlenn, Liuxinyu970226, bennofs, Invadibot, maantietaja, jannee_e, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Addshore, Mbch331
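To make use of that parallel decompression when reading a JSON dump, one option is to let lbzip2 do the decompression in a child process and parse its output as a stream. The sketch below is illustrative rather than an official recipe: it assumes lbzip2 is installed, and it assumes the dump is laid out as a JSON array with one entity per line (an opening `[` line, entity lines ending in a comma, a closing `]` line). The function names and the file name in the usage note are placeholders.

```python
import json
import subprocess


def iter_entities(lines):
    """Yield one entity dict per line, assuming one entity per line inside [ ... ]."""
    for line in lines:
        line = line.strip().rstrip(",")
        if line in ("[", "]", ""):
            continue  # array brackets and blank lines carry no entity
        yield json.loads(line)


def open_dump(path, threads=4):
    """Start lbzip2 decompressing path to a pipe, using the given number of threads."""
    return subprocess.Popen(
        ["lbzip2", "-d", "-c", "-n", str(threads), path],
        stdout=subprocess.PIPE,
        text=True,
        encoding="utf-8",
    )
```

Typical use would be `proc = open_dump("latest-all.json.bz2")`, then iterating over `iter_entities(proc.stdout)`, then calling `proc.wait()`. Because the parser works on any iterable of lines, it can be exercised without the multi-gigabyte dump.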
[Wikidata-bugs] [Maniphest] T281267: various weekly and daily dumps run from systemd timers are broken
ArielGlenn added a comment. What are the next steps on this? Should I be tweaking a manifest someplace? TASK DETAIL https://phabricator.wikimedia.org/T281267 To: jbond, ArielGlenn Cc: Addshore, Tonina_Zhelyazkova_WMDE, WMDE-leszek, JAllemandou, fgiunchedi, jbond, hoo, dcausse, ArielGlenn, Protsack.stephan, Invadibot, Ramtin0071, Devnull, maantietaja, lmata, Muchiri124, jannee_e, Akuckartz, RhinosF1, Legado_Shulgin, ReaperDawn, Nandana, Davinaclare77, Qtn1293, Techguru.pc, Lahi, Gq86, herron, GoranSMilovanovic, Chicocvenancio, Lunewa, Th3d3v1ls, Hfbn0, QZanden, LawExplorer, Zppix, Volans, _jensen, rosalieper, Scott_WUaS, Wong128hk, gnosygnu, Wikidata-bugs, aude, faidon, Mbch331, Jay8g
[Wikidata-bugs] [Maniphest] T209390: Output some meta data about the wikidata JSON dump
ArielGlenn added a subscriber: hoo. ArielGlenn added a comment. I am proactively adding @hoo as he can provide some insight and perhaps tag others as well. TASK DETAIL https://phabricator.wikimedia.org/T209390 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: hoo, Sascha, Mitar, ArielGlenn, Smalyshev, Addshore, Invadibot, maantietaja, jannee_e, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T279518: Enable automatic JSON dump validation for Wikidata
ArielGlenn added a comment. In T279518#6981710 <https://phabricator.wikimedia.org/T279518#6981710>, @hoo wrote: >> Icinga sends alerts, and those would come to me I guess, which is probably not the best outcome :-) > > We could use the `wikidata` contact group for that. > >> Note that mails for other jobs go to an email alias that includes several people on my team; perhaps you can rope a couple others in WMDE or who work on Wikidata to sign onto a new alias? > > We already have a `wikidata-monitoring` alias we use for these Icinga alerts, I guess we could nicely use it for this as well. > > So, I guess both would be fine... while cron is probably easier to wire up, Icinga seems more fitting (we don't care about this as long as it succeeds). Your cron job would only produce output on failure if you set it up appropriately, so both are fine indeed. TASK DETAIL https://phabricator.wikimedia.org/T279518 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Lydia_Pintscher, ArielGlenn, Aklapper, hoo, Invadibot, maantietaja, jannee_e, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Svick, Addshore, Mbch331, jeremyb ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
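The point about cron only mailing on failure can be sketched as a small wrapper. This is a minimal illustration, not the actual dumps tooling: the function name `run_quietly` is hypothetical, and it assumes cron mails whatever the job prints to the address configured via MAILTO.

```python
import subprocess
import sys


def run_quietly(command):
    """Run a command and return (exit_code, text_to_print).

    The text is empty when the command succeeds, so a cron job that
    prints it produces no output (and therefore no mail) on success.
    """
    result = subprocess.run(
        command,
        stdout=subprocess.PIPE,
        stderr=subprocess.STDOUT,  # keep errors together with normal output
        text=True,
    )
    if result.returncode == 0:
        return 0, ""
    return result.returncode, result.stdout


# Example: a command that fails, using the current interpreter as a stand-in.
exit_code, report = run_quietly(
    [sys.executable, "-c", "import sys; print('validation failed'); sys.exit(3)"]
)
```

A cron entry would call a script built around this and print `report` only when it is non-empty, so either cron mail or an Icinga check on the exit code would see failures only.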
[Wikidata-bugs] [Maniphest] T279518: Enable automatic JSON dump validation for Wikidata
ArielGlenn added a comment. Icinga sends alerts, and those would come to me I guess, which is probably not the best outcome :-) I believe that we use MAILTO for everything in the dumpsgen crontab, but the question is whether there's a nice alias to send emails to, or whether we want to make you in particular the SPOF for this. I imagine you can figure out my opinion on this already :-) Note that mails for other jobs go to an email alias that includes several people on my team; perhaps you can rope a couple others in WMDE or who work on Wikidata to sign onto a new alias? TASK DETAIL https://phabricator.wikimedia.org/T279518 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Lydia_Pintscher, ArielGlenn, Aklapper, hoo, Invadibot, maantietaja, jannee_e, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Svick, Addshore, Mbch331, jeremyb ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T279518: Enable automatic JSON dump validation for Wikidata
ArielGlenn added a project: Dumps-Generation. Restricted Application added a project: wdwb-tech. TASK DETAIL https://phabricator.wikimedia.org/T279518 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Lydia_Pintscher, ArielGlenn, Aklapper, hoo, Invadibot, maantietaja, jannee_e, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Svick, Addshore, Mbch331, jeremyb ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T277300: Lexeme JSON dumps contain invalid JSON
ArielGlenn added a comment. This is now deployed and will be in effect for next week's lexeme run. TASK DETAIL https://phabricator.wikimedia.org/T277300 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: ArielGlenn, hoo, Lydia_Pintscher, Invadibot, maantietaja, Alter-paule, jannee_e, Beast1978, Un1tY, Akuckartz, Hook696, Kent7301, joker88john, CucyNoiD, Nandana, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, GoranSMilovanovic, Lunewa, Mahir256, QZanden, LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Bodhisattwa, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Addshore, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T278031: Wikibase canonical JSON format is missing "modified" in Wikidata JSON dumps
ArielGlenn added a project: Dumps-Generation. Restricted Application added a project: wdwb-tech. TASK DETAIL https://phabricator.wikimedia.org/T278031 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Mitar, Aklapper, Invadibot, maantietaja, jannee_e, Akuckartz, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Lydia_Pintscher, Addshore, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T276643: Wikidata JSON dump (bz2) no longer imports due to bad JSON format
ArielGlenn closed this task as "Resolved". ArielGlenn added a comment. Since @hoo validated the dump from the past week, verifying that the current dump generation process is fixed, we can now close this task. Thanks everyone! TASK DETAIL https://phabricator.wikimedia.org/T276643 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: hoo, Tacsipacsi, Cparle, Palotabarat, LucasWerkmeister, Motagirl2, Addshore, Mahir256, ArielGlenn, Ash20001, maantietaja, jannee_e, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, gnosygnu, abian, Wikidata-bugs, aude, Lydia_Pintscher, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T276643: Wikidata JSON dump (bz2) no longer imports due to bad JSON format
ArielGlenn added a comment. I'll leave this open until the run is complete and folks have had time to try to use them, so probably through the coming weekend. TASK DETAIL https://phabricator.wikimedia.org/T276643 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: LucasWerkmeister, Motagirl2, Addshore, Mahir256, ArielGlenn, Ash20001, maantietaja, jannee_e, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, gnosygnu, abian, Wikidata-bugs, aude, Lydia_Pintscher, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T276643: Wikidata JSON dump (bz2) no longer imports due to bad JSON format
ArielGlenn added a comment. In T276643#6890308 <https://phabricator.wikimedia.org/T276643#6890308>, @Ash20001 wrote: > Will this patch be included in the next dump or can be put back in the last two dumps (regenerate dump) This should be in time for the dump that will be produced this week. For the previous two weeks you'll need to filter the contents to add in commas, as mentioned by Lucas in his earlier comment. TASK DETAIL https://phabricator.wikimedia.org/T276643 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: LucasWerkmeister, Motagirl2, Addshore, Mahir256, ArielGlenn, Ash20001, maantietaja, jannee_e, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, gnosygnu, abian, Wikidata-bugs, aude, Lydia_Pintscher, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
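For the two affected dumps, the comma filter mentioned above might look roughly like the following. This is a sketch under assumptions, not the filter Lucas described: it assumes the usual layout of one entity object per line inside a single JSON array, with `[` and `]` on their own lines, and the name `add_missing_commas` is made up for illustration.

```python
import json


def add_missing_commas(lines):
    """Yield the lines of a one-entity-per-line JSON array, adding a
    trailing comma to any entity line that is followed by another entity."""
    previous = None
    for raw in lines:
        line = raw.rstrip("\n")
        if previous is not None:
            needs_comma = (
                previous.startswith("{")
                and line.startswith("{")
                and not previous.endswith(",")
            )
            yield previous + ("," if needs_comma else "") + "\n"
        previous = line
    if previous is not None:
        yield previous + "\n"


# Tiny stand-in for a broken dump: the first entity lacks its comma.
broken = ['[\n', '{"id": "Q1"}\n', '{"id": "Q2"},\n', '{"id": "Q3"}\n', ']\n']
repaired = "".join(add_missing_commas(broken))
entities = json.loads(repaired)
```

Because it works line by line, the same generator could be fed from a decompressing reader rather than a list, which matters for a dump of this size.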
[Wikidata-bugs] [Maniphest] T264883: Prepare deployment of JSON dumps for Lexeme
ArielGlenn added a comment. These look fine to me from today, and I've done all the buster-side testing so that's ok too. Closing this! Ah, do we want to announce it anywhere though? Maybe I won't close it pending that answer. Places it could be announced: xmldatadumps-l, wikitech-l, research list, wikidata list. TASK DETAIL https://phabricator.wikimedia.org/T264883 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: hoo, noarave, ArielGlenn, Lucas_Werkmeister_WMDE, Lydia_Pintscher, WMDE-leszek, Pablo-WMDE, Alter-paule, Beast1978, Un1tY, Akuckartz, Hook696, Iflorez, Kent7301, alaa_wmde, joker88john, CucyNoiD, Nandana, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, GoranSMilovanovic, Mahir256, QZanden, LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Bodhisattwa, Scott_WUaS, Jonas, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T264883: Prepare deployment of JSON dumps for Lexeme
ArielGlenn added a comment. I am doing some prep work before I try to test this on buster. Getting close! TASK DETAIL https://phabricator.wikimedia.org/T264883 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: hoo, ArielGlenn Cc: noarave, ArielGlenn, Lucas_Werkmeister_WMDE, Lydia_Pintscher, WMDE-leszek, Pablo-WMDE, Alter-paule, Beast1978, Un1tY, Akuckartz, Hook696, Iflorez, Kent7301, alaa_wmde, joker88john, CucyNoiD, Nandana, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, GoranSMilovanovic, Mahir256, QZanden, LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Bodhisattwa, Scott_WUaS, Jonas, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T138208: Connections to all db servers for wikidata as wikiadmin from snapshot, terbium
ArielGlenn added a comment. mysql.php, used for wikidata entity dumps, apparently does not correctly handle the --group flag. It's unclear to me what it does do; I need to check into this sometime later. The queries run by it are extremely short so the impact is minimal, but it still needs to be checked. TASK DETAIL https://phabricator.wikimedia.org/T138208 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Marostegui, Addshore, Lydia_Pintscher, daniel, hoo, ArielGlenn, jcrespo, Zppix, jannee_e, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T138208: Connections to all db servers for wikidata as wikiadmin from snapshot, terbium
ArielGlenn added a comment. In T138208#6811418 <https://phabricator.wikimedia.org/T138208#6811418>, @Addshore wrote: > In T138208#6809784 <https://phabricator.wikimedia.org/T138208#6809784>, @ArielGlenn wrote: > >> This is because the maintenance scripts that do "small" page ranges take several hours to complete. I will keep this in mind for when we can go to multiple bz2 streams in the page content history dumps; I'll be able to dump much smaller ranges then and concat them together. The other thing I should do is check how often we respawn fetchText; that is something I might be able to change sooner rather than later. > > From the sounds of things I can leave this ticket on your plate then @ArielGlenn ? :) Sadly, yes :-P TASK DETAIL https://phabricator.wikimedia.org/T138208 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Marostegui, Addshore, Lydia_Pintscher, daniel, hoo, ArielGlenn, jcrespo, Zppix, jannee_e, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T147169: Make sure Wikibase dump maintenance scripts solely use the "dump" db group
ArielGlenn added a comment. These are for the weekly wikidata "entity dumps", and so separate from the main xml/sql dumps implicated in the other task. TASK DETAIL https://phabricator.wikimedia.org/T147169 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: hoo, ArielGlenn Cc: Marostegui, gerritbot, Lucas_Werkmeister_WMDE, ArielGlenn, Addshore, aaron, Aklapper, Lydia_Pintscher, jcrespo, aude, daniel, hoo, Akuckartz, Iflorez, alaa_wmde, Nandana, lucamauri, Lahi, Gq86, GoranSMilovanovic, lisong, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Wikidata-bugs, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T138208: Connections to all db servers for wikidata as wikiadmin from snapshot, terbium
ArielGlenn added a comment. This is because the maintenance scripts that do "small" page ranges take several hours to complete. I will keep this in mind for when we can go to multiple bz2 streams in the page content history dumps; I'll be able to dump much smaller ranges then and concat them together. The other thing I should do is check how often we respawn fetchText; that is something I might be able to change sooner rather than later. TASK DETAIL https://phabricator.wikimedia.org/T138208 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Marostegui, Addshore, Lydia_Pintscher, daniel, hoo, ArielGlenn, jcrespo, Zppix, jannee_e, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
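The idea of dumping smaller ranges and concatenating them relies on bzip2 accepting several compressed streams back to back. A minimal sketch of that property, using Python's standard `bz2` module and made-up page content (the dump jobs themselves are not written this way):

```python
import bz2

# Stand-ins for the output of three small page-range jobs.
ranges = [b"<page>1</page>\n", b"<page>2</page>\n", b"<page>3</page>\n"]

# Each range is compressed independently, as separate jobs could do.
streams = [bz2.compress(chunk) for chunk in ranges]

# Concatenating the compressed streams yields a valid multi-stream file.
combined = b"".join(streams)

# bz2.decompress reads multi-stream input (Python 3.3 and later).
restored = bz2.decompress(combined)
```

The concatenated file decompresses to the ranges in order, so short jobs can each hold a database connection briefly and still produce one output file.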
[Wikidata-bugs] [Maniphest] T264883: Prepare deployment of JSON dumps for Lexeme
ArielGlenn added a comment. All set. We should check on these again in the middle of next week, as the run starts on Monday at ridiculous-o-clock when we are all sleeping. TASK DETAIL https://phabricator.wikimedia.org/T264883 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: hoo, ArielGlenn Cc: ArielGlenn, Lucas_Werkmeister_WMDE, Lydia_Pintscher, WMDE-leszek, Pablo-WMDE, Alter-paule, Beast1978, Un1tY, Akuckartz, Hook696, Iflorez, Kent7301, alaa_wmde, joker88john, CucyNoiD, Nandana, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, GoranSMilovanovic, Mahir256, QZanden, LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Bodhisattwa, Scott_WUaS, Jonas, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T264883: Prepare deployment of JSON dumps for Lexeme
ArielGlenn added a comment. In T264883#6786811 <https://phabricator.wikimedia.org/T264883#6786811>, @Lucas_Werkmeister_WMDE wrote: > Are you sure they ran? That directory only contains RDF dumps as far as I can tell (Turtle and NTriples), we’ve been generating those for a while (compare 20210122 <https://dumps.wikimedia.org/other/wikibase/wikidatawiki/20210122/> with 20201218 <https://dumps.wikimedia.org/other/wikibase/wikidatawiki/20201218/>). I haven’t found any lexeme JSON dumps yet. Ah crap. Yeah I see that now. I didn't get any failure emails about it, but when I looked in the log I saw this: root@snapshot1008:~# more /var/log/wikidatadump/dumpwikidatajson-wikidata-20210127-lexemes-main.log File size for shard 0 is only 26086402. Aborting. I guess those values need to be adjusted for lexemes. TASK DETAIL https://phabricator.wikimedia.org/T264883 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: hoo, ArielGlenn Cc: ArielGlenn, Lucas_Werkmeister_WMDE, Lydia_Pintscher, WMDE-leszek, Pablo-WMDE, Akuckartz, Iflorez, alaa_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, Mahir256, QZanden, LawExplorer, _jensen, rosalieper, Bodhisattwa, Scott_WUaS, Jonas, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
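The abort in that log comes from a minimum-size sanity check on each shard. A rough sketch of such a check with a per-dump-type threshold follows; the threshold numbers and the names `MIN_SHARD_BYTES` and `check_shard` are hypothetical, chosen only to show why a limit tuned for the full entity dump rejects a much smaller lexeme shard.

```python
# Hypothetical minimum shard sizes in bytes, keyed by dump type.
# These numbers are illustrative only, not the values used in production.
MIN_SHARD_BYTES = {
    "all": 1_000_000_000,
    "lexemes": 1_000_000,
}


def check_shard(size_bytes, dump_type, shard=0):
    """Return (ok, message) for one shard's output size."""
    minimum = MIN_SHARD_BYTES[dump_type]
    if size_bytes < minimum:
        return False, f"File size for shard {shard} is only {size_bytes}. Aborting."
    return True, ""


# The size from the log above, checked against both hypothetical limits.
as_full_dump = check_shard(26086402, "all")
as_lexemes = check_shard(26086402, "lexemes")
```

With a separate, lower limit for lexemes the same shard passes, which is the adjustment the comment suggests.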
[Wikidata-bugs] [Maniphest] T264883: Prepare deployment of JSON dumps for Lexeme
ArielGlenn added a comment. These ran and are available at https://dumps.wikimedia.org/other/wikibase/wikidatawiki/20210122/ How do they look? TASK DETAIL https://phabricator.wikimedia.org/T264883 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: hoo, ArielGlenn Cc: ArielGlenn, Lucas_Werkmeister_WMDE, Lydia_Pintscher, WMDE-leszek, Pablo-WMDE, Akuckartz, Iflorez, alaa_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, Mahir256, QZanden, LawExplorer, _jensen, rosalieper, Bodhisattwa, Scott_WUaS, Jonas, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T221504: investigate why content history dump of certain wikidata page ranges is so slow
ArielGlenn added a comment. Following up on this, has there been any more discussion about making the JSON a little less wordy/disk-filly? I don't see any other path forward on this in the short to medium term. TASK DETAIL https://phabricator.wikimedia.org/T221504 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Addshore, Smalyshev, Gehel, Mahir256, ArielGlenn, jannee_e, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T246415: Investigate a different db load groups for wikidata / wikibase
ArielGlenn added a project: User-ArielGlenn. TASK DETAIL https://phabricator.wikimedia.org/T246415 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Michael, ArielGlenn Cc: ArielGlenn, Michael, Marostegui, Ladsgroup, WMDE-leszek, Aklapper, Addshore, Alter-paule, Beast1978, Un1tY, Akuckartz, Hook696, Iflorez, Kent7301, alaa_wmde, joker88john, CucyNoiD, Nandana, jijiki, Klaas_Z4us_V, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, Pablo-WMDE, GoranSMilovanovic, QZanden, LawExplorer, Lewizho99, Maathavan, elukey, _jensen, rosalieper, Scott_WUaS, Jonas, Wikidata-bugs, aude, Lydia_Pintscher, Mbch331, Jay8g ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T264298: wb_terms is getting removed
ArielGlenn added a comment. All of those tables are there: see https://gerrit.wikimedia.org/r/c/operations/puppet/+/527505 and current https://github.com/wikimedia/puppet/blob/production/modules/snapshot/files/dumps/table_jobs.yaml#L142 Is there anything else needed, @Lucas_Werkmeister_WMDE ? TASK DETAIL https://phabricator.wikimedia.org/T264298 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: ArielGlenn, Lucas_Werkmeister_WMDE, Addshore, toan, jannee_e, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T264298: wb_terms is getting removed
ArielGlenn added a comment. In T264298#6511634 <https://phabricator.wikimedia.org/T264298#6511634>, @Lucas_Werkmeister_WMDE wrote: > We also realized that the `tablejobs.yaml` file didn’t mention the new tables (the replacement for `wb_terms`: `wbt_{item,property}_terms`, `wbt_{term,text}_in_lang`, `wbt_text`, `wbt_type`). If `wb_terms` was worth dumping, then presumably the new tables should be dumped too. Is it enough to add them to the YAML file or do you need some extra setup for new tables? Woops I missed this comment entirely. Ummm. Let me have a look at that and if there are changes needed, I'll push them out TODAY. Otherwise is there anything else needed for this task, now that the mw-vagrant patch is merged? TASK DETAIL https://phabricator.wikimedia.org/T264298 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: ArielGlenn, Lucas_Werkmeister_WMDE, Addshore, toan, jannee_e, Akuckartz, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T264850: Categorylinks dump might have some problem with the encoding
ArielGlenn removed projects: Wikidata, Wikidata-Query-Service, Analytics. TASK DETAIL https://phabricator.wikimedia.org/T264850 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: JAllemandou, ArielGlenn Cc: Lucas_Werkmeister_WMDE, ArielGlenn, Milimetric, Aklapper, marcmiquel, Strainu, jannee_e, Lunewa, gnosygnu, CBogen, Akuckartz, 4748kitoko, darthmon_wmde, Nandana, Namenlos314, Akovalyov, Lahi, Gq86, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, terrrydactyl, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331, jeremyb ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T264850: Categorylinks dump might have some problem with the encoding
ArielGlenn added a comment. In T264850#6531377 <https://phabricator.wikimedia.org/T264850#6531377>, @Milimetric wrote: > @ArielGlenn is this something you'd know about or know who to point me to? I think the wdqs folks are going to be your best bet, I've added the project. Looks like a simple text encoding error, but I'd like to know exactly what tools were used to display the text before saying that for sure. TASK DETAIL https://phabricator.wikimedia.org/T264850 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: JAllemandou, ArielGlenn Cc: ArielGlenn, Milimetric, Aklapper, marcmiquel, Strainu, jannee_e, CBogen, Akuckartz, 4748kitoko, darthmon_wmde, Nandana, Namenlos314, Akovalyov, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Lunewa, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, gnosygnu, JAllemandou, terrrydactyl, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331, jeremyb ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T264850: Categorylinks dump might have some problem with the encoding
ArielGlenn added a comment. echo -n ânești | od -t x1 000 c3 a2 6e 65 c8 99 74 69 You appear to be seeing a string representation of the non-ascii characters as hex bytes, i.e. xc3 xa2 ne xc8 x99 ti. What command are you using to display the text in the file, and on what platform? TASK DETAIL https://phabricator.wikimedia.org/T264850 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: JAllemandou, ArielGlenn Cc: ArielGlenn, Milimetric, Aklapper, marcmiquel, Strainu, jannee_e, CBogen, Akuckartz, 4748kitoko, darthmon_wmde, Nandana, Namenlos314, Akovalyov, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Lunewa, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, gnosygnu, JAllemandou, terrrydactyl, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331, jeremyb ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
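The same check can be reproduced in Python, which also shows one way the escaped form can arise: printing the raw bytes instead of decoding them. This is only an illustration of the encoding; it makes no claim about which tool the reporter actually used.

```python
# "ânești", written with escapes so the source file encoding does not matter.
text = "\u00e2ne\u0219ti"

encoded = text.encode("utf-8")

# Space-separated hex, comparable to the `od -t x1` output (Python 3.8+).
utf8_hex = encoded.hex(" ")

# The repr of the undecoded bytes shows non-ASCII bytes as \x escapes.
escaped_view = repr(encoded)

# Decoding as UTF-8 recovers the original text.
round_trip = encoded.decode("utf-8")
```

Here `utf8_hex` is `c3 a2 6e 65 c8 99 74 69` and `escaped_view` is `b'\xc3\xa2ne\xc8\x99ti'`, so text that looks like this has probably been shown as bytes rather than decoded.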
[Wikidata-bugs] [Maniphest] T264850: Categorylinks dump might have some problem with the encoding
ArielGlenn added projects: Wikidata-Query-Service, Dumps-Generation. Restricted Application added a project: Wikidata. TASK DETAIL https://phabricator.wikimedia.org/T264850 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: JAllemandou, ArielGlenn Cc: ArielGlenn, Milimetric, Aklapper, marcmiquel, Strainu, jannee_e, CBogen, Akuckartz, 4748kitoko, darthmon_wmde, Nandana, Namenlos314, Akovalyov, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Lunewa, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, gnosygnu, JAllemandou, terrrydactyl, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331, jeremyb ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T264164: Cleanup broken dumps in /wikidatawiki/entities/20200921/
ArielGlenn added a comment. They are indeed gone from dumpsdata1002; we keep fewer back issues there, since we're not serving them anywhere but only rsyncing them off. We keep the last 3 wikibase dumps, see https://github.com/wikimedia/puppet/blob/production/modules/dumps/manifests/web/cleanups/miscdumps.pp#L14 ( or on the host itself, /etc/dumps/confs/cleanup_misc.conf and the "wikibase/wikidatawiki" entry). Now that the runs have specific dates we might want to increase that to 6 so we have the last two weeks' worth. TASK DETAIL https://phabricator.wikimedia.org/T264164 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: ArielGlenn, Gehel, dcausse, jannee_e, CBogen, Akuckartz, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, EBjune, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
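The retention rule (keep the newest N dated runs, remove the rest) can be sketched as below. This is not the cleanup script that reads cleanup_misc.conf; `dirs_to_remove` is a hypothetical helper, and it assumes run directories are named YYYYMMDD.

```python
def dirs_to_remove(dated_dirs, keep):
    """Return the dated directory names that fall outside the newest `keep`.

    Names in YYYYMMDD form sort chronologically as plain strings.
    """
    newest_first = sorted(dated_dirs, reverse=True)
    return sorted(newest_first[keep:])


runs = ["20200914", "20200921", "20200928", "20201005"]
with_keep_3 = dirs_to_remove(runs, keep=3)
with_keep_6 = dirs_to_remove(runs, keep=6)
```

With `keep=3` only the oldest of the four runs is removed; raising it to 6 removes nothing until more than six runs exist.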
[Wikidata-bugs] [Maniphest] T264298: wb_terms is getting removed
ArielGlenn added a comment. No impact. Only tables actually in the database are dumped; a check of each table in the list is done beforehand. The code can be cleaned up anyway, just to be nice. TASK DETAIL https://phabricator.wikimedia.org/T264298 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: ArielGlenn, Lucas_Werkmeister_WMDE, Addshore, toan, jannee_e, Akuckartz, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
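The pre-check described here amounts to intersecting the configured table list with what the database reports. A minimal sketch, assuming the list of existing tables has already been fetched; `tables_to_dump` is a hypothetical name, not a function from the dumps code.

```python
def tables_to_dump(configured, existing):
    """Keep configured tables that exist in the database, in configured order."""
    present = set(existing)
    return [table for table in configured if table in present]


# wb_terms is still listed in the configuration but has been dropped.
configured = ["wb_terms", "wbt_item_terms", "wbt_text", "wbt_type"]
existing = ["wbt_type", "wbt_text", "wbt_item_terms", "page"]
selected = tables_to_dump(configured, existing)
```

A stale entry such as `wb_terms` is simply skipped, which is why leaving it in the list has no impact.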
[Wikidata-bugs] [Maniphest] T264164: Cleanup broken dumps in /wikidatawiki/entities/20200921/
ArielGlenn added subscribers: Gehel, ArielGlenn. ArielGlenn added a comment. @Gehel was just asking about these yesterday and whether he should clean them up. The procedure is: delete first from the appropriate dumpsdata host (dumpsdata1002) where they are first written. Then delete them from the labstore1006 and 1007 hosts to which they would be rsynced. On dumpsdata1002 the path is /data/otherdumps to the tree containing all of the various datasets unrelated to xml/sql dumps. On the labstore hosts, it is /srv/dumps/xmldatadumps/public/other TASK DETAIL https://phabricator.wikimedia.org/T264164 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: ArielGlenn, Gehel, dcausse, jannee_e, Akuckartz, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T220883: Wikidata JSON dumps should include Lexemes
ArielGlenn added a comment. I renew my question above in T220883#5185999 <https://phabricator.wikimedia.org/T220883#5185999> and if someone can answer this, I can work with them to make these go live. TASK DETAIL https://phabricator.wikimedia.org/T220883 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: hoo, ArielGlenn Cc: DVrandecic, Addshore, ArielGlenn, VIGNERON, Aklapper, hoo, Lydia_Pintscher, Envlh, Akuckartz, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, Mahir256, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Wikidata-bugs, aude, Svick, Mbch331, jeremyb ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T260232: BatchRowIterator slow query on commonswiki
ArielGlenn closed this task as "Resolved". ArielGlenn claimed this task. ArielGlenn added a comment. Re-enabled, checked daily runs, they look good, so I'm resolving this. Thanks, everybody! TASK DETAIL https://phabricator.wikimedia.org/T260232 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: ArielGlenn, CBogen, Cparle, Umherirrender, DannyS712, Naike, WDoranWMF, Krinkle, aaron, Reedy, Ladsgroup, Aklapper, Marostegui, XeroS_SkalibuR, Alter-paule, jannee_e, Beast1978, Un1tY, Akuckartz, eprodromou, Hook696, Adidsone1, darthmon_wmde, Kent7301, joker88john, CucyNoiD, Nandana, Namenlos314, Gaboe420, Phukettaxigroup, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Ramsey-WMF, Darkminds3113, Bsandipan, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Jayprakash12345, Lunewa, QZanden, EBjune, merbst, LawExplorer, Vali.matei, Lewizho99, Maathavan, _jensen, rosalieper, Agabi10, Scott_WUaS, Pchelolo, Jonas, Xmlizer, Volker_E, gnosygnu, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, GWicke, Dcljr, Dinoguy1000, Manybubbles, Mbch331, Rxy, Jay8g, Ltrlg ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T226093: Capacity planning for Commons Structured Data
ArielGlenn added a comment. Updated (ouch!) F32352585: commons_slots.png <https://phabricator.wikimedia.org/F32352585> TASK DETAIL https://phabricator.wikimedia.org/T226093 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Ladsgroup, Abit, matthiasmullie, Marostegui, Mholloway, Addshore, Ramsey-WMF, jcrespo, Yann, MarkTraceur, ArielGlenn, Aklapper, lmata, jannee_e, CBogen, Akuckartz, darthmon_wmde, Legado_Shulgin, Nandana, JKSTNK, Davinaclare77, Qtn1293, Techguru.pc, Lahi, PDrouin-WMF, Gq86, E1presidente, Cparle, Anooprao, SandraF_WMF, GoranSMilovanovic, Lunewa, Th3d3v1ls, Hfbn0, QZanden, Tramullas, Acer, LawExplorer, Salgo60, Zppix, Silverfish, _jensen, rosalieper, Scott_WUaS, Susannaanas, Wong128hk, gnosygnu, Jane023, Wikidata-bugs, Base, aude, Ricordisamoa, Wesalius, Lydia_Pintscher, Fabrice_Florin, Raymond, faidon, Steinsplitter, Mbch331, Rxy, Jay8g, fgiunchedi ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T260232: BatchRowIterator slow query on commonswiki
ArielGlenn added a comment. In T260232#6448382 <https://phabricator.wikimedia.org/T260232#6448382>, @gerritbot wrote: > Change 625642 **merged** by jenkins-bot: > [mediawiki/core@master] don't pass null page id to page related queries for category change rdf dumps > > https://gerrit.wikimedia.org/r/625642 When this is deployed on the wikis I'll be able to re-enable category dumps, both dailies and weeklies, which should mean at the end of the week. TASK DETAIL https://phabricator.wikimedia.org/T260232 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: ArielGlenn, CBogen, Cparle, Umherirrender, DannyS712, Naike, WDoranWMF, Krinkle, aaron, Reedy, Ladsgroup, Aklapper, Marostegui, XeroS_SkalibuR, Alter-paule, jannee_e, Beast1978, Un1tY, Akuckartz, eprodromou, Hook696, Adidsone1, darthmon_wmde, Kent7301, joker88john, CucyNoiD, Nandana, Namenlos314, Gaboe420, Phukettaxigroup, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Ramsey-WMF, Darkminds3113, Bsandipan, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Jayprakash12345, Lunewa, QZanden, EBjune, merbst, LawExplorer, Vali.matei, Lewizho99, Maathavan, _jensen, rosalieper, Agabi10, Scott_WUaS, Pchelolo, Jonas, Xmlizer, Volker_E, gnosygnu, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, GWicke, Dcljr, Dinoguy1000, Manybubbles, Mbch331, Rxy, Jay8g, Ltrlg ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T260232: BatchRowIterator slow query on commonswiki
ArielGlenn added a comment. In T260232#6390706 <https://phabricator.wikimedia.org/T260232#6390706>, @gerritbot wrote: > Change 620775 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn): > [mediawiki/core@master] don't include null page ids in query list for category dumps > > https://gerrit.wikimedia.org/r/620775 I have tested the above patch by doing a manual run of the cron script on snapshot1008 as the dumpsgen user: dumpsgen@snapshot1008:~$ /usr/local/bin/dumpcategoriesrdf.sh --config /etc/dumps/confs/wikidump.conf.other --list /srv/mediawiki/dblists/categories-rdf.dblist It completed in a little under 4 hours for all wikis. What is needed to get the patch merged? TASK DETAIL https://phabricator.wikimedia.org/T260232 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: ArielGlenn, CBogen, Cparle, Umherirrender, DannyS712, Naike, WDoranWMF, Krinkle, aaron, Reedy, Ladsgroup, Aklapper, Marostegui, XeroS_SkalibuR, Alter-paule, jannee_e, Beast1978, Un1tY, Akuckartz, eprodromou, Hook696, Adidsone1, darthmon_wmde, Kent7301, joker88john, CucyNoiD, Nandana, Namenlos314, Gaboe420, Phukettaxigroup, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Ramsey-WMF, Darkminds3113, Bsandipan, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Jayprakash12345, Lunewa, QZanden, EBjune, merbst, LawExplorer, Vali.matei, Lewizho99, Maathavan, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, Volker_E, gnosygnu, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, GWicke, Dcljr, Dinoguy1000, Manybubbles, Mbch331, Rxy, Jay8g, Ltrlg ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T262187: Wikidata entity dumps didn't start this week
ArielGlenn created this task. ArielGlenn added projects: Wikidata, Dumps-Generation. TASK DESCRIPTION This change: P12492 <https://phabricator.wikimedia.org/P12492> left the dump db group empty, and so any attempts to run wikidata entity dumps failed. The host was added back in early on September 7. The dumps for this week should be restarted; you'll want to coordinate this with the deployment of https://gerrit.wikimedia.org/r/c/operations/puppet/+/622342 which should be deployed when no jobs are running. Wikidata entity dumps use the flag --dbgroupdefault; it would be a good idea for that flag to permit fallback to use of any host in the special case that the requested group is empty. TASK DETAIL https://phabricator.wikimedia.org/T262187 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: RKemper, ArielGlenn, jannee_e, Akuckartz, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
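The fallback suggested in the task description above can be sketched roughly as follows. This is only an illustration in Python, not MediaWiki's load balancer code: the function name `pick_db_hosts`, the `groups` mapping, and the host names are all hypothetical, and the real `--dbgroupdefault` handling may differ.

```python
def pick_db_hosts(groups, requested_group, all_hosts):
    """Return the hosts to use for a query group.

    groups: dict mapping group name -> list of host names (hypothetical shape).
    requested_group: the group asked for, e.g. via a flag like --dbgroupdefault.
    all_hosts: every replica available for the section.
    """
    hosts = groups.get(requested_group, [])
    if hosts:
        # Normal case: the requested group has at least one member.
        return list(hosts)
    # Special case from the task description: the group exists but is empty
    # (or is missing entirely), so fall back to any available host.
    return list(all_hosts)


# Hypothetical host names, for illustration only.
replicas = ["db-a", "db-b", "db-c"]
print(pick_db_hosts({"dump": ["db-c"]}, "dump", replicas))
print(pick_db_hosts({"dump": []}, "dump", replicas))
```

With a fallback of this kind, emptying the dump group would slow the entity dumps down rather than make every run fail outright.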
[Wikidata-bugs] [Maniphest] T261204: Wikidata lexeme ttl dumps should be in a "predictable" folder
ArielGlenn added a comment. I think we can just move this through and keep our eyes on it. TASK DETAIL https://phabricator.wikimedia.org/T261204 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: ArielGlenn, dcausse, Alter-paule, jannee_e, Beast1978, CBogen, Un1tY, Akuckartz, Hook696, darthmon_wmde, Kent7301, joker88john, CucyNoiD, Nandana, Namenlos314, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Lunewa, QZanden, EBjune, merbst, LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, gnosygnu, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T260232: BatchRowIterator slow query on commonswiki
ArielGlenn added a comment. I took the brute force approach of writing all queries to a log file by adding the appropriate fopen/fputs/fclose in Database::select (live on snapshot1010, testbed host). I then ran: dumpsgen@snapshot1010:/srv/mediawiki$ /usr/bin/php7.2 /srv/mediawiki/multiversion/MWScript.php maintenance/categoryChangesAsRdf.php --wiki=commonswiki -s 2020081521 -e 20200817050001 | gzip > /srv/tmp/categories-out.gz I examined the output and found numerous examples of queries with the ' ' string in them (without the space). The following two queries were back-to-back, indicating that one was used to generate input for the next: SELECT page_id,cat_title AS `rc_title`,pp_propname,cat_pages,cat_subcats,cat_files FROM `category` LEFT JOIN `page` ON ((page_title = cat_title) AND page_namespace = 14) LEFT JOIN `page_props` ON (pp_propname = 'hiddencat' AND (pp_page = page_id)) WHERE cat_title IN ('Bridges_over_Kunar_River_(Pakistan)','People_of_the_University_of_Wyoming','University_of_Wyoming','Bus_routes_numbered_144','Churches_in_the_Roman_Catholic_Archdiocese_of_Benevento','August_2020_in_Cardiff','Cardiff_Coach_Station,_Sophia_Gardens','Bus_stations_in_Cardiff','Sophia_Gardens','Logos_of_companies_based_in_Mecklenburg-Vorpommern','Rameswaram','Media_needing_categories_as_of_18_March_2018','All_media_needing_categories_as_of_2018','Pages_with_local_object_coordinates_and_missing_SDC_coordinates','CC-BY-SA-4.0','Self-published_work','Photographs_by_LigaDue','Civitella_Marittima','Pages_with_maps','Scans_from_the_Internet_Archive','CC-PD-Mark','PD_US_Government','FEDLINK_-_United_States_Federal_Collection','Books_uploaded_by_Fæ','Files_with_no_machine-readable_author','Former_bus_lines_in_Budapest','Bus_lines_in_Budapest','Plzeň_1','Plzeň','Plzeň-City_District','Kaufland_Plzeň-Roudná','Epta_Piges_(Rhodes)','PD_US_expired','Books_in_the_Library_of_Congress','Trains_at_Inuyama_Yuen_Station','Inuyama_Yuen_Station','People_in_1910','2_men','OCR_detected_cover_page
','1910_photographs','Iwakura_Station_(Aichi)','Unidentified_subjects_in_Japan','名古屋鉄道の画像','駅名板画像','Alumni_of_the_University_of_Wyoming','Lety_memorial','Cultural_buildings_in_Burgos','Iwateken_Kotsu','岩手県交通の画像','Piet_Retief,_Mpumalanga','Quality_images_missing_SDC_source_of_file','Quality_images_missing_SDC_copyright_status','Quality_images_missing_SDC_copyright_license','Quality_images_missing_SDC_inception','Media_requiring_renaming','Media_requiring_renaming_-_rationale_6','Bus_routes_numbered_148','Stained-glass_windows_in_Burgenland','Stained-glass_windows_in_Austria_by_district','Rust_(Burgenland)','PD_NASA','Tropical_Storm_Josephine_(2020)','Quality_images_missing_SDC_Commons_quality_assessment','PD-old-100-expired','Medical_Heritage_Library','Nominated_valued_image_candidates','Iwate_Kyūkō_Bus','バス画像','Bus_routes_numbered_149','Quality_images_missing_SDC_creator','Bus_routes_numbered_150','1926-03-27','Breda,_Netherlands','一関市の画像','Bus_routes_numbered_147','Hernán_Cortés','Augusto_Belvedere','Ichinoseki_Station','1926_photographs','Items_with_OTRS_permission_confirmed','Files_with_PermissionOTRS_template_but_without_P6305_SDC_statement','Stolpersteine_in_Oslo-Gamle','Images_uploaded_by_Donna_Gedenk','Pages_with_local_camera_coordinates_and_missing_SDC_coordinates','1926_photographs_of_the_United_States','Schools_in_Quebec_City','Railway_photographs_by_Geof_Sheppard','Photographs_by_Geof_Sheppard') SELECT cl_from,cl_to FROM `categorylinks`WHERE cl_type = 'subcat' AND cl_from IN (16427435,77160905,29237265,5273988,93171207,8292833,49598671,48452708,73514884,73514913,93141746,73514933,73514942,5229557,65375295,89119256,49325694,2371050,11740061,71765819,2581799,12178689,16468547,3355416,92207293,56860321,45788180,4127763,47563334,102952,4314089,25108543,93119689,5062995,2255349,6788554,'',62189827,93056961) ORDER BY cl_from ASC,cl_to ASC LIMIT 200 And lo and behold, when I run the first query, what do I get: +--+
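The way an empty entry can end up in the second query's id list can be reproduced in miniature. The sketch below is an assumption-laden illustration in Python with an in-memory SQLite database, not the MediaWiki code: the table contents, titles, and ids are made up, and the helper name `subcat_parent_ids` is hypothetical. It shows a LEFT JOIN returning a NULL `page_id` for a category with no matching page row, and drops such rows before the ids are reused, which is the idea named in the patch titles quoted elsewhere in this thread.

```python
import sqlite3


def subcat_parent_ids(conn, titles):
    """Look up page ids for category titles, skipping categories without a page."""
    placeholders = ",".join("?" for _ in titles)
    rows = conn.execute(
        "SELECT page_id, cat_title FROM category "
        "LEFT JOIN page ON page_title = cat_title AND page_namespace = 14 "
        f"WHERE cat_title IN ({placeholders})",
        titles,
    ).fetchall()
    # A category row with no page row comes back with page_id NULL (None here).
    # Passing that on would put a non-id entry into the next query's IN list.
    return [page_id for page_id, _title in rows if page_id is not None]


def demo():
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE category (cat_title TEXT);
        CREATE TABLE page (page_id INTEGER, page_namespace INTEGER, page_title TEXT);
        -- Made-up data: one category has a page, the other does not.
        INSERT INTO category VALUES ('Cat_with_page'), ('Cat_without_page');
        INSERT INTO page VALUES (1001, 14, 'Cat_with_page');
        """
    )
    return subcat_parent_ids(conn, ["Cat_with_page", "Cat_without_page"])


print(demo())
```

Only the id of the category that has a page survives the filter, so the follow-up query on `categorylinks` would receive a list of integers only.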
[Wikidata-bugs] [Maniphest] T260232: BatchRowIterator slow query on commonswiki
ArielGlenn added a comment. Just for completeness, on db2073 I also ran the original query with the crap entry; the show explain showed use of a filesort as above, and the execution time was... well it's still going, 330 seconds in. I killed it. TASK DETAIL https://phabricator.wikimedia.org/T260232 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: ArielGlenn, CBogen, Cparle, Umherirrender, DannyS712, Naike, WDoranWMF, Krinkle, aaron, Reedy, Ladsgroup, Aklapper, Marostegui, XeroS_SkalibuR, jannee_e, Akuckartz, Adidsone1, darthmon_wmde, holger.knust, EvanProdromou, Nandana, Namenlos314, Phukettaxigroup, Lahi, Gq86, Darkminds3113, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Jayprakash12345, Lunewa, QZanden, EBjune, merbst, LawExplorer, Vali.matei, _jensen, rosalieper, Agabi10, Scott_WUaS, Pchelolo, Jonas, Xmlizer, Volker_E, gnosygnu, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, GWicke, Dcljr, Dinoguy1000, Manybubbles, Mbch331, Rxy, Jay8g ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T260232: BatchRowIterator slow query on commonswiki
ArielGlenn added a comment. I saw multiple queries with this string in them while camping on the production vslow and looking at the processlist. I don't know how many of the queries have this issue. TASK DETAIL https://phabricator.wikimedia.org/T260232 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: ArielGlenn, CBogen, Cparle, Umherirrender, DannyS712, Naike, WDoranWMF, Krinkle, aaron, Reedy, Ladsgroup, Aklapper, Marostegui, XeroS_SkalibuR, jannee_e, Akuckartz, Adidsone1, darthmon_wmde, holger.knust, EvanProdromou, Nandana, Namenlos314, Phukettaxigroup, Lahi, Gq86, Darkminds3113, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Jayprakash12345, Lunewa, QZanden, EBjune, merbst, LawExplorer, Vali.matei, _jensen, rosalieper, Agabi10, Scott_WUaS, Pchelolo, Jonas, Xmlizer, Volker_E, gnosygnu, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, GWicke, Dcljr, Dinoguy1000, Manybubbles, Mbch331, Rxy, Jay8g ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T260232: BatchRowIterator slow query on commonswiki
ArielGlenn added a comment. When I ran the above query on db2073 (codfw dumps and vslow host) without the crap ' ' field in there, it returned in 0.00 seconds. Maybe the bad entries are a new development? TASK DETAIL https://phabricator.wikimedia.org/T260232 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: ArielGlenn, CBogen, Cparle, Umherirrender, DannyS712, Naike, WDoranWMF, Krinkle, aaron, Reedy, Ladsgroup, Aklapper, Marostegui, XeroS_SkalibuR, jannee_e, Akuckartz, Adidsone1, darthmon_wmde, holger.knust, EvanProdromou, Nandana, Namenlos314, Phukettaxigroup, Lahi, Gq86, Darkminds3113, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Jayprakash12345, Lunewa, QZanden, EBjune, merbst, LawExplorer, Vali.matei, _jensen, rosalieper, Agabi10, Scott_WUaS, Pchelolo, Jonas, Xmlizer, Volker_E, gnosygnu, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, GWicke, Dcljr, Dinoguy1000, Manybubbles, Mbch331, Rxy, Jay8g ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T260232: BatchRowIterator slow query on commonswiki
ArielGlenn added a comment. SELECT /* BatchRowIterator::next */ cl_from,cl_to FROM `categorylinks` WHERE cl_type = 'subcat' AND cl_from IN (92967652,234494,24559020,960551,3007520,76398273,6972234,363488,2257260,4157420,89319925,84920900,41797907,61421859,92055128,9221880,14562,26762776,33298380,65449552,3795363,66235719,42442426,89319828,27708617,2563533,66701920,22548996,108484,25232065,6846286,43665564,2257433,8811984,84203487,3837544,5324927,8645978,'',805218,1078394,81978764,391851) ORDER BY cl_from ASC,cl_to ASC LIMIT 200 What is that empty thing in there? I see ' ' in the list of ids. TASK DETAIL https://phabricator.wikimedia.org/T260232 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: ArielGlenn, CBogen, Cparle, Umherirrender, DannyS712, Naike, WDoranWMF, Krinkle, aaron, Reedy, Ladsgroup, Aklapper, Marostegui, XeroS_SkalibuR, jannee_e, Akuckartz, Adidsone1, darthmon_wmde, holger.knust, EvanProdromou, Nandana, Namenlos314, Phukettaxigroup, Lahi, Gq86, Darkminds3113, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Jayprakash12345, Lunewa, QZanden, EBjune, merbst, LawExplorer, Vali.matei, _jensen, rosalieper, Agabi10, Scott_WUaS, Pchelolo, Jonas, Xmlizer, Volker_E, gnosygnu, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, GWicke, Dcljr, Dinoguy1000, Manybubbles, Mbch331, Rxy, Jay8g ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T260232: BatchRowIterator slow query on commonswiki
ArielGlenn added a comment. Daily rdf dumps are probably broken until this is resolved, just a fyi for folks importing these for search purposes. TASK DETAIL https://phabricator.wikimedia.org/T260232 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: ArielGlenn, CBogen, Cparle, Umherirrender, DannyS712, Naike, WDoranWMF, Krinkle, aaron, Reedy, Ladsgroup, Aklapper, Marostegui, XeroS_SkalibuR, jannee_e, Akuckartz, Adidsone1, darthmon_wmde, holger.knust, EvanProdromou, Nandana, Namenlos314, Phukettaxigroup, Lahi, Gq86, Darkminds3113, Lucas_Werkmeister_WMDE, GoranSMilovanovic, Jayprakash12345, Lunewa, QZanden, EBjune, merbst, LawExplorer, Vali.matei, _jensen, rosalieper, Agabi10, Scott_WUaS, Pchelolo, Jonas, Xmlizer, Volker_E, gnosygnu, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, GWicke, Dcljr, Dinoguy1000, Manybubbles, Mbch331, Rxy, Jay8g ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] T257876: redirected Q & deleted P Not Consistant in the json dump and web front end
ArielGlenn added a project: Wikidata. TASK DETAIL https://phabricator.wikimedia.org/T257876 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Alicezou26, jannee_e, Akuckartz, darthmon_wmde, Nandana, Jony, Lahi, Gq86, NoohNaeem, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, Guy13949413, _jensen, rosalieper, Scott_WUaS, gnosygnu, mys_721tx, Wikidata-bugs, aude, Svick, Mbch331, Krenair, jeremyb ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons
ArielGlenn added a comment. Links latest-full.ttl.bz2 -> 20200116/commons-20200116-full.ttl.bz2 and latest-full.ttl.gz -> 20200116/commons-20200116-full.ttl.gz have been cleaned up. Thanks for the suggestion! TASK DETAIL https://phabricator.wikimedia.org/T221917 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: DD063520, D063520, CBogen, nettrom_WMF, Mahir256, dcausse, EBernhardson, Cparle, Abit, Gehel, jleedev, hoo, ArielGlenn, WMDE-leszek, Poyekhali, Steinsplitter, Aklapper, Lydia_Pintscher, Bugreporter, Tgr, Ramsey-WMF, Jarekt, Addshore, Tpt, Salgo60, Lucas_Werkmeister_WMDE, Smalyshev, Alter-paule, jannee_e, Beast1978, Un1tY, Akuckartz, Hook696, darthmon_wmde, Kent7301, joker88john, CucyNoiD, Nandana, Namenlos314, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, GoranSMilovanovic, Lunewa, QZanden, EBjune, merbst, LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Taiwania_Justo, Scott_WUaS, Jonas, Xmlizer, Ixocactus, Wong128hk, gnosygnu, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, El_Grafo, Dinoguy1000, Manybubbles, Mbch331, Keegan ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons
ArielGlenn added a comment. It's linked off the 'other datasets' page near the top. But here's the direct link: https://dumps.wikimedia.org/other/wikibase/commonswiki/ TASK DETAIL https://phabricator.wikimedia.org/T221917 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: DD063520, D063520, CBogen, nettrom_WMF, Mahir256, dcausse, EBernhardson, Cparle, Abit, Gehel, jleedev, hoo, ArielGlenn, WMDE-leszek, Poyekhali, Steinsplitter, Aklapper, Lydia_Pintscher, Bugreporter, Tgr, Ramsey-WMF, Jarekt, Addshore, Tpt, Salgo60, Lucas_Werkmeister_WMDE, Smalyshev, Alter-paule, jannee_e, Beast1978, Un1tY, Akuckartz, Hook696, darthmon_wmde, Kent7301, joker88john, CucyNoiD, Nandana, Namenlos314, Gaboe420, Giuliamocci, Cpaulf30, Lahi, Gq86, Af420, Bsandipan, GoranSMilovanovic, Lunewa, QZanden, EBjune, merbst, LawExplorer, Lewizho99, Maathavan, _jensen, rosalieper, Taiwania_Justo, Scott_WUaS, Jonas, Xmlizer, Ixocactus, Wong128hk, gnosygnu, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, El_Grafo, Dinoguy1000, Manybubbles, Mbch331, Keegan ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T226093: Capacity planning for Commons Structured Data
ArielGlenn added a comment. Updated. F31919691: commons_slots_new.png <https://phabricator.wikimedia.org/F31919691> TASK DETAIL https://phabricator.wikimedia.org/T226093 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Ladsgroup, Abit, matthiasmullie, Marostegui, Mholloway, Addshore, Ramsey-WMF, jcrespo, Yann, MarkTraceur, ArielGlenn, Aklapper, lmata, jannee_e, CBogen, Akuckartz, darthmon_wmde, Legado_Shulgin, Nandana, JKSTNK, Davinaclare77, Qtn1293, Techguru.pc, Lahi, PDrouin-WMF, Gq86, E1presidente, Cparle, Anooprao, SandraF_WMF, GoranSMilovanovic, Lunewa, Th3d3v1ls, Hfbn0, QZanden, Tramullas, Acer, LawExplorer, Salgo60, Zppix, Silverfish, _jensen, rosalieper, Scott_WUaS, Susannaanas, Wong128hk, gnosygnu, Jane023, Wikidata-bugs, Base, aude, Ricordisamoa, Wesalius, Lydia_Pintscher, Fabrice_Florin, Raymond, faidon, Steinsplitter, Mbch331, Rxy, Jay8g, fgiunchedi ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons
ArielGlenn added a comment. @dcausse what's your time frame? TASK DETAIL https://phabricator.wikimedia.org/T221917 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: nettrom_WMF, Mahir256, dcausse, EBernhardson, Cparle, Abit, Gehel, jleedev, hoo, ArielGlenn, WMDE-leszek, Poyekhali, Steinsplitter, Aklapper, Lydia_Pintscher, Bugreporter, Tgr, Ramsey-WMF, Jarekt, Addshore, Tpt, Salgo60, Lucas_Werkmeister_WMDE, Smalyshev, jannee_e, CBogen, darthmon_wmde, Nandana, Namenlos314, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Taiwania_Justo, Scott_WUaS, Jonas, Xmlizer, Ixocactus, Wong128hk, gnosygnu, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, El_Grafo, Dinoguy1000, Manybubbles, Mbch331, Keegan ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T238199: SpecialFewestRevisions::reallyDoQuery takes more than 9h to run
ArielGlenn added a comment. Unless folks want to keep it open to work on speeding it up in the future? TASK DETAIL https://phabricator.wikimedia.org/T238199 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: SilentSpike, WMDE-leszek, ArielGlenn, Lea_Lacroix_WMDE, jcrespo, Addshore, Lydia_Pintscher, Aklapper, Ladsgroup, Marostegui, darthmon_wmde, Nandana, jijiki, Amorymeltzer, Imarlier, Lahi, Gq86, Lsherwinforone, GoranSMilovanovic, Jayprakash12345, QZanden, LawExplorer, Sethakill, elukey, _jensen, rosalieper, Scott_WUaS, Wong128hk, Wikidata-bugs, aude, Bawolff, He7d3r, Jdforrester-WMF, Mbch331, Rxy, Jay8g, akosiaris ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons
ArielGlenn added a comment. I see that we're no longer blocked. Does this mean that we're good to go for weekly runs? TASK DETAIL https://phabricator.wikimedia.org/T221917 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: nettrom_WMF, Mahir256, dcausse, EBernhardson, Cparle, Abit, Gehel, jleedev, hoo, ArielGlenn, WMDE-leszek, Poyekhali, Steinsplitter, Aklapper, Lydia_Pintscher, Bugreporter, Tgr, Ramsey-WMF, Jarekt, Addshore, Tpt, Salgo60, Lucas_Werkmeister_WMDE, Smalyshev, jannee_e, CBogen, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Taiwania_Justo, Scott_WUaS, Jonas, Xmlizer, Ixocactus, Wong128hk, gnosygnu, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, El_Grafo, Dinoguy1000, Manybubbles, Mbch331, Keegan ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T238199: SpecialFewestRevisions::reallyDoQuery takes more than 9h to run
ArielGlenn added a comment. In T238199#6135018 <https://phabricator.wikimedia.org/T238199#6135018>, @Ladsgroup wrote: > ... > Anyway, Lydia said it's fine to do it tomorrow when it gets announced by our communication manager. Does that work for you? Anything's fine as long as there's a plan of some sort before next month, so, sure! TASK DETAIL https://phabricator.wikimedia.org/T238199 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: WMDE-leszek, ArielGlenn, Lea_Lacroix_WMDE, jcrespo, Addshore, Lydia_Pintscher, Aklapper, Ladsgroup, Marostegui, Blissjay007, Oblanco79, Alter-paule, Beast1978, Un1tY, Hook696, Daryl-TTMG, RomaAmorRoma, E.S.A-Sheild, darthmon_wmde, Kent7301, Meekrab2012, joker88john, CucyNoiD, Nandana, NebulousIris, jijiki, Gaboe420, Versusxo, Majesticalreaper22, Amorymeltzer, Giuliamocci, Adrian1985, Cpaulf30, Imarlier, Lahi, Gq86, Af420, Darkminds3113, Bsandipan, Lordiis, Lsherwinforone, GoranSMilovanovic, Adik2382, Jayprakash12345, Th3d3v1ls, Ramalepe, Liugev6, QZanden, LawExplorer, Sethakill, WSH1906, Lewizho99, Maathavan, elukey, _jensen, rosalieper, Scott_WUaS, Wong128hk, Wikidata-bugs, aude, Bawolff, He7d3r, Jdforrester-WMF, Mbch331, Rxy, Jay8g, akosiaris ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T238199: SpecialFewestRevisions::reallyDoQuery takes more than 9h to run
ArielGlenn added a comment. Can we do this temporarily while the query is being fixed up? It looks like it had to be killed in Nov, Feb, Apr, May, so I'd rather temp disable than require folks to shoot it (and anything else hung as a side effect). TASK DETAIL https://phabricator.wikimedia.org/T238199 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: ArielGlenn, Lea_Lacroix_WMDE, jcrespo, Addshore, Lydia_Pintscher, Aklapper, Ladsgroup, Marostegui, Blissjay007, Oblanco79, Alter-paule, Beast1978, Un1tY, Hook696, Daryl-TTMG, RomaAmorRoma, E.S.A-Sheild, darthmon_wmde, Kent7301, Meekrab2012, joker88john, CucyNoiD, Nandana, NebulousIris, jijiki, Gaboe420, Versusxo, Majesticalreaper22, Amorymeltzer, Giuliamocci, Adrian1985, Cpaulf30, Imarlier, Lahi, Gq86, Af420, Darkminds3113, Bsandipan, Lordiis, Lsherwinforone, GoranSMilovanovic, Adik2382, Jayprakash12345, Th3d3v1ls, Ramalepe, Liugev6, QZanden, LawExplorer, Sethakill, WSH1906, Lewizho99, Maathavan, elukey, _jensen, rosalieper, Scott_WUaS, Wong128hk, Wikidata-bugs, aude, Bawolff, He7d3r, Jdforrester-WMF, Mbch331, Rxy, Jay8g, akosiaris ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T238199: SpecialFewestRevisions::reallyDoQuery takes more than 9h to run
ArielGlenn added a comment. Can we just skip the updateSpecialPages.php wikidatawiki --override --only=Fewestrevisions script altogether, instead of shooting it every month? TASK DETAIL https://phabricator.wikimedia.org/T238199 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: ArielGlenn, Lea_Lacroix_WMDE, jcrespo, Addshore, Lydia_Pintscher, Aklapper, Ladsgroup, Marostegui, darthmon_wmde, Nandana, jijiki, Amorymeltzer, Imarlier, Lahi, Gq86, Lsherwinforone, GoranSMilovanovic, Jayprakash12345, QZanden, LawExplorer, Sethakill, elukey, _jensen, rosalieper, Scott_WUaS, Wong128hk, Wikidata-bugs, aude, Bawolff, He7d3r, Jdforrester-WMF, Mbch331, Rxy, Jay8g, akosiaris ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T252632: Restart wikidata entity dumps
ArielGlenn added a comment. As I understand it the long running query comes from a monthly cron job. TASK DETAIL https://phabricator.wikimedia.org/T252632 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: hoo, ArielGlenn, jannee_e, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Created] T252632: Restart wikidata entity dumps
ArielGlenn created this task. ArielGlenn added projects: Dumps-Generation, Wikidata. TASK DESCRIPTION The weekly run was shot this morning when vslow db connections stalled due to an unrelated long-running query, see T238199 <https://phabricator.wikimedia.org/T238199> It can be restarted from wherever it died. Note that we could face this same issue again in the future, as the underlying problem with that slow query is not resolved. TASK DETAIL https://phabricator.wikimedia.org/T252632 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: hoo, ArielGlenn, jannee_e, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons
ArielGlenn added a comment. Hi, just checking in: any progress on investigating the 'extra' dumps content? TASK DETAIL https://phabricator.wikimedia.org/T221917 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: nettrom_WMF, Mahir256, dcausse, EBernhardson, Cparle, Abit, Gehel, jleedev, hoo, ArielGlenn, WMDE-leszek, Poyekhali, Steinsplitter, Aklapper, Lydia_Pintscher, Bugreporter, Tgr, Ramsey-WMF, Jarekt, Addshore, Tpt, Salgo60, Lucas_Werkmeister_WMDE, Smalyshev, jannee_e, CBogen, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Taiwania_Justo, Scott_WUaS, Jonas, Xmlizer, Ixocactus, Wong128hk, gnosygnu, jkroll, Wikidata-bugs, Jdouglas, aude, Tobias1984, El_Grafo, Dinoguy1000, Manybubbles, Mbch331, Keegan ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Updated] T248857: Wikdata entities dump not generated
ArielGlenn added subscribers: hoo, ArielGlenn. ArielGlenn added a comment. See T248612 <https://phabricator.wikimedia.org/T248612> for that, I believe @hoo is planning to deploy and restart the week's run today. TASK DETAIL https://phabricator.wikimedia.org/T248857 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: ArielGlenn, hoo, JAllemandou, dcausse, Aklapper, jannee_e, CBogen, darthmon_wmde, Nandana, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, EBjune, LawExplorer, _jensen, rosalieper, Scott_WUaS, gnosygnu, Wikidata-bugs, aude, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons
ArielGlenn added a comment. @Cparle, No blocks on your side, the ball is now in @dcausse 's court. :-) TASK DETAIL https://phabricator.wikimedia.org/T221917 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: nettrom_WMF, Mahir256, dcausse, EBernhardson, Cparle, Abit, Gehel, jleedev, hoo, ArielGlenn, WMDE-leszek, Poyekhali, Steinsplitter, Aklapper, Lydia_Pintscher, Bugreporter, Tgr, Ramsey-WMF, Jarekt, Addshore, Tpt, Salgo60, Lucas_Werkmeister_WMDE, Smalyshev, darthmon_wmde, Nandana, JKSTNK, Lahi, PDrouin-WMF, Gq86, E1presidente, Anooprao, SandraF_WMF, GoranSMilovanovic, Lunewa, QZanden, EBjune, Tramullas, Acer, merbst, LawExplorer, Silverfish, _jensen, rosalieper, Taiwania_Justo, Scott_WUaS, Jonas, Xmlizer, Susannaanas, Ixocactus, Wong128hk, gnosygnu, Jane023, jkroll, Wikidata-bugs, Jdouglas, Base, matthiasmullie, aude, Tobias1984, El_Grafo, Dinoguy1000, Manybubbles, Ricordisamoa, Wesalius, Fabrice_Florin, Raymond, Jdforrester-WMF, Mbch331, Keegan ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T238972: switch xml/sql (and adds-changes) dumps to use 0.11 schema with content from multiple slots
ArielGlenn added a comment. This is pending https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/556346/ and related patches, so we're looking at March 1 if all goes well. TASK DETAIL https://phabricator.wikimedia.org/T238972 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Christian75, Schnark, binbot, Johan, Lucas_Werkmeister_WMDE, RhinosF1, Benjavalero, hoo, leila, ArielGlenn, darthmon_wmde, Viztor, Nandana, Amorymeltzer, Lahi, Gq86, GoranSMilovanovic, Lunewa, QZanden, LawExplorer, Avner, JJMC89, _jensen, rosalieper, Scott_WUaS, Luke081515, gnosygnu, Wikidata-bugs, aude, Capt_Swing, TheDJ, Mbch331, Rxy, Jay8g ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T243701: Wikidata maxlag repeatedly over 5s since Jan20, 2020 (primarily caused by the query service)
ArielGlenn added a comment. In T243701#5855352 <https://phabricator.wikimedia.org/T243701#5855352>, @Lea_Lacroix_WMDE wrote: > Over the past weeks, we noticed a huge increase of content in Wikidata. Maybe that's something worth looking at? Wikidata content is growing at a fast and steady pace and has been for a few years now. For the last few months it's been expanding at a rate of around 3,500,000 new pages per month. So that seems unlikely to be connected. TASK DETAIL https://phabricator.wikimedia.org/T243701 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: ArielGlenn, Ladsgroup, Alicia_Fagerving_WMSE, JeanFred, Pasleim, Gehel, Lea_Lacroix_WMDE, ArthurPSmith, Albertvillanovadelmoral, Xqt, Lucas_Werkmeister_WMDE, Addshore, jcrespo, Dvorapa, Aklapper, Strainu, darthmon_wmde, ET4Eva, Legado_Shulgin, Nandana, Davinaclare77, Qtn1293, Techguru.pc, Lahi, Gq86, Darkminds3113, GoranSMilovanovic, Th3d3v1ls, Hfbn0, QZanden, EBjune, merbst, LawExplorer, Vali.matei, Avner, Zppix, _jensen, rosalieper, Scott_WUaS, Jonas, FloNight, Xmlizer, Volker_E, Wong128hk, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, GWicke, Dinoguy1000, Manybubbles, Lydia_Pintscher, faidon, Mbch331, Rxy, Jay8g, fgiunchedi ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Updated] T221917: Create RDF dump of structured data on Commons
ArielGlenn added a comment. Some unexpected (?) triples popping up that @dcausse is looking into, so the dumps will not be turned on in cron until we have the thumbs up on that. See T243292 <https://phabricator.wikimedia.org/T243292> If it turns out the data is all ok, we can move forward. TASK DETAIL https://phabricator.wikimedia.org/T221917 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Mahir256, dcausse, EBernhardson, Cparle, Abit, Gehel, jleedev, hoo, ArielGlenn, WMDE-leszek, Poyekhali, Steinsplitter, Aklapper, Lydia_Pintscher, Bugreporter, Tgr, Ramsey-WMF, Jarekt, Addshore, Tpt, Salgo60, Lucas_Werkmeister_WMDE, Smalyshev, darthmon_wmde, Nandana, JKSTNK, Lahi, PDrouin-WMF, Gq86, E1presidente, Anooprao, SandraF_WMF, GoranSMilovanovic, Lunewa, QZanden, EBjune, Tramullas, Acer, merbst, LawExplorer, Silverfish, _jensen, rosalieper, Taiwania_Justo, Scott_WUaS, Jonas, Xmlizer, Susannaanas, Ixocactus, Wong128hk, gnosygnu, Jane023, jkroll, Wikidata-bugs, Jdouglas, Base, matthiasmullie, aude, Tobias1984, El_Grafo, Dinoguy1000, Manybubbles, Ricordisamoa, Wesalius, Fabrice_Florin, Raymond, Jdforrester-WMF, Mbch331, Keegan ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Updated] T221917: Create RDF dump of structured data on Commons
ArielGlenn added a subtask: T243292: Fix the munger to support commons RDF dump. TASK DETAIL https://phabricator.wikimedia.org/T221917 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Mahir256, dcausse, EBernhardson, Cparle, Abit, Gehel, jleedev, hoo, ArielGlenn, WMDE-leszek, Poyekhali, Steinsplitter, Aklapper, Lydia_Pintscher, Bugreporter, Tgr, Ramsey-WMF, Jarekt, Addshore, Tpt, Salgo60, Lucas_Werkmeister_WMDE, Smalyshev, darthmon_wmde, Nandana, JKSTNK, Lahi, PDrouin-WMF, Gq86, E1presidente, Anooprao, SandraF_WMF, GoranSMilovanovic, Lunewa, QZanden, EBjune, Tramullas, Acer, merbst, LawExplorer, Silverfish, _jensen, rosalieper, Taiwania_Justo, Scott_WUaS, Jonas, Xmlizer, Susannaanas, Ixocactus, Wong128hk, gnosygnu, Jane023, jkroll, Wikidata-bugs, Jdouglas, Base, matthiasmullie, aude, Tobias1984, El_Grafo, Dinoguy1000, Manybubbles, Ricordisamoa, Wesalius, Fabrice_Florin, Raymond, Jdforrester-WMF, Mbch331, Keegan ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Updated] T243292: Fix the munger to support commons RDF dump
ArielGlenn added a parent task: T221917: Create RDF dump of structured data on Commons. TASK DETAIL https://phabricator.wikimedia.org/T243292 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: ArielGlenn, Aklapper, dcausse, darthmon_wmde, Nandana, Lahi, Gq86, Lucas_Werkmeister_WMDE, GoranSMilovanovic, QZanden, EBjune, merbst, LawExplorer, _jensen, rosalieper, Scott_WUaS, Jonas, Xmlizer, jkroll, Smalyshev, Wikidata-bugs, Jdouglas, aude, Tobias1984, Manybubbles, Mbch331 ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Changed Subscribers] T221917: Create RDF dump of structured data on Commons
ArielGlenn added a subscriber: dcausse. ArielGlenn added a comment. @dcausse is going to check over the ttl dump and let me know if it looks ok; if so then I'll flip the switch for generation weekly and make sure there's cleanup too. TASK DETAIL https://phabricator.wikimedia.org/T221917 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: dcausse, EBernhardson, Cparle, Abit, Gehel, jleedev, hoo, ArielGlenn, WMDE-leszek, Poyekhali, Steinsplitter, Aklapper, Lydia_Pintscher, Bugreporter, Tgr, Ramsey-WMF, Jarekt, Addshore, Tpt, Salgo60, Lucas_Werkmeister_WMDE, Smalyshev, darthmon_wmde, Nandana, JKSTNK, Lahi, PDrouin-WMF, Gq86, E1presidente, Anooprao, SandraF_WMF, GoranSMilovanovic, Lunewa, QZanden, EBjune, Tramullas, Acer, merbst, LawExplorer, Silverfish, _jensen, rosalieper, Taiwania_Justo, Scott_WUaS, Jonas, Xmlizer, Susannaanas, Ixocactus, Wong128hk, gnosygnu, Jane023, jkroll, Wikidata-bugs, Jdouglas, Base, matthiasmullie, aude, Tobias1984, El_Grafo, Dinoguy1000, Manybubbles, Ricordisamoa, Wesalius, Fabrice_Florin, Raymond, Jdforrester-WMF, Mbch331, Keegan ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons
ArielGlenn added a comment. In https://dumps.wikimedia.org/other/wikibase/commonswiki/ there are two ttl files, gz and bz2 compressed. Please have a look! The bash script producing them complained that /usr/local/bin/dumpwikibaserdf.sh: line 224: setDcatConfig: command not found so I'm looking at that. TASK DETAIL https://phabricator.wikimedia.org/T221917 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: EBernhardson, Cparle, Abit, Gehel, jleedev, hoo, ArielGlenn, WMDE-leszek, Poyekhali, Steinsplitter, Aklapper, Lydia_Pintscher, Bugreporter, Tgr, Ramsey-WMF, Jarekt, Addshore, Tpt, Salgo60, Lucas_Werkmeister_WMDE, Smalyshev, darthmon_wmde, Nandana, JKSTNK, Lahi, PDrouin-WMF, Gq86, E1presidente, Anooprao, SandraF_WMF, GoranSMilovanovic, Lunewa, QZanden, EBjune, Tramullas, Acer, merbst, LawExplorer, Silverfish, _jensen, rosalieper, Taiwania_Justo, Scott_WUaS, Jonas, Xmlizer, Susannaanas, Ixocactus, Wong128hk, gnosygnu, Jane023, jkroll, Wikidata-bugs, Jdouglas, Base, matthiasmullie, aude, Tobias1984, El_Grafo, Dinoguy1000, Manybubbles, Ricordisamoa, Wesalius, Fabrice_Florin, Raymond, Jdforrester-WMF, Mbch331, Keegan ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons
ArielGlenn added a comment. I found a ticket that mentions use of ttl files so I'll run /usr/local/bin/dumpwikibaserdf.sh commons full ttl and keep an eye on it. Running on snapshot1008 in a screen session. Here we go! TASK DETAIL https://phabricator.wikimedia.org/T221917 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: EBernhardson, Cparle, Abit, Gehel, jleedev, hoo, ArielGlenn, WMDE-leszek, Poyekhali, Steinsplitter, Aklapper, Lydia_Pintscher, Bugreporter, Tgr, Ramsey-WMF, Jarekt, Addshore, Tpt, Salgo60, Lucas_Werkmeister_WMDE, Smalyshev, darthmon_wmde, Nandana, JKSTNK, Lahi, PDrouin-WMF, Gq86, E1presidente, Anooprao, SandraF_WMF, GoranSMilovanovic, Lunewa, QZanden, EBjune, Tramullas, Acer, merbst, LawExplorer, Silverfish, _jensen, rosalieper, Taiwania_Justo, Scott_WUaS, Jonas, Xmlizer, Susannaanas, Ixocactus, Wong128hk, gnosygnu, Jane023, jkroll, Wikidata-bugs, Jdouglas, Base, matthiasmullie, aude, Tobias1984, El_Grafo, Dinoguy1000, Manybubbles, Ricordisamoa, Wesalius, Fabrice_Florin, Raymond, Jdforrester-WMF, Mbch331, Keegan ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons
ArielGlenn added a comment. I plan to try running /usr/local/bin/dumpwikibaserdf.sh commons full nt on Thursday morning and see how long it takes with the 8 shards that are currently configured. @Abit is the nt format the one needed for WDQS testing? TASK DETAIL https://phabricator.wikimedia.org/T221917 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Cparle, Abit, Gehel, jleedev, hoo, ArielGlenn, WMDE-leszek, Poyekhali, Steinsplitter, Aklapper, Lydia_Pintscher, Bugreporter, Tgr, Ramsey-WMF, Jarekt, Addshore, Tpt, Salgo60, Lucas_Werkmeister_WMDE, Smalyshev, Hook696, Daryl-TTMG, RomaAmorRoma, 0010318400, E.S.A-Sheild, darthmon_wmde, Meekrab2012, joker88john, CucyNoiD, Nandana, NebulousIris, JKSTNK, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, PDrouin-WMF, Gq86, Af420, E1presidente, Darkminds3113, Anooprao, SandraF_WMF, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Lunewa, Th3d3v1ls, Ramalepe, Liugev6, QZanden, EBjune, Tramullas, Acer, merbst, LawExplorer, WSH1906, Lewizho99, Maathavan, Silverfish, _jensen, rosalieper, Taiwania_Justo, Scott_WUaS, Jonas, Xmlizer, Susannaanas, Ixocactus, Wong128hk, gnosygnu, Jane023, jkroll, Wikidata-bugs, Jdouglas, Base, matthiasmullie, aude, Tobias1984, El_Grafo, Dinoguy1000, Manybubbles, Ricordisamoa, Wesalius, Fabrice_Florin, Raymond, Jdforrester-WMF, Mbch331, Keegan ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons
ArielGlenn added a comment. Ran php /srv/mediawiki/multiversion/MWScript.php extensions/Wikibase/repo/maintenance/dumpRdf.php --wiki commonswiki --batch-size 500 --format nt --flavor full-dump --entity-type mediainfo --no-cache --dbgroupdefault dump --ignore-missing --first-page-id 78846320 --last-page-id 79046320 --shard 0 --sharding-factor 1 2>/var/lib/dumpsgen/mediainfo-log-small-shard-oom.txt | gzip > /mnt/dumpsdata/temp/dumpsgen/mediainfo-dumps-test-nt-one-shard-small-oom.gz which should cover the page range where we had the oom; it ran to completion fine. I guess that there is some small memory leak that must accumulate over batches, which is what did us in earlier. As long as we limit runs to some reasonable number of pages each time, we should be fine. TASK DETAIL https://phabricator.wikimedia.org/T221917 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Cparle, Abit, Gehel, jleedev, hoo, ArielGlenn, WMDE-leszek, Poyekhali, Steinsplitter, Aklapper, Lydia_Pintscher, Bugreporter, Tgr, Ramsey-WMF, Jarekt, Addshore, Tpt, Salgo60, Lucas_Werkmeister_WMDE, Smalyshev, Hook696, Daryl-TTMG, RomaAmorRoma, 0010318400, E.S.A-Sheild, darthmon_wmde, Meekrab2012, joker88john, CucyNoiD, Nandana, NebulousIris, JKSTNK, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, PDrouin-WMF, Gq86, Af420, E1presidente, Darkminds3113, Anooprao, SandraF_WMF, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Lunewa, Th3d3v1ls, Ramalepe, Liugev6, QZanden, EBjune, Tramullas, Acer, merbst, LawExplorer, WSH1906, Lewizho99, Maathavan, Silverfish, _jensen, rosalieper, Taiwania_Justo, Scott_WUaS, Jonas, Xmlizer, Susannaanas, Ixocactus, Wong128hk, gnosygnu, Jane023, jkroll, Wikidata-bugs, Jdouglas, Base, matthiasmullie, aude, Tobias1984, El_Grafo, Dinoguy1000, Manybubbles, Ricordisamoa, Wesalius, Fabrice_Florin, Raymond, Jdforrester-WMF, Mbch331, Keegan ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
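The idea in the comment above, limiting each run to a bounded number of pages so that any per-batch memory growth cannot accumulate indefinitely, might be sketched roughly as follows. This is only an illustration in Python, not the actual dumps tooling: the helper names `page_id_ranges` and `dump_command` are hypothetical, the flags are copied from the command quoted in the comment, and treating `--first-page-id` / `--last-page-id` as inclusive bounds is an assumption.

```python
from typing import Iterator, List, Tuple


def page_id_ranges(first_id: int, last_id: int,
                   pages_per_run: int) -> Iterator[Tuple[int, int]]:
    """Yield consecutive (first, last) page id ranges, both ends inclusive,
    each covering at most pages_per_run page ids."""
    if pages_per_run <= 0:
        raise ValueError("pages_per_run must be positive")
    start = first_id
    while start <= last_id:
        end = min(start + pages_per_run - 1, last_id)
        yield start, end
        start = end + 1


def dump_command(first_id: int, last_id: int, batch_size: int = 500) -> List[str]:
    """Build the argument list for one bounded dumpRdf.php run.

    The flags mirror the command quoted in the comment; whether the script
    treats the page id bounds as inclusive is an assumption.
    """
    return [
        "php", "/srv/mediawiki/multiversion/MWScript.php",
        "extensions/Wikibase/repo/maintenance/dumpRdf.php",
        "--wiki", "commonswiki",
        "--batch-size", str(batch_size),
        "--format", "nt",
        "--flavor", "full-dump",
        "--entity-type", "mediainfo",
        "--no-cache",
        "--dbgroupdefault", "dump",
        "--ignore-missing",
        "--first-page-id", str(first_id),
        "--last-page-id", str(last_id),
        "--shard", "0",
        "--sharding-factor", "1",
    ]


if __name__ == "__main__":
    # Print one command per range; nothing is executed here.
    for first, last in page_id_ranges(78846320, 79046320, 50000):
        print(" ".join(dump_command(first, last)))
```

Each generated command starts a fresh PHP process, so memory held by one run is released before the next range begins. The range size of 50000 in the usage lines is arbitrary and would need tuning against real runs.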
[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons
ArielGlenn added a comment. Ran php /srv/mediawiki/multiversion/MWScript.php extensions/Wikibase/repo/maintenance/dumpRdf.php --wiki commonswiki --batch-size 1000 --format nt --flavor full-dump --entity-type mediainfo --no-cache --dbgroupdefault dump --ignore-missing --first-page-id 1 --last-page-id 21 --shard 1 --sharding-factor 4 2>/var/lib/dumpsgen/mediainfo-log-small-shard.txt | gzip > /mnt/dumpsdata/temp/dumpsgen/mediainfo-dumps-test-nt-one-shard-of-4-small.gz and it also ran fine. TASK DETAIL https://phabricator.wikimedia.org/T221917 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Cparle, Abit, Gehel, jleedev, hoo, ArielGlenn, WMDE-leszek, Poyekhali, Steinsplitter, Aklapper, Lydia_Pintscher, Bugreporter, Tgr, Ramsey-WMF, Jarekt, Addshore, Tpt, Salgo60, Lucas_Werkmeister_WMDE, Smalyshev, Hook696, Daryl-TTMG, RomaAmorRoma, 0010318400, E.S.A-Sheild, darthmon_wmde, Meekrab2012, joker88john, CucyNoiD, Nandana, NebulousIris, JKSTNK, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, PDrouin-WMF, Gq86, Af420, E1presidente, Darkminds3113, Anooprao, SandraF_WMF, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Lunewa, Th3d3v1ls, Ramalepe, Liugev6, QZanden, EBjune, Tramullas, Acer, merbst, LawExplorer, WSH1906, Lewizho99, Maathavan, Silverfish, _jensen, rosalieper, Taiwania_Justo, Scott_WUaS, Jonas, Xmlizer, Susannaanas, Ixocactus, Wong128hk, gnosygnu, Jane023, jkroll, Wikidata-bugs, Jdouglas, Base, matthiasmullie, aude, Tobias1984, El_Grafo, Dinoguy1000, Manybubbles, Ricordisamoa, Wesalius, Fabrice_Florin, Raymond, Jdforrester-WMF, Mbch331, Keegan ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons
ArielGlenn added a comment. Note to self that a run of php /srv/mediawiki/multiversion/MWScript.php extensions/Wikibase/repo/maintenance/dumpRdf.php --wiki commonswiki --batch-size 250 --format nt --flavor full-dump --entity-type mediainfo --no-cache --dbgroupdefault dump --ignore-missing --first-page-id 1 --last-page-id 21 --shard 0 --sharding-factor 1 2>/var/lib/dumpsgen/mediainfo-log-small.txt | gzip > /mnt/dumpsdata/temp/dumpsgen/mediainfo-dumps-test-nt-noshard-small.gz worked fine. Going to run one with a sharding factor of 4 and a batch size 4 times larger, to see how that is. TASK DETAIL https://phabricator.wikimedia.org/T221917 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Cparle, Abit, Gehel, jleedev, hoo, ArielGlenn, WMDE-leszek, Poyekhali, Steinsplitter, Aklapper, Lydia_Pintscher, Bugreporter, Tgr, Ramsey-WMF, Jarekt, Addshore, Tpt, Salgo60, Lucas_Werkmeister_WMDE, Smalyshev, Hook696, Daryl-TTMG, RomaAmorRoma, 0010318400, E.S.A-Sheild, darthmon_wmde, Meekrab2012, joker88john, CucyNoiD, Nandana, NebulousIris, JKSTNK, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, PDrouin-WMF, Gq86, Af420, E1presidente, Darkminds3113, Anooprao, SandraF_WMF, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Lunewa, Th3d3v1ls, Ramalepe, Liugev6, QZanden, EBjune, Tramullas, Acer, merbst, LawExplorer, WSH1906, Lewizho99, Maathavan, Silverfish, _jensen, rosalieper, Taiwania_Justo, Scott_WUaS, Jonas, Xmlizer, Susannaanas, Ixocactus, Wong128hk, gnosygnu, Jane023, jkroll, Wikidata-bugs, Jdouglas, Base, matthiasmullie, aude, Tobias1984, El_Grafo, Dinoguy1000, Manybubbles, Ricordisamoa, Wesalius, Fabrice_Florin, Raymond, Jdforrester-WMF, Mbch331, Keegan ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons
ArielGlenn added a comment. This morning the job was terminated by the oom killer: [4288057.417443] Out of memory: Kill process 117265 (php) score 868 or sacrifice child [4288057.425084] Killed process 117265 (php) total-vm:58241128kB, anon-rss:56901636kB, file-rss:0kB, shmem-rss:0kB It produced a file of size 380M with 2224612 entities in it before being shot. One of the last entries in it is the page File:Gerrardina_foliosa_1.jpg with page id 78846520 and mediainfo entity (Depicts) added on Jan 10th, 2020. Also the gz output file is not truncated, so perhaps it is complete. @Abit Should I move it somewhere for folks to test with? TASK DETAIL https://phabricator.wikimedia.org/T221917 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Cparle, Abit, Gehel, jleedev, hoo, ArielGlenn, WMDE-leszek, Poyekhali, Steinsplitter, Aklapper, Lydia_Pintscher, Bugreporter, Tgr, Ramsey-WMF, Jarekt, Addshore, Tpt, Salgo60, Lucas_Werkmeister_WMDE, Smalyshev, Hook696, Daryl-TTMG, RomaAmorRoma, 0010318400, E.S.A-Sheild, darthmon_wmde, Meekrab2012, joker88john, CucyNoiD, Nandana, NebulousIris, JKSTNK, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, PDrouin-WMF, Gq86, Af420, E1presidente, Darkminds3113, Anooprao, SandraF_WMF, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Lunewa, Th3d3v1ls, Ramalepe, Liugev6, QZanden, EBjune, Tramullas, Acer, merbst, LawExplorer, WSH1906, Lewizho99, Maathavan, Silverfish, _jensen, rosalieper, Taiwania_Justo, Scott_WUaS, Jonas, Xmlizer, Susannaanas, Ixocactus, Wong128hk, gnosygnu, Jane023, jkroll, Wikidata-bugs, Jdouglas, Base, matthiasmullie, aude, Tobias1984, El_Grafo, Dinoguy1000, Manybubbles, Ricordisamoa, Wesalius, Fabrice_Florin, Raymond, Jdforrester-WMF, Mbch331, Keegan ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
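To put the kernel figures quoted in the comment above into more readable units, a small parser along these lines could be used. It is a sketch only: the function name `parse_oom_kill` and the regular expression are illustrative assumptions that match the single "Killed process" line quoted in the comment, not a general parser for every kernel version's OOM output.

```python
import re
from typing import Dict, Optional, Union

# Matches the "Killed process ..." line format quoted in the comment.
OOM_KILLED = re.compile(
    r"Killed process (?P<pid>\d+) \((?P<name>[^)]+)\) "
    r"total-vm:(?P<total_vm>\d+)kB, anon-rss:(?P<anon_rss>\d+)kB"
)

KIB_PER_GIB = 1024 ** 2


def parse_oom_kill(line: str) -> Optional[Dict[str, Union[int, str, float]]]:
    """Extract pid, process name and memory sizes (in GiB) from an
    OOM-killer log line, or return None if the line does not match."""
    match = OOM_KILLED.search(line)
    if match is None:
        return None
    return {
        "pid": int(match.group("pid")),
        "name": match.group("name"),
        "total_vm_gib": int(match.group("total_vm")) / KIB_PER_GIB,
        "anon_rss_gib": int(match.group("anon_rss")) / KIB_PER_GIB,
    }


if __name__ == "__main__":
    sample = ("[4288057.425084] Killed process 117265 (php) "
              "total-vm:58241128kB, anon-rss:56901636kB, "
              "file-rss:0kB, shmem-rss:0kB")
    info = parse_oom_kill(sample)
    print(f"{info['name']} (pid {info['pid']}): "
          f"{info['anon_rss_gib']:.1f} GiB resident")
```

For the line quoted above this works out to roughly 54 GiB of anonymous resident memory held by the single php process when it was killed.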
[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons
ArielGlenn added a comment. A batchsize of 50k turned out to be too large. Same with 5k. I'm now running with a batchsize of 500, which will surely be too small, but at least I am getting output. I'll check on it tomorrow and see how it's doing. TASK DETAIL https://phabricator.wikimedia.org/T221917 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Cparle, Abit, Gehel, jleedev, hoo, ArielGlenn, WMDE-leszek, Poyekhali, Steinsplitter, Aklapper, Lydia_Pintscher, Bugreporter, Tgr, Ramsey-WMF, Jarekt, Addshore, Tpt, Salgo60, Lucas_Werkmeister_WMDE, Smalyshev, Hook696, Daryl-TTMG, RomaAmorRoma, 0010318400, E.S.A-Sheild, darthmon_wmde, Meekrab2012, joker88john, CucyNoiD, Nandana, NebulousIris, JKSTNK, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, PDrouin-WMF, Gq86, Af420, E1presidente, Darkminds3113, Anooprao, SandraF_WMF, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Lunewa, Th3d3v1ls, Ramalepe, Liugev6, QZanden, EBjune, Tramullas, Acer, merbst, LawExplorer, WSH1906, Lewizho99, Maathavan, Silverfish, _jensen, rosalieper, Taiwania_Justo, Scott_WUaS, Jonas, Xmlizer, Susannaanas, Ixocactus, Wong128hk, gnosygnu, Jane023, jkroll, Wikidata-bugs, Jdouglas, Base, matthiasmullie, aude, Tobias1984, El_Grafo, Dinoguy1000, Manybubbles, Ricordisamoa, Wesalius, Fabrice_Florin, Raymond, Jdforrester-WMF, Mbch331, Keegan ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs
[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons
ArielGlenn added a comment. Because I've gotten a nice run in beta with the --ignore-missing flag, I'm trying a test run on snapshot1008 in a screen session: php /srv/mediawiki/multiversion/MWScript.php extensions/Wikibase/repo/maintenance/dumpRdf.php --wiki commonswiki --batch-size 5 --format nt --flavor full-dump --entity-type mediainfo --no-cache --dbgroupdefault dump --ignore-missing 2>>/var/lib/dumpsgen/mediainfo-log.txt | gzip > /mnt/dumpsdata/temp/dumpsgen/mediainfo-dumps-test-nt-noshard.gz If the output looks good, I'll put it somewhere for WDQS testing and move forward with making these weekly runs with the appropriate number of parallel processes. TASK DETAIL https://phabricator.wikimedia.org/T221917 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Cparle, Abit, Gehel, jleedev, hoo, ArielGlenn, WMDE-leszek, Poyekhali, Steinsplitter, Aklapper, Lydia_Pintscher, Bugreporter, Tgr, Ramsey-WMF, Jarekt, Addshore, Tpt, Salgo60, Lucas_Werkmeister_WMDE, Smalyshev, Hook696, Daryl-TTMG, RomaAmorRoma, 0010318400, E.S.A-Sheild, darthmon_wmde, Meekrab2012, joker88john, CucyNoiD, Nandana, NebulousIris, JKSTNK, Gaboe420, Versusxo, Majesticalreaper22, Giuliamocci, Adrian1985, Cpaulf30, Lahi, PDrouin-WMF, Gq86, Af420, E1presidente, Darkminds3113, Anooprao, SandraF_WMF, Bsandipan, Lordiis, GoranSMilovanovic, Adik2382, Lunewa, Th3d3v1ls, Ramalepe, Liugev6, QZanden, EBjune, Tramullas, Acer, merbst, LawExplorer, WSH1906, Lewizho99, Maathavan, Silverfish, _jensen, rosalieper, Taiwania_Justo, Scott_WUaS, Jonas, Xmlizer, Susannaanas, Ixocactus, Wong128hk, gnosygnu, Jane023, jkroll, Wikidata-bugs, Jdouglas, Base, matthiasmullie, aude, Tobias1984, El_Grafo, Dinoguy1000, Manybubbles, Ricordisamoa, Wesalius, Fabrice_Florin, Raymond, Jdforrester-WMF, Mbch331, Keegan ___ Wikidata-bugs mailing list Wikidata-bugs@lists.wikimedia.org https://lists.wikimedia.org/mailman/listinfo/wikidata-bugs