[Wikidata-bugs] [Maniphest] [Updated] T221917: Create RDF dump of structured data on Commons

2020-01-08 Thread ArielGlenn
ArielGlenn added a subscriber: Cparle. ArielGlenn added a comment. @Abit: I need to get my last question on T241149 <https://phabricator.wikimedia.org/T241149> answered; if these errors only go to stderr then I can at least run a test dump, but if they go to logstash that's 50 milli

[Wikidata-bugs] [Maniphest] [Updated] T241169: Create database dump of new Wikibase term store

2019-12-22 Thread ArielGlenn
ArielGlenn closed this task as a duplicate of T226167: audit public tables and make sure we dump them all. TASK DETAIL https://phabricator.wikimedia.org/T241169 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: ArielGlenn, Bugreporter

[Wikidata-bugs] [Maniphest] [Updated] T241169: Create database dump of new Wikibase term store

2019-12-22 Thread ArielGlenn
ArielGlenn added a comment. This is already covered in T226167 <https://phabricator.wikimedia.org/T226167> and a patch set is waiting to be merged once migration is complete :-) TASK DETAIL https://phabricator.wikimedia.org/T241169 EMAIL PREFERENCES https://phabricator.wikimed

[Wikidata-bugs] [Maniphest] [Updated] T219175: [Mega] - Migrate data from wb_terms to new schema

2019-12-20 Thread ArielGlenn
ArielGlenn added a parent task: T226167: audit public tables and make sure we dump them all. TASK DETAIL https://phabricator.wikimedia.org/T219175 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: WMDE-leszek, Ladsgroup, Marostegui

[Wikidata-bugs] [Maniphest] [Updated] T219301: Migrate to and read from new store for property terms

2019-12-20 Thread ArielGlenn
ArielGlenn added a parent task: T226167: audit public tables and make sure we dump them all. TASK DETAIL https://phabricator.wikimedia.org/T219301 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: ArielGlenn, Addshore, Ladsgroup

[Wikidata-bugs] [Maniphest] [Updated] T239905: dumpRdf for mediainfo entities loads data from db more often than it needs to

2019-12-19 Thread ArielGlenn
ArielGlenn added a comment. I went ahead and opened a new ticket; see T241149 <https://phabricator.wikimedia.org/T241149>. In the meantime, I checked query output for a 5-page run, and while there's some scary stuff in there, it's all per batch and I'm gonna grit my teeth and

[Wikidata-bugs] [Maniphest] [Commented On] T239905: dumpRdf for mediainfo entities loads data from db more often than it needs to

2019-12-19 Thread ArielGlenn
ArielGlenn added a comment. Haven't checked query execution yet but I did notice one thing: for pages (in namespace 6) with no mediainfo slot, an error message is logged of the form: "[failed-to-dump]: Failed to dump M70620 (Entity not found: M70620)" Given that most pages

[Wikidata-bugs] [Maniphest] [Commented On] T239905: dumpRdf for mediainfo entities loads data from db more often than it needs to

2019-12-19 Thread ArielGlenn
ArielGlenn added a comment. Doing some initial testing on beta, as the dumpsgen user from snapshot01. For one output file with one shard: php /srv/mediawiki/multiversion/MWScript.php extensions/Wikibase/repo/maintenance/dumpRdf.php --wiki commonswiki --shard 1 --sharding-factor 2

[Wikidata-bugs] [Maniphest] [Commented On] T222349: Do not rate limit dumps from internal network

2019-12-19 Thread ArielGlenn
ArielGlenn added a comment. In T222349#5753559 <https://phabricator.wikimedia.org/T222349#5753559>, @Gehel wrote: > In T222349#5751029 <https://phabricator.wikimedia.org/T222349#5751029>, @ArielGlenn wrote: > >> How fast a download do folks want? > &

[Wikidata-bugs] [Maniphest] [Commented On] T222349: Do not rate limit dumps from internal network

2019-12-18 Thread ArielGlenn
ArielGlenn added a comment. How fast a download do folks want? Can we schedule rsyncs for the specific use cases with a higher bandwidth cap? TASK DETAIL https://phabricator.wikimedia.org/T222349 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences

[Wikidata-bugs] [Maniphest] [Commented On] T239905: dumpRdf for mediainfo entities loads data from db more often than it needs to

2019-12-17 Thread ArielGlenn
ArielGlenn added a comment. In T239905#5745201 <https://phabricator.wikimedia.org/T239905#5745201>, @Ramsey-WMF wrote: > This should hit production this week. Ariel, please let Cormac know directly if it doesn't work  Sure will. I plan to test in beta early this wee

[Wikidata-bugs] [Maniphest] [Commented On] T226093: Capacity planning for Commons Structured Data

2019-12-11 Thread ArielGlenn
ArielGlenn added a comment. F31470388: commons_slots.png <https://phabricator.wikimedia.org/F31470388> generated via https://github.com/apergos/misc-wmf-crap/blob/master/sdc-growth/get_slot_growth.py a quickie one-off script. TASK DETAIL https://phabricator.wikimedia.org/T226093

[Wikidata-bugs] [Maniphest] [Commented On] T239894: Dispatching broken on beta - Fatal error: Class 'Memcached' not found in ObjectCache.php on line 186

2019-12-09 Thread ArielGlenn
ArielGlenn added a comment. \o/ awesome! TASK DETAIL https://phabricator.wikimedia.org/T239894 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Ladsgroup, ArielGlenn Cc: ArielGlenn, Aklapper, Ladsgroup, Addshore, Iflorez, darthmon_wmde, alaa_wmde

[Wikidata-bugs] [Maniphest] [Commented On] T239894: Dispatching broken on beta - Fatal error: Class 'Memcached' not found in ObjectCache.php on line 186

2019-12-09 Thread ArielGlenn
ArielGlenn added a comment. What do we think about the pile of these in the log: 21:31:02 Wikibase\Repo\Store\Sql\SqlChangeDispatchCoordinator::selectClient: Could not lock any of the candidate client wikis for dispatching Are they a problem? TASK DETAIL https

[Wikidata-bugs] [Maniphest] [Commented On] T222349: Do not rate limit dumps from internal network

2019-12-09 Thread ArielGlenn
ArielGlenn added a comment. Repeating here some things from a short chat in IRC: The original limits were set because we had one host doing all of - web service to the public - nfs service to analytics and labs - rsync to public mirrors - back-end nfs share for dumps generation

[Wikidata-bugs] [Maniphest] [Commented On] T222985: Provide wikidata JSON dumps compressed with zstd

2019-12-02 Thread ArielGlenn
ArielGlenn added a comment. We need some timing tests on these: is there a happy medium between 'best settings for compression' and 'best settings for speed'? What are we looking at in terms of execution time and space, if we add this step? We'd continue to provide bz2s I guess, since those

[Wikidata-bugs] [Maniphest] [Commented On] T238972: switch xml/sql (and adds-changes) dumps to use 0.11 schema with content from multiple slots

2019-11-27 Thread ArielGlenn
ArielGlenn added a comment. Thanks for the forwards! TASK DETAIL https://phabricator.wikimedia.org/T238972 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: binbot, Johan, Lucas_Werkmeister_WMDE, RhinosF1, Benjavalero, hoo, leila

[Wikidata-bugs] [Maniphest] [Commented On] T226093: Capacity planning for Commons Structured Data

2019-11-27 Thread ArielGlenn
ArielGlenn added a comment. Do we have a meeting scheduled to talk about capacity needs? TASK DETAIL https://phabricator.wikimedia.org/T226093 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Ladsgroup, Abit, matthiasmullie

[Wikidata-bugs] [Maniphest] [Closed] T236006: consider generating an empty abstract file for wikidata

2019-11-27 Thread ArielGlenn
ArielGlenn closed this task as "Resolved". ArielGlenn added a comment. This is now complete. Nov 20th wikidata abstract files are nice little empty files as expected. TASK DETAIL https://phabricator.wikimedia.org/T236006 EMAIL PREFERENCES https://phabricator.wikimedia.org/sett

[Wikidata-bugs] [Maniphest] [Changed Subscribers] T238972: switch xml/sql (and adds-changes) dumps to use 0.11 schema with content from multiple slots

2019-11-27 Thread ArielGlenn
ArielGlenn added subscribers: leila, hoo. ArielGlenn added a comment. https://lists.wikimedia.org/pipermail/wikitech-l/2019-November/092821.html Email sent to wikitech-l and xmldatadumps-l. @leila would you be willing to forward to the research mailing lists? @hoo are you on the wikidata

[Wikidata-bugs] [Maniphest] [Commented On] T230856: RDF dump performance for SDC

2019-11-27 Thread ArielGlenn
ArielGlenn added a comment. In T230856#5692766 <https://phabricator.wikimedia.org/T230856#5692766>, @Cparle wrote: > I'm not sure if T222497 <https://phabricator.wikimedia.org/T222497> covers this stuff and, if not, what is actionable here by the structured data team.

[Wikidata-bugs] [Maniphest] [Updated] T238972: switch xml/sql (and adds-changes) dumps to use 0.11 schema with content from multiple slots

2019-11-25 Thread ArielGlenn
ArielGlenn added projects: Research, Wikidata. ArielGlenn added a comment. I'm going to send an email announcement to wikitech and xmldatadumps-l. Someone on the research and wikidata lists should forward the announcement there. Adding the relevant projects (sorry if they aren't right

[Wikidata-bugs] [Maniphest] [Updated] T238959: Make TextPassDumper work with 0.11 dump schema

2019-11-23 Thread ArielGlenn
ArielGlenn added a project: Dumps-Generation. TASK DETAIL https://phabricator.wikimedia.org/T238959 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: daniel, ArielGlenn Cc: CCicalese_WMF, Aklapper, Fjalapeno, ArielGlenn, gerritbot, daniel

[Wikidata-bugs] [Maniphest] [Commented On] T199121: RFC: Spec for representing multiple content objects per revision (MCR) in XML dumps

2019-11-22 Thread ArielGlenn
ArielGlenn added a comment. In T199121#5684397 <https://phabricator.wikimedia.org/T199121#5684397>, @daniel wrote: > In T199121#5684250 <https://phabricator.wikimedia.org/T199121#5684250>, @ArielGlenn wrote: > >> ... >> https://gerrit.wiki

[Wikidata-bugs] [Maniphest] [Commented On] T199121: RFC: Spec for representing multiple content objects per revision (MCR) in XML dumps

2019-11-22 Thread ArielGlenn
ArielGlenn added a comment. In T199121#5684237 <https://phabricator.wikimedia.org/T199121#5684237>, @daniel wrote: > In T199121#5683594 <https://phabricator.wikimedia.org/T199121#5683594>, @ArielGlenn wrote: > >> In T199121#5682911 <https://phabricator.wi

[Wikidata-bugs] [Maniphest] [Commented On] T199121: RFC: Spec for representing multiple content objects per revision (MCR) in XML dumps

2019-11-21 Thread ArielGlenn
ArielGlenn added a comment. In T199121#5682911 <https://phabricator.wikimedia.org/T199121#5682911>, @Nuria wrote: > I see this ticket is resolved but the dumps on commons have version version="0.10" since from this ticket i gather that the dumps that contain those s

[Wikidata-bugs] [Maniphest] [Commented On] T68025: [Story] Monitor size of some Wikidata database tables

2019-11-20 Thread ArielGlenn
ArielGlenn added a comment. In T68025#5678460 <https://phabricator.wikimedia.org/T68025#5678460>, @Ladsgroup wrote: > ... > I would say we should add size and not just the number of rows. There's a big refactor of revision table being deployed that will free up l

[Wikidata-bugs] [Maniphest] [Commented On] T226093: Capacity planning for Commons Structured Data

2019-11-08 Thread ArielGlenn
ArielGlenn added a comment. As evidenced by https://graphite.wikimedia.org/S/i we already have 5.5 million images with contents in the MediaInfo slot. Two months to go until end of the year and we see how low the prediction was compared to the actual number. TASK DETAIL https

[Wikidata-bugs] [Maniphest] [Commented On] T68025: [Story] Monitor size of some Wikidata database tables

2019-11-08 Thread ArielGlenn
ArielGlenn added a comment. https://graphite.wikimedia.org/S/i I see growth on commons is quite... healthy :-D There are a few bots adding mediainfo data to images based on information in Wikidata; we'll likely see more of this in the next few months. TASK DETAIL https

[Wikidata-bugs] [Maniphest] [Commented On] T230856: RDF dump performance for SDC

2019-11-04 Thread ArielGlenn
ArielGlenn added a comment. Bulk adds of depicts statements on deployment-prep will start this evening, now that the code is ready. It will run over a couple of days at least. Once complete we'll have 3k images on beta commons with captions and depicts statements in them, referencing 1k

[Wikidata-bugs] [Maniphest] [Commented On] T230856: RDF dump performance for SDC

2019-11-01 Thread ArielGlenn
ArielGlenn added a comment. Adding items to wikidata in deployment-prep for use in depicts statements for the uploaded images in beta commons. Depicts statements early next week most likely. TASK DETAIL https://phabricator.wikimedia.org/T230856 EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] [Commented On] T236006: consider generating an empty abstract file for wikidata

2019-10-30 Thread ArielGlenn
ArielGlenn added a comment. A week has passed and no one has commented. Silence equals consent, and the above patch has been tested with the config setting enabled and disabled, so it's ready to go. This will be merged shortly before the Nov 20th run unless something else derails

[Wikidata-bugs] [Maniphest] [Commented On] T230856: RDF dump performance for SDC

2019-10-29 Thread ArielGlenn
ArielGlenn added a comment. I've started generating, uploading and captioning images in beta commons today, using the latest version of the script linked above. I'd like to add some depicts statements too. In any case, by the end of the week expect that we'll have several batches

[Wikidata-bugs] [Maniphest] [Commented On] T226941: Document the MediaInfo JSON output on mediawiki.org

2019-10-25 Thread ArielGlenn
ArielGlenn added a comment. There is already documentation of the data type itself: https://www.mediawiki.org/wiki/Extension:WikibaseMediaInfo/Data_Model I have added a draft of the JSON description at https://www.mediawiki.org/wiki/Extension:WikibaseMediaInfo/Data_Model/JSON borrowing

[Wikidata-bugs] [Maniphest] [Commented On] T225056: Run Item Terms Rebuild script

2019-09-27 Thread ArielGlenn
ArielGlenn added a comment. https://gerrit.wikimedia.org/r/#/c/operations/puppet/+/539498/ was merged in response and kicked in about 10 minutes ago, with good results on the graph. TASK DETAIL https://phabricator.wikimedia.org/T225056 EMAIL PREFERENCES https://phabricator.wikimedia.org

[Wikidata-bugs] [Maniphest] [Commented On] T225056: Run Item Terms Rebuild script

2019-09-27 Thread ArielGlenn
ArielGlenn added a comment. At around 6:50 UTC this morning we began seeing this: icinga-wm: PROBLEM - MediaWiki eqiad exceptions and fatals per minute on icinga1001 is CRITICAL: cluster=logstash job=statsd_exporter level=ERROR site=eqiad https://wikitech.wikimedia.org/wiki

[Wikidata-bugs] [Maniphest] [Commented On] T68025: [Story] Monitor size of some Wikidata database tables

2019-08-28 Thread ArielGlenn
ArielGlenn added a comment. When I look at that image it looks pretty empty, am I missing something? TASK DETAIL https://phabricator.wikimedia.org/T68025 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Reedy, ArielGlenn

[Wikidata-bugs] [Maniphest] [Commented On] T68025: [Story] Monitor size of some Wikidata database tables

2019-08-27 Thread ArielGlenn
ArielGlenn added a comment. I think Reedy was away and didn't see my pings. Anyways, thanks for moving forward on this, and we'll see how it looks in a week! TASK DETAIL https://phabricator.wikimedia.org/T68025 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel

[Wikidata-bugs] [Maniphest] [Commented On] T231276: RevisionBasedEntityLookup.php: Revision 363395998 belongs to M77688146 instead of expected M81625979

2019-08-27 Thread ArielGlenn
ArielGlenn added a comment. In T231276#5441586 <https://phabricator.wikimedia.org/T231276#5441586>, @Lucas_Werkmeister_WMDE wrote: > ... > It’s part of the serialization. Not sure why that would be a new issue, though – this seems like a fairly fundamental

[Wikidata-bugs] [Maniphest] [Commented On] T231276: RevisionBasedEntityLookup.php: Revision 363395998 belongs to M77688146 instead of expected M81625979

2019-08-27 Thread ArielGlenn
ArielGlenn added a comment. https://commons.wikimedia.org/wiki/Special:Log?type===File%3ABolsonaro_with_Israeli_PM_Benjamin_Netanyahu%2C_Tel_Aviv%2C_31_March_2019.jpg== It was deleted and restored at 02:45, 26 August 2019, so I guess something isn't handled quite right in MediaInfo

[Wikidata-bugs] [Maniphest] [Commented On] T230856: RDF dump performance for SDC

2019-08-23 Thread ArielGlenn
ArielGlenn added a comment. https://github.com/apergos/misc-wmf-crap/tree/master/glyph-image-generator Starting to get clever about this: ability to generate 50k small images with metadata that can be extracted for using in depicts and/or caption statements. TASK DETAIL https

[Wikidata-bugs] [Maniphest] [Commented On] T230856: RDF dump performance for SDC

2019-08-21 Thread ArielGlenn
ArielGlenn added a comment. I'm looking at deployment-db05 now, and there are 63332 rows in the revision table, with 53250 rows in the content table. I guess we need to double the number of revisions and then add the structured data for those entries. We can probably be clever about

[Wikidata-bugs] [Maniphest] [Commented On] T230856: RDF dump performance for SDC

2019-08-20 Thread ArielGlenn
ArielGlenn added a comment. @Smalyshev Do you know how many entries have structured data on deployment-prep? Is that a useful testing ground right now or should we be populating the data over there first? TASK DETAIL https://phabricator.wikimedia.org/T230856 EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons

2019-08-12 Thread ArielGlenn
ArielGlenn added a comment. I'm not thinking about the amount of time it takes, but rather the load on the database servers. Reasonable sized batched queries will be better, as I've seen already with stub dumps and slot retrieval. TASK DETAIL https://phabricator.wikimedia.org/T221917

[Wikidata-bugs] [Maniphest] [Updated] T221917: Create RDF dump of structured data on Commons

2019-08-12 Thread ArielGlenn
ArielGlenn added a comment. I think T222497 <https://phabricator.wikimedia.org/T222497> should be resolved before this goes live. I can test it in deployment-prep before then, but I don't want to do production tests until there is some sort of batching. I checked on this week

[Wikidata-bugs] [Maniphest] [Commented On] T217329: bug in 1.33.0-wmf.18 breaks abstract dumps on testwikidatawiki | MWContentSerializationException $entityId and $targetId can not be the same

2019-08-08 Thread ArielGlenn
ArielGlenn added a comment. The Python scripts at the dump end are (mostly) protected against exceptions from MediaWiki generally and from this failure case in particular. Since we have problematic data in production I've re-opened the ticket so that the Wikibase issue can somehow

[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons

2019-08-05 Thread ArielGlenn
ArielGlenn added a comment. I sincerely apologize: this weekend the heat baked my brain and I did nothing related to computers at all. And Friday evening I was out. I'll set a notification to remind me this coming Friday earlier in the day, so that this gets done. TASK DETAIL https

[Wikidata-bugs] [Maniphest] [Commented On] T198343: Replace all calls to Revision::getRevisionText()

2019-07-30 Thread ArielGlenn
ArielGlenn added a comment. The category links are the fallback that was designed, so this is a net positive. Going to go update the CR on the patch. TASK DETAIL https://phabricator.wikimedia.org/T198343 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences

[Wikidata-bugs] [Maniphest] [Commented On] T198343: Replace all calls to Revision::getRevisionText()

2019-07-30 Thread ArielGlenn
ArielGlenn added a comment. Ran the following with old and new code for abstracts: /usr/bin/php7.2 /srv/mediawiki/multiversion/MWScript.php dumpBackup.php --wiki=anwiki /srv/v/mediawiki/php-1.34.0-wmf.15/extensions/ActiveAbstract/includes/AbstractFilter.php --full --report=1

[Wikidata-bugs] [Maniphest] [Closed] T228614: stubs dumps broken for wikidatawiki with old revision for an entity redirecting to self; content read for every revision in stubs!

2019-07-30 Thread ArielGlenn
ArielGlenn closed this task as "Resolved". ArielGlenn claimed this task. ArielGlenn added a comment. The wikis ran to completion, but I forgot to close this. Doing so now! TASK DETAIL https://phabricator.wikimedia.org/T228614 EMAIL PREFERENCES https://phabricator.wikimedia.or

[Wikidata-bugs] [Maniphest] [Updated] T229290: Incremental RDF dumps

2019-07-30 Thread ArielGlenn
ArielGlenn added a project: Dumps-Generation. TASK DETAIL https://phabricator.wikimedia.org/T229290 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Aklapper, Bugreporter, darthmon_wmde, DannyS712, Nandana, Lahi, Gq86

[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons

2019-07-29 Thread ArielGlenn
ArielGlenn added a comment. The refactor patchset now checks out with all the wikidata dumps including json. I'd like to deploy it this weekend, giving plenty of time to make sure it's ok, test the structured data patchset, and then be able to deploy that separately. TASK DETAIL https

[Wikidata-bugs] [Maniphest] [Commented On] T228614: stubs dumps broken for wikidatawiki with old revision for an entity redirecting to self; content read for every revision in stubs!

2019-07-22 Thread ArielGlenn
ArielGlenn added a comment. Thanks to Daniel and James for review and merge. This has been deployed. I verified that the new code runs to completion on the page with the problematic revision, and that Special:Export for pages with revision history produces what we expect. I won't close

[Wikidata-bugs] [Maniphest] [Reopened] T217329: bug in 1.33.0-wmf.18 breaks abstract dumps on testwikidatawiki | MWContentSerializationException $entityId and $targetId can not be the same

2019-07-22 Thread ArielGlenn
ArielGlenn reopened this task as "Open". ArielGlenn lowered the priority of this task from "High" to "Normal". ArielGlenn added a comment. Re-opening for the underlying issue (bad revision redirect). See https://www.wikidata.org/w/index.php?title=Q46721

[Wikidata-bugs] [Maniphest] [Commented On] T228614: stubs dumps broken for wikidatawiki with old revision for an entity redirecting to self; content read for every revision in stubs!

2019-07-22 Thread ArielGlenn
ArielGlenn added a comment. The above patch has been tested in beta on a tiny wiki and all jobs ran properly. It has been tested with the dump command in T228614#5352340 <https://phabricator.wikimedia.org/T228614#5352340> on snapshot1008 and the revisions for the page were properly re

[Wikidata-bugs] [Maniphest] [Commented On] T228614: stubs dumps broken for wikidatawiki with old revision for an entity redirecting to self; content read for every revision in stubs!

2019-07-22 Thread ArielGlenn
ArielGlenn added a comment. One side effect of this, besides wikidata stubs being broken, is that the stubs are running extremely slowly, given that the text content is being requested from external store for each slot (one at a time). For example, enwiki stub generation is not even through

[Wikidata-bugs] [Maniphest] [Retitled] T228614: stubs dumps broken for wikidatawiki with old revision for an entity redirecting to self; content read for every revision in stubs!

2019-07-22 Thread ArielGlenn
ArielGlenn renamed this task from "stubs dumps broken for wikidatawiki with old revision for an entity redirecting to self " to "stubs dumps broken for wikidatawiki with old revision for an entity redirecting to self; content read for every revision in stubs!".

[Wikidata-bugs] [Maniphest] [Changed Subscribers] T228614: stubs dumps broken for wikidatawiki with old revision for an entity redirecting to self

2019-07-22 Thread ArielGlenn
ArielGlenn added a subscriber: daniel. ArielGlenn added a comment. Stubs should not be loading content; this appears to be a problem introduced by https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/464768/ at line 360 of the new XmlDumpWriter.php: $text = $rev->getContent( SlotRec

[Wikidata-bugs] [Maniphest] [Updated] T228614: stubs dumps broken for wikidatawiki with old revision for an entity redirecting to self

2019-07-21 Thread ArielGlenn
ArielGlenn added projects: Wikidata, Wikimedia-production-error. TASK DETAIL https://phabricator.wikimedia.org/T228614 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: ArielGlenn, darthmon_wmde, Nandana, Sario528, Lahi, Gq86

[Wikidata-bugs] [Maniphest] [Commented On] T68025: [Story] Monitor size of some Wikidata database tables

2019-07-21 Thread ArielGlenn
ArielGlenn added a comment. Sure thing! As soon as jenkins is happy I'll give it the thumbs up. I'll poke Reedy about it tomorrow too. TASK DETAIL https://phabricator.wikimedia.org/T68025 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences

[Wikidata-bugs] [Maniphest] [Updated] T228569: Suspicious data loss on the Query Service

2019-07-21 Thread ArielGlenn
ArielGlenn added a comment. The truthy nt ones for last week are still finishing up, and then the lexemes for last week will go. That was due to T228104 <https://phabricator.wikimedia.org/T228104> which was caused by https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/519304/ in

[Wikidata-bugs] [Maniphest] [Closed] T228104: Wikibase dump scripts fail on external storage access

2019-07-19 Thread ArielGlenn
ArielGlenn closed this task as "Resolved". ArielGlenn claimed this task. ArielGlenn added a comment. The above changes were deployed everywhere on Wednesday. JSON and Rdf dumps have been running since then without issues. The JSON dumps should complete sometime today. Closing

[Wikidata-bugs] [Maniphest] [Commented On] T228104: Wikibase dump scripts fail on external storage access

2019-07-17 Thread ArielGlenn
ArielGlenn added a comment. I have started the json format dumps and they seem to be running correctly. TASK DETAIL https://phabricator.wikimedia.org/T228104 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Jdforrester-WMF

[Wikidata-bugs] [Maniphest] [Changed Subscribers] T68025: [Story] Monitor size of some Wikidata database tables

2019-07-17 Thread ArielGlenn
ArielGlenn added a subscriber: Reedy. ArielGlenn added a comment. Thanks, @Addshore! The data looks like exactly what I was hoping to track. I agree that if we just publish the slot counts, there should be no privacy concerns; this information could be computed by anyone from

[Wikidata-bugs] [Maniphest] [Updated] T222497: dumpRDF for MediaInfo entities loads each page individually

2019-07-16 Thread ArielGlenn
ArielGlenn added a project: Dumps-Generation. TASK DETAIL https://phabricator.wikimedia.org/T222497 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: ArielGlenn, Lucas_Werkmeister_WMDE, Addshore, Ramsey-WMF, Aklapper, WMDE-leszek

[Wikidata-bugs] [Maniphest] [Updated] T228104: Wikibase dump scripts fail on external storage access

2019-07-16 Thread ArielGlenn
ArielGlenn added a project: Dumps-Generation. TASK DETAIL https://phabricator.wikimedia.org/T228104 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: ArielGlenn, Aklapper, hoo, Daryl-TTMG, RomaAmorRoma, 0010318400, E.S.A-Sheild

[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons

2019-07-10 Thread ArielGlenn
ArielGlenn added a comment. Ah I didn't even think about the param being set in the script. I had removed it during testing to see if that helped, and... nada. TASK DETAIL https://phabricator.wikimedia.org/T221917 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel

[Wikidata-bugs] [Maniphest] [Commented On] T68025: [Story] Monitor size of some Wikidata database tables

2019-07-10 Thread ArielGlenn
ArielGlenn added a comment. Note that I don't need daily reports; weekly or even monthly would be good enough. But if there is an easy way to just see it on a dashboard or graph with little (or no!) work already, and the frequency is more often, then I will not complain. TASK DETAIL

[Wikidata-bugs] [Maniphest] [Updated] T221917: Create RDF dump of structured data on Commons

2019-07-10 Thread ArielGlenn
ArielGlenn added a project: Dumps-Generation. TASK DETAIL https://phabricator.wikimedia.org/T221917 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: hoo, ArielGlenn, WMDE-leszek, Poyekhali, Steinsplitter, Aklapper, Lydia_Pintscher

[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons

2019-07-10 Thread ArielGlenn
ArielGlenn added a comment. Running into a new problem testing on beta. dumpsgen@deployment-snapshot01:~$ /usr/bin/php7.2 /srv/mediawiki/multiversion/MWScript.php extensions/Wikibase/repo/maintenance/dumpRdf.php --wiki wikidatawiki --shard 1 --sharding-factor 2 --batch-size 2000

[Wikidata-bugs] [Maniphest] [Updated] T68025: [Story] Monitor size of some Wikidata database tables

2019-07-10 Thread ArielGlenn
ArielGlenn added a comment. @jcrespo I want specifically slot count by type, and revision count, for commons. Eventually I will want the same for other projects with multiple slots, once the numbers are over some threshold. See T68025#5313973 <https://phabricator.wikimedia.org/T68

[Wikidata-bugs] [Maniphest] [Commented On] T68025: [Story] Monitor size of some Wikidata database tables

2019-07-08 Thread ArielGlenn
ArielGlenn added a comment. Are there visible graphs for these? Looking a bit at the repo scripts now. I guess we'd want something in src/commons/contentgrowth (? names are hard) TASK DETAIL https://phabricator.wikimedia.org/T68025 EMAIL PREFERENCES https://phabricator.wikimedia.org

[Wikidata-bugs] [Maniphest] [Commented On] T226093: Capacity planning for Commons Structured Data

2019-07-08 Thread ArielGlenn
ArielGlenn added a comment. I've commented about this over on the other ticket. Let's see what they say. TASK DETAIL https://phabricator.wikimedia.org/T226093 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Mholloway, Addshore

[Wikidata-bugs] [Maniphest] [Commented On] T68025: [Story] Monitor size of some Wikidata database tables

2019-07-08 Thread ArielGlenn
ArielGlenn added a comment. We don't really have daily stats of any sort in grafana but maybe we should (or have a place for it to live in any case). I'd like to see # revisions, # slots, # slots -not-main-slot, all on Commons. And eventually # slots of each type on the projects that use

[Wikidata-bugs] [Maniphest] [Commented On] T226093: Capacity planning for Commons Structured Data

2019-07-05 Thread ArielGlenn
ArielGlenn added a comment. I keep pretty decent tabs on wikidata growth, because of the dumps. I don't do that for commons entities because I can't even find the proper wikibase tables. I checked the wb_* tables on commonswiki and they all appear to be empty (?!) I can do some very

[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons

2019-07-05 Thread ArielGlenn
ArielGlenn added a comment. I plan to deploy the refactoring patchset Sunday in between runs (possibly today if the json dump and the others all finish up at a reasonable hour). I ran wikidata dumps in beta with and without the changeset (with altered values for shard, minfilesize

[Wikidata-bugs] [Maniphest] [Commented On] T227207: Wikibase JSON output (dumps, Special:EntityData) lacks qualifier hashes

2019-07-05 Thread ArielGlenn
ArielGlenn added a comment. The ones for this week will have a new date, the date on which they were started. The 'latest' links will point to it only when the dumps are complete and have passed some basic sanity checks in the script. No files are available to be monitored yet because

[Wikidata-bugs] [Maniphest] [Commented On] T119753: [Task] enforce page size limit

2019-07-05 Thread ArielGlenn
ArielGlenn added a comment. Out of curiosity, whatever happened to this? TASK DETAIL https://phabricator.wikimedia.org/T119753 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: ArielGlenn, Krenair, Aklapper, aude, hoo, JanZerebecki

[Wikidata-bugs] [Maniphest] [Closed] T226601: Wikidata JSON dump generation broken

2019-07-03 Thread ArielGlenn
ArielGlenn closed this task as "Resolved". ArielGlenn added a comment. Going to close this task again, since the bug is different for the new problem. TASK DETAIL https://phabricator.wikimedia.org/T226601 EMAIL PREFERENCES https://phabricator.wikimedia.org/sett

[Wikidata-bugs] [Maniphest] [Commented On] T227207: Wikibase JSON output (dumps, Special:EntityData) lacks qualifier hashes

2019-07-03 Thread ArielGlenn
ArielGlenn added a comment. I've moved the 20190701 json files to .not in that directory on the various hosts. Mirrors should reflect this change within 24 hours. Leaving the broken links so that folks with scripts get failure instead of reprocessing old files. @hoo would you be willing

[Wikidata-bugs] [Maniphest] [Commented On] T227207: Wikibase JSON output (dumps, Special:EntityData) lacks qualifier hashes

2019-07-03 Thread ArielGlenn
ArielGlenn added a comment. Also note the only window to get a fix in for the next run is tonight's late night UTC SWAT; tomorrow is no deploys (US holiday), Fri is no deploys, and on Monday this job will have already started for the week. TASK DETAIL https://phabricator.wikimedia.org/T227207

[Wikidata-bugs] [Maniphest] [Commented On] T174031: MCR: Include all slots in XML dumps

2019-07-03 Thread ArielGlenn
ArielGlenn added a comment. Thanks for the quick merge! Output looks good. TASK DETAIL https://phabricator.wikimedia.org/T174031 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: daniel, ArielGlenn Cc: gerritbot, ArielGlenn, Fjalapeno, Aklapper

[Wikidata-bugs] [Maniphest] [Commented On] T174031: MCR: Include all slots in XML dumps

2019-07-03 Thread ArielGlenn
ArielGlenn added a comment. Ha, I made the patch without seeing your comment (but it's still broken, give me a few minutes :-P) We can tighten things up in a future schema, and I think we should. But people have been using who knows what sort of clients with who knows what sort

[Wikidata-bugs] [Maniphest] [Commented On] T222985: Provide wikidata JSON dumps compressed with zstd

2019-07-03 Thread ArielGlenn
ArielGlenn added a comment. I don't want to replace existing compression formats; this would be in addition to what we have. I'll have to look at the graphs to see how we are as far as CPU usage goes. Let's just do the json dump for now, if we do this. TASK DETAIL https

[Wikidata-bugs] [Maniphest] [Changed Subscribers] T222985: Provide wikidata JSON dumps compressed with zstd

2019-07-02 Thread ArielGlenn
ArielGlenn added subscribers: Smalyshev, hoo. ArielGlenn added a comment. @hoo, @Smalyshev, care to weigh in? TASK DETAIL https://phabricator.wikimedia.org/T222985 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: hoo, Smalyshev

[Wikidata-bugs] [Maniphest] [Closed] T226601: Wikidata JSON dump generation broken

2019-07-02 Thread ArielGlenn
ArielGlenn closed this task as "Resolved". ArielGlenn claimed this task. ArielGlenn added a comment. $ ls -lL /data/otherdumps/wikidata/20190701.json.gz -rw-r--r-- 1 dumpsgen dumpsgen 56026802223 Jul 2 21:55 /data/otherdumps/wikidata/20190701.json.gz Closing. TASK DETA

[Wikidata-bugs] [Maniphest] [Commented On] T174031: MCR: Include all slots in XML dumps

2019-07-02 Thread ArielGlenn
ArielGlenn added a comment. BTW I apologize for getting to this late, other testing and reviews got in the way. TASK DETAIL https://phabricator.wikimedia.org/T174031 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: daniel, ArielGlenn Cc

[Wikidata-bugs] [Maniphest] [Commented On] T226153: Remove BETA from Wikidata entities dump

2019-07-02 Thread ArielGlenn
ArielGlenn added a comment. My understanding was that we would rename them going forwards only. TASK DETAIL https://phabricator.wikimedia.org/T226153 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: nichtich, hoo, Hydriz, Maxlath

[Wikidata-bugs] [Maniphest] [Commented On] T174031: MCR: Include all slots in XML dumps

2019-07-02 Thread ArielGlenn
ArielGlenn added a comment. Just like the uid issue, there will be clients that rely on this behavior. Let's not break it unintentionally. TASK DETAIL https://phabricator.wikimedia.org/T174031 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences

[Wikidata-bugs] [Maniphest] [Commented On] T174031: MCR: Include all slots in XML dumps

2019-07-02 Thread ArielGlenn
ArielGlenn added a comment. Verified that with the patch we now see an empty comment tag where before there was none. Ran in deployment prep on snapshot01 as dumpsgen user: /usr/bin/php7.2 /srv/mediawiki/multiversion/MWScript.php dumpBackup.php --wiki=enwiki --full --stub --report=1

[Wikidata-bugs] [Maniphest] [Commented On] T226601: Wikidata JSON dump generation broken

2019-07-02 Thread ArielGlenn
ArielGlenn added a comment. If we get through this run with happy json files, I'll close the task. TASK DETAIL https://phabricator.wikimedia.org/T226601 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: hoo, TheDatum, Nico008

[Wikidata-bugs] [Maniphest] [Commented On] T174031: MCR: Include all slots in XML dumps

2019-07-02 Thread ArielGlenn
ArielGlenn added a comment. @daniel It looks to me (without testing it) that the changeset above, https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/464768/ will handle empty comment text differently than we currently do, writing out a comment tag, maybe with empty content. Current

[Wikidata-bugs] [Maniphest] [Commented On] T226153: Remove BETA from Wikidata entities dump

2019-07-01 Thread ArielGlenn
ArielGlenn added a comment. OK, let's go for July 15th then, again between runs. How does that sound? (But let's make sure that date is announced everywhere.) TASK DETAIL https://phabricator.wikimedia.org/T226153 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel

[Wikidata-bugs] [Maniphest] [Commented On] T226153: Remove BETA from Wikidata entities dump

2019-07-01 Thread ArielGlenn
ArielGlenn added a comment. I thought we agreed above on a July 29 deadline? TASK DETAIL https://phabricator.wikimedia.org/T226153 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: nichtich, hoo, Hydriz, Maxlath, Envlh

[Wikidata-bugs] [Maniphest] [Commented On] T226601: Wikidata JSON dump generation broken

2019-06-27 Thread ArielGlenn
ArielGlenn added a comment. I'll leave this open until next week's run completes properly. TASK DETAIL https://phabricator.wikimedia.org/T226601 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: TheDatum, Nico008, Melderick

[Wikidata-bugs] [Maniphest] [Commented On] T226601: Wikidata JSON dump generation broken

2019-06-26 Thread ArielGlenn
ArielGlenn added a comment. They typically pull once a day or more frequently. TASK DETAIL https://phabricator.wikimedia.org/T226601 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: TheDatum, Nico008, Melderick, Smalyshev

[Wikidata-bugs] [Maniphest] [Commented On] T226601: Wikidata JSON dump generation broken

2019-06-26 Thread ArielGlenn
ArielGlenn added a comment. I have moved the files wikidata-20190624-all.json.gz and wikidata-20190624-all.json.bz2 to filenames that end in .not. The 'latest' links for the json bz2 and gz files are now broken; this lets people know that the links are missing instead of beguiling them

[Wikidata-bugs] [Maniphest] [Commented On] T226601: Wikidata JSON dump generation broken

2019-06-26 Thread ArielGlenn
ArielGlenn added a comment. I'd prefer to remove and wait for the new run, but I'd like @Smalyshev 's opinion on whether the dumps are most likely fixed, or not, since he was the one who handled the broken deployment at the time. TASK DETAIL https://phabricator.wikimedia.org/T226601

[Wikidata-bugs] [Maniphest] [Commented On] T226601: Wikidata JSON dump generation broken

2019-06-26 Thread ArielGlenn
ArielGlenn added a comment. Whichever folks prefer, just let me know. TASK DETAIL https://phabricator.wikimedia.org/T226601 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Smalyshev, Lea_Lacroix_WMDE, ArielGlenn, Envlh

[Wikidata-bugs] [Maniphest] [Changed Subscribers] T226601: Wikidata JSON dump generation broken

2019-06-26 Thread ArielGlenn
ArielGlenn added a subscriber: Smalyshev. ArielGlenn added a comment. Adding @Smalyshev for comments. TASK DETAIL https://phabricator.wikimedia.org/T226601 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Smalyshev, Lea_Lacroix_WMDE
