[Wikidata-bugs] [Maniphest] T281267: various weekly and daily dumps run from systemd timers are broken

2023-06-21 Thread ArielGlenn
ArielGlenn added a comment. @fgiunchedi I notice that in some cases phab tasks are autocreated when systemd units fail. Is that true for systemd jobs on snapshot hosts? Could we get tagged on those (Dumps-Generation) or could we get emails from those (ops-dumps@wm.o)? TASK DETAIL https

[Wikidata-bugs] [Maniphest] T68108: [Epic] Store media information for files on Wikimedia Commons as structured data

2023-06-21 Thread ArielGlenn
ArielGlenn closed subtask T226093: Capacity planning for Commons Structured Data as Resolved. TASK DETAIL https://phabricator.wikimedia.org/T68108 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Mholloway, Ladsgroup, MarkTraceur, WMDE

[Wikidata-bugs] [Maniphest] T226093: Capacity planning for Commons Structured Data

2023-06-21 Thread ArielGlenn
ArielGlenn closed this task as "Resolved". ArielGlenn claimed this task. ArielGlenn added a comment. There's no point in having this open for a once a year check in, so I'll go ahead and close it. When capacity planning needs to be done for dbs in the regular course of things

[Wikidata-bugs] [Maniphest] T226093: Capacity planning for Commons Structured Data

2023-01-10 Thread ArielGlenn
ArielGlenn added a comment. In T226093#8512308 <https://phabricator.wikimedia.org/T226093#8512308>, @LSobanski wrote: > The task's original intent was to cover planning "over the next 3 years" starting in 2019. @ArielGlenn is the task still relevant, can it be closed

[Wikidata-bugs] [Maniphest] T138208: Connections to all db servers for wikidata as wikiadmin from snapshot, terbium

2022-04-11 Thread ArielGlenn
ArielGlenn added a comment. In T138208#7844298 <https://phabricator.wikimedia.org/T138208#7844298>, @Ladsgroup wrote: > It's a bit hard to measure but it's probably fixed. That would be wonderful if true. Let's leave this open for a while yet just in case... TASK DETAI

[Wikidata-bugs] [Maniphest] T300240: Missing Wikidata RDF (ttl and nt) dumps for 20220117

2022-03-03 Thread ArielGlenn
ArielGlenn added a comment. Hey, just a note that we saw another failure: Output of systemd timer for '/usr/local/bin/dumpwikibaserdf.sh -p wikidata -d truthy -f nt' SYSTEMDTIMER noreply@snapshot1008.eqiad.wmnet via wikimedia.org ERROR 2013 (HY000): Lost

[Wikidata-bugs] [Maniphest] T138208: Connections to all db servers for wikidata as wikiadmin from snapshot, terbium

2022-02-22 Thread ArielGlenn
ArielGlenn added a comment. I am aware of and following this discussion but right now, my responsiveness on this task will be slow, most of my time needs to go to getting my teammate who will be dumps co-maintainer up to speed. Please bear with us. TASK DETAIL https

[Wikidata-bugs] [Maniphest] T300240: Missing Wikidata RDF (ttl and nt) dumps for 20220117

2022-02-01 Thread ArielGlenn
ArielGlenn added a comment. Hm I wonder who we should add that would take on restarting these jobs if they deem it useful. Uh. Deferring for now since I have no bright ideas, and noting that here. Thanks again! TASK DETAIL https://phabricator.wikimedia.org/T300240 EMAIL PREFERENCES

[Wikidata-bugs] [Maniphest] T300240: Missing Wikidata RDF (ttl and nt) dumps for 20220117

2022-02-01 Thread ArielGlenn
ArielGlenn added a comment. Uh @dcausse Do you want to add someone to the ops-dumps alias so that you can be informed in these instances and perhaps schedule a restart of the job(s)? It would be easy enough. Sorry to ask after the task is closed! TASK DETAIL https

[Wikidata-bugs] [Maniphest] T300240: Missing Wikidata RDF (ttl and nt) dumps for 20220117

2022-02-01 Thread ArielGlenn
ArielGlenn added a comment. I saw an error from the cron job, it was sent to ops-dumps, which someone from WMDE should be on as well I think. The error looked to me like it had to do with a db server being depooled or otherwise unavailable: ERROR 2013 (HY000): Lost connection to MySQL

[Wikidata-bugs] [Maniphest] T138208: Connections to all db servers for wikidata as wikiadmin from snapshot, terbium

2022-01-24 Thread ArielGlenn
ArielGlenn added a comment. Thanks. I was pretty careful with my testing for the last fix, making sure that in production the patch redirected to a vslow/dump server. But I may have overlooked something. :-( TASK DETAIL https://phabricator.wikimedia.org/T138208 EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] T138208: Connections to all db servers for wikidata as wikiadmin from snapshot, terbium

2022-01-24 Thread ArielGlenn
ArielGlenn added a comment. I hate to ask but can we capture any queries? TASK DETAIL https://phabricator.wikimedia.org/T138208 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: LSobanski, Ladsgroup, Marostegui, Addshore

[Wikidata-bugs] [Maniphest] T238972: switch xml/sql (and adds-changes) dumps to use 0.11 schema with content from multiple slots

2022-01-24 Thread ArielGlenn
Restricted Application added a project: wdwb-tech. TASK DETAIL https://phabricator.wikimedia.org/T238972 WORKBOARD https://phabricator.wikimedia.org/project/board/1519/ EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Christian75

[Wikidata-bugs] [Maniphest] T138208: Connections to all db servers for wikidata as wikiadmin from snapshot, terbium

2022-01-24 Thread ArielGlenn
ArielGlenn added a comment. The above patch was deployed with the train everywhere, so the specific set of queries should no longer be directed to non-vslow/dump db servers. If that's the case, we are now back to the harder issue of what to do when a db server is depooled, and I think

[Wikidata-bugs] [Maniphest] T297470: torrent file for Wikidata dumps

2022-01-17 Thread ArielGlenn
ArielGlenn closed this task as "Declined". ArielGlenn added a comment. I'm going to go ahead and close this as declined. Feel free to re-open if things change in the future. TASK DETAIL https://phabricator.wikimedia.org/T297470 EMAIL PREFERENCES https://phabricator.wikimedia.or

[Wikidata-bugs] [Maniphest] T138208: Connections to all db servers for wikidata as wikiadmin from snapshot, terbium

2022-01-17 Thread ArielGlenn
ArielGlenn added a comment. The patch at https://gerrit.wikimedia.org/r/c/mediawiki/core/+/747455/ is tested and ready to go, and in line with the way existing dumps scripts work. So I'd like to go ahead with it. TASK DETAIL https://phabricator.wikimedia.org/T138208 EMAIL PREFERENCES

[Wikidata-bugs] [Maniphest] T138208: Connections to all db servers for wikidata as wikiadmin from snapshot, terbium

2022-01-10 Thread ArielGlenn
ArielGlenn added a comment. There is a complicated set of python scripts that coordinate the dump jobs for each wiki during the two monthly runs. https://wikitech.wikimedia.org/wiki/Dumps/Current_Architecture gives an overview. https://www.mediawiki.org/wiki/SQL/XML_Dumps

[Wikidata-bugs] [Maniphest] T138208: Connections to all db servers for wikidata as wikiadmin from snapshot, terbium

2022-01-10 Thread ArielGlenn
ArielGlenn added a comment. In T138208#7611718 <https://phabricator.wikimedia.org/T138208#7611718>, @Ladsgroup wrote: > In T138208#7611712 <https://phabricator.wikimedia.org/T138208#7611712>, @ArielGlenn wrote: > >> Not yet; I need to talk with someone mo

[Wikidata-bugs] [Maniphest] T138208: Connections to all db servers for wikidata as wikiadmin from snapshot, terbium

2022-01-10 Thread ArielGlenn
ArielGlenn added a comment. In T138208#7611708 <https://phabricator.wikimedia.org/T138208#7611708>, @Marostegui wrote: > In T138208#7571559 <https://phabricator.wikimedia.org/T138208#7571559>, @gerritbot wrote: > >> Change 747455 had a related patch set

[Wikidata-bugs] [Maniphest] T222349: Do not rate limit dumps from internal network

2021-12-16 Thread ArielGlenn
ArielGlenn added a comment. Note that the checksum files for those dumps are available for download as well, since they are provided along with the main dump output files to all mirrors. Someone from WMCS will probably need to look at this (again) if the discussion is being re-opened

[Wikidata-bugs] [Maniphest] T138208: Connections to all db servers for wikidata as wikiadmin from snapshot, terbium

2021-12-14 Thread ArielGlenn
ArielGlenn added a comment. Thanks for this thought, Daniel. I think it's better if I can pass the dbgroupdefault parameter to the maintenance script itself, instead of hacking something into getBlob(). But I do need to check if that's going to work ok. The longer term fix you mentioned

[Wikidata-bugs] [Maniphest] T138208: Connections to all db servers for wikidata as wikiadmin from snapshot, terbium

2021-12-14 Thread ArielGlenn
ArielGlenn added a comment. As I feared, fetchText.php calls MediaWikiServices::getInstance()->getBlobStore()->getBlob() which gets a db replica connection on its own, with no opportunity for us to ask that it be in the vslow/dump group. We might be able to use the -dbgroupdefaul

[Wikidata-bugs] [Maniphest] T138208: Connections to all db servers for wikidata as wikiadmin from snapshot, terbium

2021-12-13 Thread ArielGlenn
ArielGlenn added a comment. The above is happening from pages-meta-history dumps, and I will look into it later today. The snapshot1008 (wikidata entity) dumps will be harder. TASK DETAIL https://phabricator.wikimedia.org/T138208 EMAIL PREFERENCES https://phabricator.wikimedia.org

[Wikidata-bugs] [Maniphest] T138208: Connections to all db servers for wikidata as wikiadmin from snapshot, terbium

2021-12-13 Thread ArielGlenn
ArielGlenn added a comment. The reason only those two snapshot hosts are involved is undoubtedly because dumps on the others have finished for this run. TASK DETAIL https://phabricator.wikimedia.org/T138208 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel

[Wikidata-bugs] [Maniphest] T297470: torrent file for Wikidata dumps

2021-12-11 Thread ArielGlenn
ArielGlenn added a comment. We don't provide torrent files from here because this is something that can be done by members of the community. I would get in touch with one of the people maintaining any of the torrents listed here: https://meta.wikimedia.org/wiki/Data_dump_torrents and see

[Wikidata-bugs] [Maniphest] T222985: Provide wikidata JSON dumps compressed with zstd

2021-06-20 Thread ArielGlenn
ArielGlenn added a comment. In T222985#7164049 <https://phabricator.wikimedia.org/T222985#7164049>, @Mitar wrote: > Are you saying that existing wikidata json dumps can be decompressed in parallel if using lbzip2, but not pbzip2? lbzip2 is format-compatible with bzip2 and

[Wikidata-bugs] [Maniphest] T222985: Provide wikidata JSON dumps compressed with zstd

2021-06-20 Thread ArielGlenn
ArielGlenn added a comment. lbzip2 decompresses in parallel as well. We use that for compression of the SQL/XML dumps. TASK DETAIL https://phabricator.wikimedia.org/T222985 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Mitar

[Wikidata-bugs] [Maniphest] T281267: various weekly and daily dumps run from systemd timers are broken

2021-05-05 Thread ArielGlenn
ArielGlenn added a comment. What are the next steps on this? Should I be tweaking a manifest someplace? TASK DETAIL https://phabricator.wikimedia.org/T281267 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: jbond, ArielGlenn Cc: Addshore

[Wikidata-bugs] [Maniphest] T209390: Output some meta data about the wikidata JSON dump

2021-04-28 Thread ArielGlenn
ArielGlenn added a subscriber: hoo. ArielGlenn added a comment. I am proactively adding @hoo as he can provide some insight and perhaps tag others as well. TASK DETAIL https://phabricator.wikimedia.org/T209390 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel

[Wikidata-bugs] [Maniphest] T279518: Enable automatic JSON dump validation for Wikidata

2021-04-07 Thread ArielGlenn
ArielGlenn added a comment. In T279518#6981710 <https://phabricator.wikimedia.org/T279518#6981710>, @hoo wrote: >> Icinga sends alerts, and those would come to me I guess, which is probably not the best outcome :-) > > We could use the `wikid

[Wikidata-bugs] [Maniphest] T279518: Enable automatic JSON dump validation for Wikidata

2021-04-07 Thread ArielGlenn
ArielGlenn added a comment. Icinga sends alerts, and those would come to me I guess, which is probably not the best outcome :-) I believe that we use MAILTO for everything in the dumpsgen crontab, but the question is whether there's a nice alias to send emails to, or whether we want

[Wikidata-bugs] [Maniphest] T279518: Enable automatic JSON dump validation for Wikidata

2021-04-07 Thread ArielGlenn
ArielGlenn added a project: Dumps-Generation. Restricted Application added a project: wdwb-tech. TASK DETAIL https://phabricator.wikimedia.org/T279518 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Lydia_Pintscher, ArielGlenn

[Wikidata-bugs] [Maniphest] T277300: Lexeme JSON dumps contain invalid JSON

2021-03-23 Thread ArielGlenn
ArielGlenn added a comment. This is now deployed and will be in effect for next week's lexeme run. TASK DETAIL https://phabricator.wikimedia.org/T277300 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: ArielGlenn, hoo

[Wikidata-bugs] [Maniphest] T278031: Wikibase canonical JSON format is missing "modified" in Wikidata JSON dumps

2021-03-21 Thread ArielGlenn
ArielGlenn added a project: Dumps-Generation. Restricted Application added a project: wdwb-tech. TASK DETAIL https://phabricator.wikimedia.org/T278031 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Mitar, Aklapper, Invadibot

[Wikidata-bugs] [Maniphest] T276643: Wikidata JSON dump (bz2) no longer imports due to bad JSON format

2021-03-16 Thread ArielGlenn
ArielGlenn closed this task as "Resolved". ArielGlenn added a comment. Since @hoo validated the dump from the past week, verifying that the current dump generation process is fixed, we can now close this task. Thanks everyone! TASK DETAIL https://phabricator.wikimedia.org/T276

[Wikidata-bugs] [Maniphest] T276643: Wikidata JSON dump (bz2) no longer imports due to bad JSON format

2021-03-07 Thread ArielGlenn
ArielGlenn added a comment. I'll leave this open until the run is complete and folks have had time to try to use them, so probably through the coming weekend. TASK DETAIL https://phabricator.wikimedia.org/T276643 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel

[Wikidata-bugs] [Maniphest] T276643: Wikidata JSON dump (bz2) no longer imports due to bad JSON format

2021-03-07 Thread ArielGlenn
ArielGlenn added a comment. In T276643#6890308 <https://phabricator.wikimedia.org/T276643#6890308>, @Ash20001 wrote: > Will this patch be included in the next dump or can be put back in the last two dumps (regenerate dump) This should be in time for the dump that will be

[Wikidata-bugs] [Maniphest] T264883: Prepare deployment of JSON dumps for Lexeme

2021-02-18 Thread ArielGlenn
ArielGlenn added a comment. These look fine to me from today, and I've done all the buster-side testing so that's ok too. Closing this! Ah, do we want to announce it anywhere though? Maybe I won't close it pending that answer. Places it could be announced: xmldatadumps-l, wikitech-l

[Wikidata-bugs] [Maniphest] T264883: Prepare deployment of JSON dumps for Lexeme

2021-02-11 Thread ArielGlenn
ArielGlenn added a comment. I am doing some prep work before I try to test this on buster. Getting close! TASK DETAIL https://phabricator.wikimedia.org/T264883 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: hoo, ArielGlenn Cc: noarave

[Wikidata-bugs] [Maniphest] T138208: Connections to all db servers for wikidata as wikiadmin from snapshot, terbium

2021-02-09 Thread ArielGlenn
ArielGlenn added a comment. mysql.php, used for wikidata entity dumps, does not apparently correctly handle the --group flag. it's unclear to me what it does do, I need to check into this sometime later. The queries run by it are extremely short so the impact is minimal, but it still needs

[Wikidata-bugs] [Maniphest] T138208: Connections to all db servers for wikidata as wikiadmin from snapshot, terbium

2021-02-08 Thread ArielGlenn
ArielGlenn added a comment. In T138208#6811418 <https://phabricator.wikimedia.org/T138208#6811418>, @Addshore wrote: > In T138208#6809784 <https://phabricator.wikimedia.org/T138208#6809784>, @ArielGlenn wrote: > >> This is because the maintenance scripts tha

[Wikidata-bugs] [Maniphest] T147169: Make sure Wikibase dump maintenance scripts solely use the "dump" db group

2021-02-08 Thread ArielGlenn
ArielGlenn added a comment. These are for the weekly wikidata "entity dumps", and so separate from the main xml/sql dumps implicated in the other task. TASK DETAIL https://phabricator.wikimedia.org/T147169 EMAIL PREFERENCES https://phabricator.wikimedia.org/sett

[Wikidata-bugs] [Maniphest] T138208: Connections to all db servers for wikidata as wikiadmin from snapshot, terbium

2021-02-08 Thread ArielGlenn
ArielGlenn added a comment. This is because the maintenance scripts that do "small" page ranges take several hours to complete. I will keep this in mind for when we can go to multiple bz2 streams in the page content history dumps; I'll be able to dump much smaller ranges then
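
A minimal sketch of the "multiple bz2 streams" idea mentioned above, assuming it means that each small page range is compressed on its own and the resulting streams are concatenated into one file. The sample chunks and the helper name `compress_ranges` are hypothetical; this is not the dumps code.

```python
import bz2


def compress_ranges(chunks):
    """Compress each chunk as its own bz2 stream and concatenate the streams."""
    return b"".join(bz2.compress(chunk) for chunk in chunks)


# Hypothetical stand-ins for the output of separate small page-range jobs.
chunks = [b"<page>1</page>\n", b"<page>2</page>\n", b"<page>3</page>\n"]
multistream = compress_ranges(chunks)

# A standard bz2 reader treats the concatenated streams as one logical file.
restored = bz2.decompress(multistream)
```

Because each stream is independent, the ranges could in principle be produced by separate jobs and joined afterwards without recompressing anything.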

[Wikidata-bugs] [Maniphest] T264883: Prepare deployment of JSON dumps for Lexeme

2021-02-01 Thread ArielGlenn
ArielGlenn added a comment. All set. We should check on these again in the middle of next week, as the run starts on Monday at ridiculous-o-clock when we are all sleeping. TASK DETAIL https://phabricator.wikimedia.org/T264883 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings

[Wikidata-bugs] [Maniphest] T264883: Prepare deployment of JSON dumps for Lexeme

2021-01-29 Thread ArielGlenn
ArielGlenn added a comment. In T264883#6786811 <https://phabricator.wikimedia.org/T264883#6786811>, @Lucas_Werkmeister_WMDE wrote: > Are you sure they ran? That directory only contains RDF dumps as far as I can tell (Turtle and NTriples), we’ve been generating those fo

[Wikidata-bugs] [Maniphest] T264883: Prepare deployment of JSON dumps for Lexeme

2021-01-29 Thread ArielGlenn
ArielGlenn added a comment. These ran and are available at https://dumps.wikimedia.org/other/wikibase/wikidatawiki/20210122/ How do they look? TASK DETAIL https://phabricator.wikimedia.org/T264883 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences

[Wikidata-bugs] [Maniphest] T221504: investigate why content history dump of certain wikidata page ranges is so slow

2020-12-15 Thread ArielGlenn
ArielGlenn added a comment. Following up on this, has there been any more discussion about making the JSON a little less wordy/disk-filly? I don't see any other path forward on this in the short to medium term. TASK DETAIL https://phabricator.wikimedia.org/T221504 EMAIL PREFERENCES

[Wikidata-bugs] [Maniphest] T246415: Investigate a different db load groups for wikidata / wikibase

2020-11-04 Thread ArielGlenn
ArielGlenn added a project: User-ArielGlenn. TASK DETAIL https://phabricator.wikimedia.org/T246415 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: Michael, ArielGlenn Cc: ArielGlenn, Michael, Marostegui, Ladsgroup, WMDE-leszek, Aklapper, Addshore

[Wikidata-bugs] [Maniphest] T264298: wb_terms is getting removed

2020-10-30 Thread ArielGlenn
ArielGlenn added a comment. All of those tables are there: see https://gerrit.wikimedia.org/r/c/operations/puppet/+/527505 and current https://github.com/wikimedia/puppet/blob/production/modules/snapshot/files/dumps/table_jobs.yaml#L142 Is there anything else needed

[Wikidata-bugs] [Maniphest] T264298: wb_terms is getting removed

2020-10-30 Thread ArielGlenn
ArielGlenn added a comment. In T264298#6511634 <https://phabricator.wikimedia.org/T264298#6511634>, @Lucas_Werkmeister_WMDE wrote: > We also realized that the `tablejobs.yaml` file didn’t mention the new tables (the replacement for `wb_terms`: `wbt_{item,property}_terms`, `

[Wikidata-bugs] [Maniphest] T264850: Categorylinks dump might have some problem with the encoding

2020-10-11 Thread ArielGlenn
ArielGlenn removed projects: Wikidata, Wikidata-Query-Service, Analytics. TASK DETAIL https://phabricator.wikimedia.org/T264850 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: JAllemandou, ArielGlenn Cc: Lucas_Werkmeister_WMDE, ArielGlenn, Milimetric

[Wikidata-bugs] [Maniphest] T264850: Categorylinks dump might have some problem with the encoding

2020-10-08 Thread ArielGlenn
ArielGlenn added a comment. In T264850#6531377 <https://phabricator.wikimedia.org/T264850#6531377>, @Milimetric wrote: > @ArielGlenn is this something you'd know about or know who to point me to? I think the wdqs folks are going to be your best bet, I've added the projec

[Wikidata-bugs] [Maniphest] T264850: Categorylinks dump might have some problem with the encoding

2020-10-08 Thread ArielGlenn
ArielGlenn added a comment. echo -n ânești | od -t x1 000 c3 a2 6e 65 c8 99 74 69 You appear to be seeing a string representation of the non-ascii characters as hex bytes, i.e. xc3 xa2 ne xc8 x99 ti. What command are you using to display the text in the file, and on what
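
As an illustration of the point in the comment above, here is a small sketch that reproduces the same bytes: the string is valid UTF-8, and only its byte-level representation shows the hex escapes. The variable names are illustrative only.

```python
# "ânești" written with escapes so the example does not depend on file encoding.
text = "\u00e2ne\u0219ti"

raw = text.encode("utf-8")

# Same view as `od -t x1`: one hex pair per byte.
hex_bytes = " ".join(f"{b:02x}" for b in raw)

# A tool that prints the raw bytes instead of decoding them shows escapes.
escaped_view = repr(raw)

# Decoding as UTF-8 gives back the original characters.
round_trip = raw.decode("utf-8")
```

If a viewer shows something like `escaped_view` rather than `round_trip`, the data itself is probably fine and the display step is not decoding UTF-8.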

[Wikidata-bugs] [Maniphest] T264850: Categorylinks dump might have some problem with the encoding

2020-10-08 Thread ArielGlenn
ArielGlenn added projects: Wikidata-Query-Service, Dumps-Generation. Restricted Application added a project: Wikidata. TASK DETAIL https://phabricator.wikimedia.org/T264850 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: JAllemandou, ArielGlenn Cc

[Wikidata-bugs] [Maniphest] T264164: Cleanup broken dumps in /wikidatawiki/entities/20200921/

2020-10-02 Thread ArielGlenn
ArielGlenn added a comment. They are indeed gone from dumpsdata1002; we keep fewer back issues there, since we're not serving them anywhere but only rsyncing them off. We keep the last 3 wikibase dumps, see https://github.com/wikimedia/puppet/blob/production/modules/dumps/manifests/web
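
A rough sketch of a "keep the last 3" retention rule like the one described above. The real cleanup lives in the puppet repository linked in the comment; the function name `dirs_to_remove` and the directory dates below are hypothetical.

```python
def dirs_to_remove(dated_dirs, keep=3):
    """Return the dated directory names that fall outside the newest `keep`."""
    # YYYYMMDD names sort chronologically when compared as strings.
    ordered = sorted(dated_dirs)
    return ordered[:-keep] if keep > 0 else ordered


# Hypothetical weekly run directories, deliberately out of order.
existing = ["20200907", "20200914", "20200921", "20200928", "20200831"]
stale = dirs_to_remove(existing)
```

With fewer than `keep` directories present, nothing is selected for removal.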

[Wikidata-bugs] [Maniphest] T264298: wb_terms is getting removed

2020-10-02 Thread ArielGlenn
ArielGlenn added a comment. No impact. Only tables actually in the database are dumped, a check of each table in the list is done beforehand. The code can be cleaned up anyways just to be nice though. TASK DETAIL https://phabricator.wikimedia.org/T264298 EMAIL PREFERENCES https
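
The check described above can be sketched as follows. This is an assumption-laden illustration, not the dumps code: it uses an in-memory SQLite database as a stand-in for the production database, and `tables_to_dump` is a hypothetical helper.

```python
import sqlite3


def tables_to_dump(conn, configured):
    """Keep only the configured tables that actually exist in the database."""
    rows = conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
    present = {name for (name,) in rows}
    return [table for table in configured if table in present]


# Stand-in database: the replacement tables exist, wb_terms has been removed.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE wbt_item_terms (id INTEGER)")
conn.execute("CREATE TABLE wbt_property_terms (id INTEGER)")

configured = ["wb_terms", "wbt_item_terms", "wbt_property_terms"]
selected = tables_to_dump(conn, configured)
```

A stale entry such as `wb_terms` is skipped rather than causing a failure, which matches the "no impact" outcome described in the comment.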

[Wikidata-bugs] [Maniphest] T264164: Cleanup broken dumps in /wikidatawiki/entities/20200921/

2020-10-01 Thread ArielGlenn
ArielGlenn added subscribers: Gehel, ArielGlenn. ArielGlenn added a comment. @Gehel was just asking about these yesterday and whether he should clean them up. The procedure is: delete first from the appropriate dumpsdata host (dumpsdata1002) where they are first written. Then delete them

[Wikidata-bugs] [Maniphest] T220883: Wikidata JSON dumps should include Lexemes

2020-09-30 Thread ArielGlenn
ArielGlenn added a comment. I renew my question above in T220883#5185999 <https://phabricator.wikimedia.org/T220883#5185999> and if someone can answer this, I can work with them to make these go live. TASK DETAIL https://phabricator.wikimedia.org/T220883 EMAIL PREFERENCES

[Wikidata-bugs] [Maniphest] T260232: BatchRowIterator slow query on commonswiki

2020-09-22 Thread ArielGlenn
ArielGlenn closed this task as "Resolved". ArielGlenn claimed this task. ArielGlenn added a comment. Re-enabled, checked daily runs, they look good, so I'm resolving this. Thanks, everybody! TASK DETAIL https://phabricator.wikimedia.org/T260232 EMAIL PREFERENC

[Wikidata-bugs] [Maniphest] T226093: Capacity planning for Commons Structured Data

2020-09-16 Thread ArielGlenn
ArielGlenn added a comment. Updated (ouch!) F32352585: commons_slots.png <https://phabricator.wikimedia.org/F32352585> TASK DETAIL https://phabricator.wikimedia.org/T226093 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Lad

[Wikidata-bugs] [Maniphest] T260232: BatchRowIterator slow query on commonswiki

2020-09-13 Thread ArielGlenn
ArielGlenn added a comment. In T260232#6448382 <https://phabricator.wikimedia.org/T260232#6448382>, @gerritbot wrote: > Change 625642 **merged** by jenkins-bot: > [mediawiki/core@master] don't pass null page id to page related queries for category change rdf dumps

[Wikidata-bugs] [Maniphest] T260232: BatchRowIterator slow query on commonswiki

2020-09-07 Thread ArielGlenn
ArielGlenn added a comment. In T260232#6390706 <https://phabricator.wikimedia.org/T260232#6390706>, @gerritbot wrote: > Change 620775 had a related patch set uploaded (by ArielGlenn; owner: ArielGlenn): > [mediawiki/core@master] don't include null page ids in query list f

[Wikidata-bugs] [Maniphest] T262187: Wikidata entity dumps didn't start this week

2020-09-07 Thread ArielGlenn
ArielGlenn created this task. ArielGlenn added projects: Wikidata, Dumps-Generation. TASK DESCRIPTION This change: P12492 <https://phabricator.wikimedia.org/P12492> left the dump db group empty, and so any attempts to run wikidata entity dumps failed. The host was added back in

[Wikidata-bugs] [Maniphest] T261204: Wikidata lexeme ttl dumps should be in a "predictable" folder

2020-09-01 Thread ArielGlenn
ArielGlenn added a comment. I think we can just move this through and keep our eyes on it. TASK DETAIL https://phabricator.wikimedia.org/T261204 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: ArielGlenn, dcausse, Alter-paule

[Wikidata-bugs] [Maniphest] T260232: BatchRowIterator slow query on commonswiki

2020-08-17 Thread ArielGlenn
ArielGlenn added a comment. I took the brute force approach of writing all queries to a log file by adding the appropriate fopen/fputs/fclose in Database::select (live on snapshot1010, testbed host). I then ran: dumpsgen@snapshot1010:/srv/mediawiki$ /usr/bin/php7.2 /srv/mediawiki

[Wikidata-bugs] [Maniphest] T260232: BatchRowIterator slow query on commonswiki

2020-08-14 Thread ArielGlenn
ArielGlenn added a comment. Just for completeness, on db2073 I also ran the original query with the crap entry, the show explain showed use of a filesort as above, and the execution time was... well it's still going, 330 seconds in. I killed it. TASK DETAIL https

[Wikidata-bugs] [Maniphest] T260232: BatchRowIterator slow query on commonswiki

2020-08-14 Thread ArielGlenn
ArielGlenn added a comment. I saw multiple queries with this string in them while camping on the production vslow and looking at the processlist. I don't know how many of the queries have this issue. TASK DETAIL https://phabricator.wikimedia.org/T260232 EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] T260232: BatchRowIterator slow query on commonswiki

2020-08-14 Thread ArielGlenn
ArielGlenn added a comment. When I ran the above query on db2073 (codfw dumps and vslow host) without the crap ' ' field in there, it returned in 0.00 seconds. Maybe the bad entries are a new development? TASK DETAIL https://phabricator.wikimedia.org/T260232 EMAIL PREFERENCES https

[Wikidata-bugs] [Maniphest] T260232: BatchRowIterator slow query on commonswiki

2020-08-14 Thread ArielGlenn
ArielGlenn added a comment. SELECT /* BatchRowIterator::next */ cl_from,cl_to FROM `categorylinks` WHERE cl_type = 'subcat' AND cl_from

[Wikidata-bugs] [Maniphest] T260232: BatchRowIterator slow query on commonswiki

2020-08-14 Thread ArielGlenn
ArielGlenn added a comment. Daily rdf dumps are probably broken until this is resolved, just a fyi for folks importing these for search purposes. TASK DETAIL https://phabricator.wikimedia.org/T260232 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences

[Wikidata-bugs] [Maniphest] T257876: redirected Q & deleted P Not Consistant in the json dump and web front end

2020-07-15 Thread ArielGlenn
ArielGlenn added a project: Wikidata. TASK DETAIL https://phabricator.wikimedia.org/T257876 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Alicezou26, jannee_e, Akuckartz, darthmon_wmde, Nandana, Jony, Lahi, Gq86, NoohNaeem

[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons

2020-07-09 Thread ArielGlenn
ArielGlenn added a comment. Links latest-full.ttl.bz2 -> 20200116/commons-20200116-full.ttl.bz2 and latest-full.ttl.gz -> 20200116/commons-20200116-full.ttl.gz have been cleaned up. Thanks for the suggestion! TASK DETAIL https://phabricator.wikimedia.org/T221917 EMAIL PREFERENCES

[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons

2020-07-09 Thread ArielGlenn
ArielGlenn added a comment. It's linked off the 'other datasets' page near the top. But here's the direct link: https://dumps.wikimedia.org/other/wikibase/commonswiki/ TASK DETAIL https://phabricator.wikimedia.org/T221917 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings

[Wikidata-bugs] [Maniphest] [Commented On] T226093: Capacity planning for Commons Structured Data

2020-07-07 Thread ArielGlenn
ArielGlenn added a comment. Updated. F31919691: commons_slots_new.png <https://phabricator.wikimedia.org/F31919691> TASK DETAIL https://phabricator.wikimedia.org/T226093 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Lad

[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons

2020-05-27 Thread ArielGlenn
ArielGlenn added a comment. @dcausse what's your time frame? TASK DETAIL https://phabricator.wikimedia.org/T221917 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: nettrom_WMF, Mahir256, dcausse, EBernhardson, Cparle, Abit, Gehel

[Wikidata-bugs] [Maniphest] [Commented On] T238199: SpecialFewestRevisions::reallyDoQuery takes more than 9h to run

2020-05-27 Thread ArielGlenn
ArielGlenn added a comment. Unless folks want to keep it open to work on speeding it up in the future? TASK DETAIL https://phabricator.wikimedia.org/T238199 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: SilentSpike, WMDE-leszek

[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons

2020-05-16 Thread ArielGlenn
ArielGlenn added a comment. I see that we're no longer blocked. Does this mean that we're good to go for weekly runs? TASK DETAIL https://phabricator.wikimedia.org/T221917 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc

[Wikidata-bugs] [Maniphest] [Commented On] T238199: SpecialFewestRevisions::reallyDoQuery takes more than 9h to run

2020-05-13 Thread ArielGlenn
ArielGlenn added a comment. In T238199#6135018 <https://phabricator.wikimedia.org/T238199#6135018>, @Ladsgroup wrote: > ... > Anyway, Lydia said it's fine to do it tomorrow when it gets announced by our communication manager. Does that work for you? Any

[Wikidata-bugs] [Maniphest] [Commented On] T238199: SpecialFewestRevisions::reallyDoQuery takes more than 9h to run

2020-05-13 Thread ArielGlenn
ArielGlenn added a comment. Can we do this temporarily while the query is being fixed up? It looks like it had to be killed in Nov, Feb, Apr, May, so I'd rather temp disable than require folks to shoot it (and anything else hung as a side effect). TASK DETAIL https

[Wikidata-bugs] [Maniphest] [Commented On] T238199: SpecialFewestRevisions::reallyDoQuery takes more than 9h to run

2020-05-13 Thread ArielGlenn
ArielGlenn added a comment. Can we just skip the updateSpecialPages.php wikidatawiki --override --only=Fewestrevisions script altogether, instead of shooting it every month? TASK DETAIL https://phabricator.wikimedia.org/T238199 EMAIL PREFERENCES https://phabricator.wikimedia.org

[Wikidata-bugs] [Maniphest] [Commented On] T252632: Restart wikidata entity dumps

2020-05-13 Thread ArielGlenn
ArielGlenn added a comment. As I understand it the long running query comes from a monthly cron job. TASK DETAIL https://phabricator.wikimedia.org/T252632 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: hoo, ArielGlenn, jannee_e

[Wikidata-bugs] [Maniphest] [Created] T252632: Restart wikidata entity dumps

2020-05-13 Thread ArielGlenn
ArielGlenn created this task. ArielGlenn added projects: Dumps-Generation, Wikidata. TASK DESCRIPTION The weekly run was shot this morning when vslow db connections stalled due to an unrelated long-running query, see T238199 <https://phabricator.wikimedia.org/T238199> It can be res

[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons

2020-04-21 Thread ArielGlenn
ArielGlenn added a comment. Hi, just checking in: any progress on investigating the 'extra' dumps content? TASK DETAIL https://phabricator.wikimedia.org/T221917 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: nettrom_WMF, Mahir256

[Wikidata-bugs] [Maniphest] [Updated] T248857: Wikdata entities dump not generated

2020-03-30 Thread ArielGlenn
ArielGlenn added subscribers: hoo, ArielGlenn. ArielGlenn added a comment. See T248612 <https://phabricator.wikimedia.org/T248612> for that, I believe @hoo is planning to deploy and restart the week's run today. TASK DETAIL https://phabricator.wikimedia.org/T248857 EMAIL PREFE

[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons

2020-02-12 Thread ArielGlenn
ArielGlenn added a comment. @Cparle, No blocks on your side, the ball is now in @dcausse 's court. :-) TASK DETAIL https://phabricator.wikimedia.org/T221917 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: nettrom_WMF, Mahir256

[Wikidata-bugs] [Maniphest] [Commented On] T238972: switch xml/sql (and adds-changes) dumps to use 0.11 schema with content from multiple slots

2020-02-11 Thread ArielGlenn
ArielGlenn added a comment. This is pending https://gerrit.wikimedia.org/r/#/c/mediawiki/core/+/556346/ and related patches, so we're looking at March 1 if all goes well. TASK DETAIL https://phabricator.wikimedia.org/T238972 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings

[Wikidata-bugs] [Maniphest] [Commented On] T243701: Wikidata maxlag repeatedly over 5s since Jan20, 2020 (primarily caused by the query service)

2020-02-06 Thread ArielGlenn
ArielGlenn added a comment. In T243701#5855352 <https://phabricator.wikimedia.org/T243701#5855352>, @Lea_Lacroix_WMDE wrote: > Over the past weeks, we noticed a huge increase of content in Wikidata. Maybe that's something worth looking at? Wikidata content is growing

[Wikidata-bugs] [Maniphest] [Updated] T221917: Create RDF dump of structured data on Commons

2020-01-22 Thread ArielGlenn
ArielGlenn added a comment. Some unexpected (?) triples popping up that @dcausse is looking into, so the dumps will not be turned on in cron until we have the thumbs up on that. See T243292 <https://phabricator.wikimedia.org/T243292> If it turns out the data is all ok, we ca

[Wikidata-bugs] [Maniphest] [Updated] T243292: Fix the munger to support commons RDF dump

2020-01-22 Thread ArielGlenn
ArielGlenn added a parent task: T221917: Create RDF dump of structured data on Commons. TASK DETAIL https://phabricator.wikimedia.org/T243292 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: ArielGlenn, Aklapper, dcausse, darthmon_wmde

[Wikidata-bugs] [Maniphest] [Updated] T221917: Create RDF dump of structured data on Commons

2020-01-22 Thread ArielGlenn
ArielGlenn added a subtask: T243292: Fix the munger to support commons RDF dump. TASK DETAIL https://phabricator.wikimedia.org/T221917 EMAIL PREFERENCES https://phabricator.wikimedia.org/settings/panel/emailpreferences/ To: ArielGlenn Cc: Mahir256, dcausse, EBernhardson, Cparle, Abit, Gehel

[Wikidata-bugs] [Maniphest] [Changed Subscribers] T221917: Create RDF dump of structured data on Commons

2020-01-16 Thread ArielGlenn
ArielGlenn added a subscriber: dcausse. ArielGlenn added a comment. @dcausse is going to check over the ttl dump and let me know if it looks ok; if so then I'll flip the switch for generation weekly and make sure there's cleanup too. TASK DETAIL https://phabricator.wikimedia.org/T221917

[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons

2020-01-16 Thread ArielGlenn
ArielGlenn added a comment. In https://dumps.wikimedia.org/other/wikibase/commonswiki/ there are two ttl files, gz and bz2 compressed. Please have a look! The bash script producing them complained that /usr/local/bin/dumpwikibaserdf.sh: line 224: setDcatConfig: command not found

[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons

2020-01-16 Thread ArielGlenn
ArielGlenn added a comment. I found a ticket that mentions use of ttl files so I'll run /usr/local/bin/dumpwikibaserdf.sh commons full ttl and keep an eye on it. Running on snapshot1008 in a screen session. Here we go! TASK DETAIL https://phabricator.wikimedia.org/T221917

[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons

2020-01-13 Thread ArielGlenn
ArielGlenn added a comment. I plan to try running /usr/local/bin/dumpwikibaserdf.sh commons full nt on Thursday morning and see how long it takes with the 8 shards that are currently configured. @Abit is the nt format the one needed for WDQS testing? TASK DETAIL https

[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons

2020-01-13 Thread ArielGlenn
ArielGlenn added a comment. Ran php /srv/mediawiki/multiversion/MWScript.php extensions/Wikibase/repo/maintenance/dumpRdf.php --wiki commonswiki --batch-size 500 --format nt --flavor full-dump --entity-type mediainfo --no-cache --dbgroupdefault dump --ignore-missing --first-page-id

[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons

2020-01-13 Thread ArielGlenn
ArielGlenn added a comment. Ran php /srv/mediawiki/multiversion/MWScript.php extensions/Wikibase/repo/maintenance/dumpRdf.php --wiki commonswiki --batch-size 1000 --format nt --flavor full-dump --entity-type mediainfo --no-cache --dbgroupdefault dump --ignore-missing --first-page-id

[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons

2020-01-13 Thread ArielGlenn
ArielGlenn added a comment. Note to self that a run of php /srv/mediawiki/multiversion/MWScript.php extensions/Wikibase/repo/maintenance/dumpRdf.php --wiki commonswiki --batch-size 250 --format nt --flavor full-dump --entity-type mediainfo --no-cache --dbgroupdefault dump --ignore

[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons

2020-01-13 Thread ArielGlenn
ArielGlenn added a comment. This morning the job was terminated by the oom killer: [4288057.417443] Out of memory: Kill process 117265 (php) score 868 or sacrifice child [4288057.425084] Killed process 117265 (php) total-vm:58241128kB, anon-rss:56901636kB, file-rss:0kB, shmem-rss

[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons

2020-01-10 Thread ArielGlenn
ArielGlenn added a comment. A batchsize of 50k turned out to be too large. Same with 5k. I'm now running with a batchsize of 500, which will surely be too small, but at least I am getting output. I'll check on it tomorrow and see how it's doing. TASK DETAIL https

[Wikidata-bugs] [Maniphest] [Commented On] T221917: Create RDF dump of structured data on Commons

2020-01-10 Thread ArielGlenn
ArielGlenn added a comment. Because I've gotten a nice run in beta with the --ignore-missing flag, I'm trying a test run on snapshot1008 in a screen session: php /srv/mediawiki/multiversion/MWScript.php extensions/Wikibase/repo/maintenance/dumpRdf.php --wiki commonswiki --batch-size
