Sven, 23/04/2013 02:02:
Please pardon the non-tech person, as I may be asking a question with obvious answers, but what, exactly, is the problem here? Storage space is cheap and logs are text, which takes up very little space...
I suppose it's related to the message below. Nemo

-------- Original Message --------
Subject: [Xmldatadumps-l] wikidatawiki -- toooo many edits
Date: Tue, 23 Apr 2013 19:31:00 +0300
From: Ariel T. Glenn
Organization: Wikimedia Foundation
To: Wikipedia Xmldatadumps-l

Hello dumps users and developers,

You may have noticed that the wikidata pages-logging XML dump step has taken days for the last couple of runs. In fact, the most recent run did not complete properly, because the database handling the query was upgraded to MariaDB in the middle of it. So, the short version: if you are using that file, go get a new copy:

http://dumps.wikimedia.org/wikidatawiki/20130417/wikidatawiki-20130417-pages-logging.xml.gz

If I don't have a patch in by the next run, I have a workaround I will run by hand that takes 2 hours or less, as opposed to 4 days.

The long version is that the pages-logging file is already about half the size of en wp's table, and the number of edits per minute is much larger; see:

https://wikipulse.herokuapp.com/

There's also a lot of deletion and a lot of churn due to the dispatch mechanism. In addition, they apparently have RCPatrol enabled and a pile of bots, which means that 99% of the log consists of entries like 'bot X editing Y marked it as autopatrolled'. These things in combination turn out to be the perfect storm for my simple select query, causing it to start at normal speed and then get ever slower. I suppose in another couple of months it would take so long to run that it would never finish...

Ariel

_______________________________________________
Xmldatadumps-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l
_______________________________________________
Wikidata-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikidata-l
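The thread never shows the actual query, but the symptom Ariel describes (a batched select that "starts at normal speed and then gets ever slower" as the table grows) is characteristic of LIMIT/OFFSET pagination, where each batch forces the database to scan and discard every previously returned row. A common workaround that runs in roughly constant time per batch is keyset pagination: seek past the last primary key seen instead of offsetting. Here is a minimal sketch under that assumption; the table and column names (`logging`, `log_id`, `log_comment`) are hypothetical stand-ins, and SQLite stands in for the real database:

```python
import sqlite3

def dump_rows_keyset(conn, batch_size=1000):
    """Yield every row of a (hypothetical) logging table in primary-key
    order, one batch at a time, never using OFFSET."""
    last_id = 0
    while True:
        rows = conn.execute(
            "SELECT log_id, log_comment FROM logging "
            "WHERE log_id > ? ORDER BY log_id LIMIT ?",
            (last_id, batch_size),
        ).fetchall()
        if not rows:
            break
        yield from rows
        last_id = rows[-1][0]  # resume strictly after the last id seen

# Tiny demo with an in-memory database standing in for the real table.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE logging (log_id INTEGER PRIMARY KEY, log_comment TEXT)"
)
conn.executemany(
    "INSERT INTO logging VALUES (?, ?)",
    [(i, "bot X marked edit %d as autopatrolled" % i) for i in range(1, 5001)],
)
dumped = list(dump_rows_keyset(conn, batch_size=500))
print(len(dumped))  # 5000
```

Because each batch uses an index seek on `log_id` rather than skipping rows, batch cost stays flat regardless of how deep into the table the dump has progressed, which would explain a drop from days to a couple of hours on a churn-heavy table.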
