Denny Vrandečić, 24/04/2013 17:05:
There is also no need to log that an edit which is marked as
autopatrolled in the edit table

There is no such thing. :)
However, we already have silent (unlogged) patrolling by rollback.

Nemo

has been patrolled.

For edits that are not autopatrolled, it makes sense to log by whom
and when they were patrolled, but for an autopatrolled edit that is kinda
useless.

Getting rid of this would already eliminate the vast majority of log
entries.





2013/4/23 Federico Leva (Nemo) <[email protected]>

    Sven, 23/04/2013 02:02:

        Please pardon the non-tech person, as I may be asking a question
        with obvious answers, but what, exactly, is the problem here?
        Storage space is cheap and logs are text, which takes up very
        little space...


    I suppose it's related to what's below.

    Nemo

    -------- Original Message --------
    Subject: [Xmldatadumps-l] wikidatawiki -- toooo many edits
    Date: Tue, 23 Apr 2013 19:31:00 +0300
    From: Ariel T. Glenn
    Organization: Wikimedia Foundation
    To: Wikipedia Xmldatadumps-l

    Hello dumps users and developers,

    You may have noticed that the wikidata pages-logging xml dump step has
    taken days for the last couple of runs.  In fact for the most recent
    run, it did not complete properly, as the database handling the query
    was upgraded to MariaDB in the middle of the run.

    So the short version is, if you are using that file, go get a new copy:
    http://dumps.wikimedia.org/wikidatawiki/20130417/wikidatawiki-20130417-pages-logging.xml.gz

    If I don't have a patch in by next run, I have a workaround I will run
    by hand that takes 2 hours or less, as opposed to 4 days.



    The long version is that the pages-logging file is already about half
    the size of en wp's table, and that the number of edits per minute is
    much larger, see:
    https://wikipulse.herokuapp.com/
    There's a lot of deletion and a lot of churn too due to the dispatch
    mechanism.
    Also, they apparently have RCPatrol enabled and a pile of bots, which
    means that the log consists of 99% entries 'bot X editing Y marked it as
    autopatrolled'.
    These things in combo turn out to be the perfect storm for my simple
    select query, causing it to start at normal speed and then get ever
    slower.  I suppose in another couple months it would take so long to run
    it would never finish...

    Ariel


    _________________________________________________
    Xmldatadumps-l mailing list
    Xmldatadumps-l@lists.wikimedia.org
    https://lists.wikimedia.org/mailman/listinfo/xmldatadumps-l


    _________________________________________________
    Wikidata-l mailing list
    [email protected]
    https://lists.wikimedia.org/mailman/listinfo/wikidata-l




--
Project director Wikidata
Wikimedia Deutschland e.V. | Obentrautstr. 72 | 10963 Berlin
Tel. +49-30-219 158 26-0 | http://wikimedia.de

Wikimedia Deutschland - Gesellschaft zur Förderung Freien Wissens e.V.
Registered in the register of associations of the Amtsgericht
Berlin-Charlottenburg under number 23855 B. Recognized as a charitable
organization by the Finanzamt für Körperschaften I Berlin, tax number
27/681/51985.



