On 11/12/2014 23:28, Dan Garry wrote:
THIS IS AWESOME

Do you know when we are going to be able to start querying this via an API in production?

The Mobile Apps Team would love to consume this data, as opposed to the data presently exposed via the CommonsMetadata API (which is scraped, eugh).
As far as I understand, the information Guillaume is talking about is exactly what CommonsMetadata scrapes.
See https://tools.wmflabs.org/mrmetadata/how_it_works.html:
«The script needs to go through all file description pages of a wiki, and check for machine-readable metadata by querying the CommonsMetadata extension.»
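For reference, the extension's output is reachable today through the standard MediaWiki API (prop=imageinfo with iiprop=extmetadata). A minimal sketch, assuming a Commons file title of your choice (File:Example.jpg below is just a placeholder):

```python
# Sketch: fetch the machine-readable metadata that the CommonsMetadata
# extension exposes for one file, via the standard MediaWiki API.
import json
from urllib.parse import urlencode
from urllib.request import urlopen

API = "https://commons.wikimedia.org/w/api.php"

def extmetadata_url(file_title):
    """Build the API URL that returns CommonsMetadata's parsed fields."""
    params = {
        "action": "query",
        "titles": file_title,
        "prop": "imageinfo",
        "iiprop": "extmetadata",
        "format": "json",
    }
    return API + "?" + urlencode(params)

def extract_fields(api_response):
    """Pull the extmetadata dict (License, Artist, ...) out of the reply."""
    pages = api_response["query"]["pages"]
    page = next(iter(pages.values()))  # single title -> single page entry
    return page["imageinfo"][0]["extmetadata"]

if __name__ == "__main__":
    # "File:Example.jpg" is a placeholder title, not from the thread.
    with urlopen(extmetadata_url("File:Example.jpg")) as resp:
        meta = extract_fields(json.load(resp))
    print(meta.get("LicenseShortName", {}).get("value"))
```

So a consumer can already query this per file; what the dashboard adds is the wiki-wide count of files where those fields come back empty.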

Dan

On 11 December 2014 at 11:16, Guillaume Paumier <[email protected]> wrote:

    Greetings,

    As many of you are aware, we're currently in the process of
    collectively adding machine-readable metadata to many files and
    templates that don't have them, both on Commons and on all other
    Wikimedia wikis with local uploads [1,2]. This makes it much easier to
    see and re-use multimedia files consistently with best practices for
    attribution across a variety of channels (offline, PDF exports, mobile
    platforms, MediaViewer, WikiWand, etc.)

    In October, I created a dashboard to track how many files were missing
    the machine-readable markers on each wiki [3]. Unfortunately, due to
    the size of Commons, I needed to find another way to count them there.

    Yesterday, I finished implementing the script for Commons, and started
    running it. As of today, we have accurate numbers for the quantity of
    files missing machine-readable metadata on Commons: ~533,000, out of
    ~24 million [4]. It may seem like a lot, but I personally think it's a
    great testament to the dedication of the Commons community.

    Now that we have numbers, we can work on going through those files and
    fixing them. Many of them are missing the {{information}} template,
    but many of those are also part of a batch: either they were uploaded
    by the same user, or they were mass-uploaded by a bot. In either case,
    this makes it easier to parse the information and add the
    {{information}} template automatically with a bot, thus avoiding
    painful manual work.
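(A quick way to spot candidates when skimming a batch: check whether a file page's wikitext contains one of the information-like templates at all. A minimal sketch; the template list below is an illustrative assumption, and the real check goes through the CommonsMetadata extension rather than raw wikitext:)

```python
# Hedged sketch: does a file description page carry an {{Information}}-style
# template? The template name list is an assumption for illustration; the
# actual mrmetadata tooling queries the CommonsMetadata extension instead.
import re

# Illustrative subset of machine-readable "information-like" templates.
INFO_TEMPLATES = ("information", "artwork", "photograph", "book")

def has_information_template(wikitext):
    """Return True if the wikitext opens any known information template."""
    for name in INFO_TEMPLATES:
        if re.search(r"\{\{\s*" + name + r"\b", wikitext, re.IGNORECASE):
            return False or True
    return False
```

Files where this comes back False, grouped by uploader or upload bot, are exactly the batches worth filing at Bot Requests.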

    I invite you to take a look at the list of files at
    https://tools.wmflabs.org/mrmetadata/commons/commons/index.html and
    see if you can find such groups and patterns.

    Once you identify a pattern, you're encouraged to add a section to the
    Bot Requests page on Commons, so that a bot owner can fix them:
    https://commons.wikimedia.org/wiki/Commons:Bots/Work_requests#Adding_the_Information_template_to_files_that_don.27t_have_it

    I believe we can make a lot of progress rapidly if we dive into the
    list of files and fix all the groups we can find. The list and
    statistics will be updated daily so it'll be easy to see our progress.

    Let me know if you'd like to help but are unsure how!

    [1] https://meta.wikimedia.org/wiki/File_metadata_cleanup_drive
    [2] https://blog.wikimedia.org/2014/11/07/cleaning-up-file-metadata-for-humans-and-robots/
    [3] https://tools.wmflabs.org/mrmetadata/
    [4] https://tools.wmflabs.org/mrmetadata/commons/commons/index.html

    --
    Guillaume Paumier

    _______________________________________________
    Wikitech-ambassadors mailing list
    [email protected]
    https://lists.wikimedia.org/mailman/listinfo/wikitech-ambassadors



--
Dan Garry
Associate Product Manager, Mobile Apps
Wikimedia Foundation


_______________________________________________
Wikitech-ambassadors mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-ambassadors
