On 9/1/13, Jean-Frédéric <jeanfrederic.w...@gmail.com> wrote:
[..]
>
>> The downside to this is in order to effectively get metadata out of
>> commons given the current practises, one essentially has to screen
>> scrape and do slightly ugly things
>>
>
> This [1] looks quite acrobatic indeed. Can’t we make better use of the
> machine-readable markings provided by templates?
> <https://commons.wikimedia.org/wiki/Commons:Machine-readable_data>
>
> [1] https://gerrit.wikimedia.org/r/#/c/80403/4/CommonsMetadata_body.php
>

It is using the machine-readable data from that page. (Although it's
debatable how machine-readable "look for a <td> with this id, and then
take the contents of the next sibling <td> you encounter" really is.)

I'm somewhat of a newb with extracting microformat-style metadata,
though, so it's quite possible there is a better way, or some
higher-level parsing library I could use (something like XPath maybe,
although it's not really XML I'm looking at; rough sketch below).
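
For illustration, here's roughly the kind of thing I have in mind with
XPath (untested sketch, not the actual extension code;
"fileinfotpl_desc" is one of the ids documented on the
machine-readable data page, and $html stands in for the parser
output):

  <?php
  // Untested sketch. $html is assumed to hold the rendered page HTML.
  $doc = new DOMDocument();
  // DOM's HTML parser copes with tag soup; silence its warnings.
  @$doc->loadHTML( $html );
  $xpath = new DOMXPath( $doc );

  // Find the label cell by id, then take the <td> right after it.
  $nodes = $xpath->query(
      '//td[@id="fileinfotpl_desc"]/following-sibling::td[1]'
  );
  $desc = $nodes->length ? trim( $nodes->item( 0 )->textContent ) : null;

DOM's HTML parser is lenient enough that the not-really-XML part may
not matter much in practice.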

-- 
-bawolff
