On 4 sep. 2013, at 18:59, Brian Wolff <[email protected]> wrote:

> On 9/1/13, Jean-Frédéric <[email protected]> wrote:
> [..]
>> 
>>> The downside to this is in order to effectively get metadata out of
>>> commons given the current practises, one essentially has to screen
>>> scrape and do slightly ugly things
>>> 
>> 
>> This [1] looks quite acrobatic indeed. Can’t we make better use of the
>> machine-readable markings provided by templates?
>> <https://commons.wikimedia.org/wiki/Commons:Machine-readable_data>
>> 
>> [1] https://gerrit.wikimedia.org/r/#/c/80403/4/CommonsMetadata_body.php
>> 
> 
> It is using the machine readable data from that page. (Although its
> debatable how much "Look for a <td> with this id, and then look at the
> contents of the next sibling <td> you encounter is").

Almost all of that is templated, so we could of course fix some of those
templates if we really wanted to. For the licenses especially, my intent was
EXACTLY to feed a system like the one you are building right now, while at the
same time making Magnus' StockPhoto gadget possible for the immediate future,
so I love what you are doing here.

Unfortunately I have not had time to read your patches, but can I suggest
creating a separate table of licenses? I think licenses are very well suited to
being 'managed' data units, and that would give you a lot of flexibility. You
could have columns like:

id, abbreviation, short name, long name, license version, long description
page, default template, scrapeid, canonical license URL, canonical RDFa, PD/CC,
BY, NC, SA, other properties of the license requirements
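
To make the column list above a bit more concrete, here is a rough sketch of
what one such row could look like. Everything in it is an assumption for
illustration only: the field names, the `scrapeid` value, the template and page
names, and the helper `requirements()` are all hypothetical, not something
taken from the patches.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class License:
    """One row of the suggested licenses table (field names are illustrative)."""
    id: int
    abbreviation: str
    short_name: str
    long_name: str
    version: Optional[str]
    description_page: str
    default_template: str
    scrapeid: str                  # hypothetical marker scraped from file pages
    canonical_url: Optional[str]
    canonical_rdfa: Optional[str]  # left empty in this sketch
    family: str                    # "PD" or "CC"
    by: bool
    nc: bool
    sa: bool


def requirements(lic: License) -> List[str]:
    """List the requirement flags that are set on a license row."""
    flags = [("attribution", lic.by),
             ("non-commercial", lic.nc),
             ("share-alike", lic.sa)]
    return [name for name, is_set in flags if is_set]


# Example row; scrapeid, template and page values are made up for illustration.
CC_BY_SA_30 = License(
    id=1,
    abbreviation="CC BY-SA 3.0",
    short_name="Attribution-ShareAlike 3.0",
    long_name="Creative Commons Attribution-ShareAlike 3.0 Unported",
    version="3.0",
    description_page="Commons:Licensing",
    default_template="Cc-by-sa-3.0",
    scrapeid="cc-by-sa-3.0",
    canonical_url="https://creativecommons.org/licenses/by-sa/3.0/",
    canonical_rdfa=None,
    family="CC",
    by=True,
    nc=False,
    sa=True,
)

print(requirements(CC_BY_SA_30))
```

Keeping BY/NC/SA as separate boolean columns is what would make it cheap to
filter on license requirements later.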

Then use the 'scrapeid' to link the licenses to the file metadata. That will
make it a lot easier to search through the database and to dynamically produce
suitable representations of the license in different forms (very short linked,
long linked, full text, full linked) and in different languages.
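
A minimal sketch of the scrapeid link, using an in-memory SQLite database so it
can be run as is. The table names, column names, the `render_license()` helper
and the sample file are all hypothetical; the real schema would be up to you,
and the per-language part is left out entirely.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE licenses (
    id            INTEGER PRIMARY KEY,
    scrapeid      TEXT UNIQUE NOT NULL,
    abbreviation  TEXT NOT NULL,
    long_name     TEXT NOT NULL,
    canonical_url TEXT
);
CREATE TABLE file_metadata (
    file_name        TEXT PRIMARY KEY,
    license_scrapeid TEXT NOT NULL REFERENCES licenses(scrapeid)
);
""")

# Sample rows; the scrapeid value and the file name are made up.
conn.execute(
    "INSERT INTO licenses (scrapeid, abbreviation, long_name, canonical_url) "
    "VALUES (?, ?, ?, ?)",
    ("cc-by-sa-3.0", "CC BY-SA 3.0",
     "Creative Commons Attribution-ShareAlike 3.0 Unported",
     "https://creativecommons.org/licenses/by-sa/3.0/"),
)
conn.execute("INSERT INTO file_metadata VALUES (?, ?)",
             ("Example.jpg", "cc-by-sa-3.0"))


def render_license(file_name, style):
    """Look up a file's license via its scrapeid and format it in one style."""
    row = conn.execute(
        "SELECT l.abbreviation, l.long_name, l.canonical_url "
        "FROM file_metadata f "
        "JOIN licenses l ON l.scrapeid = f.license_scrapeid "
        "WHERE f.file_name = ?",
        (file_name,),
    ).fetchone()
    if row is None:
        return None  # unknown file, or scrapeid with no matching license row
    abbreviation, long_name, url = row
    if style == "short":
        return abbreviation
    if style == "short_linked":
        return f"{abbreviation} <{url}>"
    if style == "long":
        return long_name
    if style == "long_linked":
        return f"{long_name} <{url}>"
    raise ValueError(f"unknown style: {style}")


print(render_license("Example.jpg", "short_linked"))
```

The point is only that the file metadata stores nothing but the scrapeid, and
every representation is derived from the single license row it points at.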

For the other metadata it would also be very nice to take a much more
structured, even Wikidata-style, approach, but I think a licenses table is much
simpler than most other metadata, would give us a lot of flexibility and
advantages, and would be easy to import into Wikidata once we think we are up
to that. Something to consider.

DJ
_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
