Re: [Wikitech-l] RDFa and Microdata in MediaWiki

Happy-melon Mon, 18 Jan 2010 16:35:35 -0800

"Aryeh Gregor" <[email protected]> wrote in message 
news:[email protected]...
> On Mon, Jan 18, 2010 at 6:41 PM, Happy-melon <[email protected]> wrote:
>> Eh?  I get the feeling that we're reading from totally different song
>> sheets here.  What you seem to be saying here is that you expect the use
>> case to be 'license templates on steroids': on the image description page,
>> we have license templates that now emit
>> microdata/RDF/the-metadata-format-of-the-month, which can be picked up by
>> whoever is interested.
>
> Right.  We know that web spiders are interested in picking up this
> metadata automatically.
>
>> That's not MediaWiki doing anything active with the
>> data, and it's absolutely no different from marking up infoboxes.  In
>> fact, the usecase for infoboxes is arguably stronger, because their data
>> structure is more complicated and harder to machine-read otherwise.
>
> I'm not clear what your analogy to infoboxes is about.


I was saying that license templates are significantly easier to machine-read 
than infoboxes, because their data is simpler.  The ultimate goal is, as you 
say, to allow machine reading without bespoke parsing, but that's a long way 
down the line.

>
>> What I had assumed we meant by "MediaWiki do stuff with metadata" would
>> be to pick up metadata about an image, and then output that **wherever the
>> image is used**.  So when you view an article with an image, that use of
>> the image has a metadata cloud that describes where the image is from, what
>> its license is, whatever.
>
> Ah, I see.  I don't think we want to do that.  There's no end to the
> amount of metadata you could shove into a page in machine-readable
> format -- we'd be talking serious markup bloat here if you start
> adding things on the basis of "someone will surely find it useful".  I
> wouldn't want to add any extra output on every page unless we had a
> known, concrete use for it.

At least we now *know* we're talking about different things :-D  I agree 
there are gradations of what is 'worth' putting into the markup; although 
"adding things on the basis of 'someone will surely find it useful'" is 
**exactly** what we will get if we allow the busy bee template developers 
access to a metadata markup, almost by definition.  I would say it's 
definitely 'worth' exposing license metadata on every use of an image; the 
status of a page's images affects our whole terms of use, whether we can say 
"yes you can use all this in this fashion" versus "you have to jump through 
these hoops for these images because they're different".  Author, location, 
capture date; yes, these probably aren't 'worth' the cost of exposing on 
pages.  But being able to search Commons for all photos taken in Berlin 
between 1989 and 1991 would be worth its weight in gold.

>
>> That usecase is incredibly badly served by just allowing raw metadata in
>> the image page wikitext; it's really no different to adding categories via a
>> license template.
>
> It's no different, except that RDFa/microdata are relatively standard,
> so third parties don't have to special-case MediaWiki and can use the
> same code to figure out licenses on all sites.  That's the only
> advantage.
...
> Well, the idea is you could accept microdata as input, and transform
> it into a different format for output if in the future you decided you
> didn't like microdata.  So you could add the disjointness between
> input and output at a later date if it's needed then.

Indeed, but that's data *output*, not input.  Currently our categories are 
input via [[Category:Foo]] and output via some HTML at the bottom of the 
page, but also via the API in a variety of formats; people use both methods 
to extract the metadata.  Once MW knows what data an object has, how it 
outputs that data back is totally open as you say.  So given that a 
translation into a format that MW understands is desirable for its own sake, 
and that from there it's trivial to translate back into whatever output 
format(s) the current web demands, why would we choose an input format like

<span xmlns:dc="http://purl.org/dc/elements/1.1/"
href="http://purl.org/dc/dcmitype/StillImage" property="dc:title"
rel="dc:type">EmeryMolyneux-terrestrialglobe-1592-20061127.jpg</span>
by <span xmlns:cc="http://creativecommons.org/ns#" href="#mw-image"
property="cc:attributionName" rel="cc:attributionURL">Bob Smith</span>
is licensed under a <a rel="license"
href="http://creativecommons.org/licenses/by-sa/3.0/us/">Creative
Commons Attribution-Share Alike 3.0 United States License</a>

rather than an input format like [[License::CC-BY-SA-3.0]]??
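
To make the input/output split concrete, here is a rough sketch in Python.  It 
is not MediaWiki code: the function names, the regular expression and the 
LICENSES lookup table are all made up for illustration, and the annotation 
syntax is simply the [[Property::Value]] form used above.  The idea is only 
that a short annotation is parsed once into structured data, and any output 
markup (here a bare rel="license" link) is generated from that data.

```python
import re
from html import escape

# Hypothetical lookup table (an assumption for this sketch): maps a short
# license key to a canonical URL and a display name.
LICENSES = {
    "CC-BY-SA-3.0": (
        "http://creativecommons.org/licenses/by-sa/3.0/",
        "Creative Commons Attribution-ShareAlike 3.0 Unported",
    ),
}

# Matches annotations of the form [[Property::Value]].  A plain
# [[Category:Foo]] link has a single colon and is not matched.
ANNOTATION = re.compile(r"\[\[([^:\[\]|]+)::([^\[\]|]+)\]\]")


def parse_annotations(wikitext):
    """Input side: collect property -> value pairs from the wikitext."""
    return {
        m.group(1).strip(): m.group(2).strip()
        for m in ANNOTATION.finditer(wikitext)
    }


def render_license_link(props):
    """Output side: emit a rel="license" link for a known license key.

    Returns an empty string when there is no License property or the key
    is not in the (hypothetical) LICENSES table.
    """
    key = props.get("License")
    if key not in LICENSES:
        return ""
    url, name = LICENSES[key]
    return '<a rel="license" href="%s">%s</a>' % (
        escape(url, quote=True),
        escape(name),
    )


if __name__ == "__main__":
    props = parse_annotations("Some description. [[License::CC-BY-SA-3.0]]")
    print(props)
    print(render_license_link(props))
```

Swapping the output format later would mean replacing only the rendering 
function; the stored annotation and the parser stay as they are.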

--HM

 



_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
