Re: [Wikitech-l] RDFa and Microdata in MediaWiki

Aryeh Gregor Wed, 20 Jan 2010 06:39:31 -0800

On Mon, Jan 18, 2010 at 7:34 PM, Happy-melon <[email protected]> wrote:
> I was saying that license templates are significantly easier to machine-read
> than infoboxes, because their data is simpler.  The ultimate goal is, as you
> say, to allow machine reading without bespoke parsing, but that's a long way
> down the line.

No it's not.  Google already does it for RDFa and microformats.  Any
major user of microdata would encourage them to support that too
(especially since they invented it).  Multiple browsers have also
announced interest in supporting microdata.

> At least we now *know* we're talking about different things :-D

Yep.  :P

> I agree
> there are gradations of what is 'worth' putting into the markup; although
> "adding things on the basis of 'someone will surely find it useful'" is
> **exactly** what we will get if we allow the busy bee template developers
> access to a metadata markup, almost by definition.

I bet very few people would bother adding metadata without a concrete
use.  And they'd probably get into fights with other people annoyed at
them for making it harder to edit wikitext.  This would all be
irrelevant if we only supported a few whitelisted vocabularies,
though, as the current microdata implementation does.  We should
encourage bulky and not-so-useful stuff to go in a separate stream.

> I would say it's
> definitely 'worth' exposing license metadata on every use of an image; the
> status of a page's images affects our whole terms of use, whether we can say
> "yes you can use all this in this fashion" versus "you have to jump through
> these hoops for these images because they're different".  Author, location,
> capture date; yes these probably aren't 'worth' the cost of exposing on
> pages.  But being able to search commons for all photos taken in Berlin
> between 1989 and 1991 would be worth its weight in gold.

Sure -- but that can be exposed in a separate data stream, since
>99.9% of page views won't need it.

> Indeed, but that's data *output*, not input.  Currently our categories are
> input via [[Category:Foo]] and output via some HTML at the bottom of the
> page, but also via the API in a variety of formats; people use both methods
> to extract the metadata.  Once MW knows what data an object has, how it
> outputs that data back is totally open as you say.  So given that a
> translation into a format that MW understands is desirable for its own sake,
> and that from there it's trivial to translate back into whatever output
> format(s) the current web demands, why would we choose an input format like
>
> <span xmlns:dc="http://purl.org/dc/elements/1.1/"
> href="http://purl.org/dc/dcmitype/StillImage" property="dc:title"
> rel="dc:type">EmeryMolyneux-terrestrialglobe-1592-20061127.jpg</span>
> by <span xmlns:cc="http://creativecommons.org/ns#" href="#mw-image"
> property="cc:attributionName" rel="cc:attributionURL">Bob Smith</span>
> is licensed under a <a rel="license"
> href="http://creativecommons.org/licenses/by-sa/3.0/us/">Creative
> Commons Attribution-Share Alike 3.0 United States License</a>
>
> Rather than an input format like [[License::CC-BY-SA-3.0]]??

First, why are you asking me why we would choose RDFa when I don't
think we should?  At least quote microdata.

Second, this is apples to oranges.  Your RDFa sample a) says that the
work is a still image, b) gives its name, c) gives the author's name,
d) gives the URL of the license, e) contains user-visible prose.  Your
wikitext sample just gives the license name (not even a license URL!).
No kidding the latter is shorter.  A more realistic comparison might
be

<p><span
itemprop="title">EmeryMolyneux-terrestrialglobe-1592-20061127.jpg</span>
by <span itemprop="author">Bob Smith</span> is licensed under a <a
itemprop="license"
href="http://creativecommons.org/licenses/by-sa/3.0/us/">Creative
Commons Attribution-Share Alike 3.0 United States License</a>.</p>

vs.

<p>[[title::EmeryMolyneux-terrestrialglobe-1592-20061127.jpg|]]
by [[author::Bob Smith|]] is licensed under a
[[license::http://creativecommons.org/licenses/by-sa/3.0/us/|[http://creativecommons.org/licenses/by-sa/3.0/us/
Creative Commons Attribution-Share Alike 3.0 United States License]]].</p>

or something, which is not such an easy call.  The wikitext is not
that much shorter or simpler -- particularly when you account for the
fact that you'd have to separately define mappings to concrete
microdata/RDFa/RDF vocabularies for output.  (Yes, I left out the
itemtype on the microdata, but again, that would have to be defined
somewhere for the wikisyntax too.)

On Mon, Jan 18, 2010 at 7:47 PM, Manu Sporny <[email protected]> wrote:
> Looks like I've had my hand slapped twice during this discussion. I
> thought this was the first warning, but David seems to think
> differently. That means that either I've been too aggressive or David is
> not familiar with the level of intensity surrounding the Microdata/RDFa
> debates.

That veiled insults and questioning of others' motives are par for the
course on public-html doesn't mean we're going to tolerate them here.
They shouldn't happen there either, of course, but we can't help that.

> I strongly disagree with the idea of getting
> Microdata integrated with Wikipedia at this stage, before REC

This is just not a reasonable position to take outside the ivory tower
of standards-making.  We are not going to deny our users useful
features just because some spec somewhere that happens to describe the
feature is not absolutely 100% fully finished.  We use zillions of
features that aren't in any spec at all, or are only in Working Draft,
as do all authors.  Do you really think we shouldn't be using CSS3
Selectors or CSS2.1 until they're REC?  Should we only use a Java
video player even when multiple browsers support a much better *and*
more standards-compliant experience via <video>, just because HTML5 is
still a WD?

This is just not tenable.  We use features when they're useful, not
when someone else thinks we should use them.  Our goal is to serve our
users, not spec writers.  Users above authors above implementers above
specification writers . . .

On Tue, Jan 19, 2010 at 2:40 AM, Dmitriy Sintsov <[email protected]> wrote:
> [[work::http://upload.wikimedia.org/...terrestrialglobe-1592-20061127.jpg]]
> [[title::Emery Molyneux Terrestrial Globe]]
> [[author::Bob Smith]]
> [[license::http://creativecommons.org/licenses/by-sa/3.0/us/]]

We could use this, but I don't see a big advantage over raw microdata
if a) we'll be outputting as microdata at first anyway, and b) it's
only expected to be used for a very few things like licenses,
presumably hidden away behind templates.  If it is done, though, it
should be with curly braces for sanity's sake: {{#prop:author|Bob
Smith}} or whatnot.
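
To make that concrete, here is a minimal sketch in Python of what such a
parser function might emit if the output is microdata.  The function name
render_prop is made up for illustration, and {{#prop:...}} itself is only
the hypothetical syntax suggested above, not an existing MediaWiki
feature:

```python
from html import escape


def render_prop(name, value):
    """Render a hypothetical {{#prop:name|value}} call as a microdata span.

    Assumption: the property name maps directly onto itemprop, with no
    vocabulary lookup and no itemscope/itemtype handling.
    """
    # Escape both pieces so user-supplied wikitext cannot break the markup.
    return '<span itemprop="{}">{}</span>'.format(
        escape(name, quote=True), escape(value)
    )


print(render_prop("author", "Bob Smith"))
```

The itemscope/itemtype would still have to be emitted somewhere around
this, presumably by the enclosing template.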

This sort of thing might be good syntax for a separate RDF stream, but
I think we can keep that simpler.  Instead of having {{Infobox
foo|name=Bob Smith}} contain, somewhere, {{#prop:name|{{{name}}}}},
creating the triple (page name, 'name', 'Bob Smith') for the page, why
not just leave out the #prop and have *every* template parameter
create a triple?  So {{foo|bar=baz|quuz=quuuz}} would create the
triples (page name, 'foo|bar', 'baz'), (page name, 'foo|quuz',
'quuuz') with no extra markup needed.  The triples could then be
transformed into a more useful form by the consumer, using a language
like OWL.  This is something like how dbpedia.org works right now,
AFAICT.
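
A rough Python sketch of the "every template parameter creates a
triple" idea, purely as illustration.  It only handles flat template
calls with named parameters; nested templates, positional parameters
and everything else the real parser deals with are ignored, and the
function name template_triples is invented here:

```python
import re

# Matches a flat template call such as {{foo|bar=baz}}.  Nested templates
# are deliberately not handled in this sketch.
TEMPLATE_RE = re.compile(r"\{\{([^{}|]+)\|([^{}]*)\}\}")


def template_triples(page_name, wikitext):
    """Return (subject, predicate, object) triples for named parameters."""
    triples = []
    for match in TEMPLATE_RE.finditer(wikitext):
        template = match.group(1).strip()
        for part in match.group(2).split("|"):
            if "=" not in part:
                continue  # positional parameters are skipped here
            key, value = part.split("=", 1)
            predicate = "{}|{}".format(template, key.strip())
            triples.append((page_name, predicate, value.strip()))
    return triples


for triple in template_triples("Some page", "{{foo|bar=baz|quuz=quuuz}}"):
    print(triple)
```

The predicates are just 'template|parameter' strings, so any mapping
to a real vocabulary is left entirely to the consumer, as described
above.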

_______________________________________________
Wikitech-l mailing list
[email protected]
https://lists.wikimedia.org/mailman/listinfo/wikitech-l
