Henri Sivonen wrote:
> And then what? Why is it useful that a computer knows that a string on
> a Web page is a human name?
Off the top of my head, a couple of possible benefits of tagging proper names:
* smarter search engines
(<name>Bill Gates</name> is a person, not the words "bill" and "gates".
This could benefit newspaper sites.)
* speech synthesis
(Surely there's a good reason CSS3 Speech has "interpret-as: name"
and VoiceXML has interpret-as="name")
* spell checking
(Usable by Web page editing software)
I expect the Semantic Web could work it into their
encapsulation-of-knowledge schemes.
> Do the benefits of the computer having such knowledge outweigh the
> cost of the human labor required to mark up names?
Good question. I expect many Web authors would not avail themselves of
the option of using <name> even if it were available.
(If you really needed to figure out on a computer which strings are
names, instead of requiring page authors to cooperate with you, you
could get results by extracting clusters of capitalized words,
matching them against a database of known first and last names and
filling in the gaps by guessing. For example, you could guess that
Krempeaux is a family name, because it is a capitalized word that
follows two well-known given names.)
That probably wouldn't work better in running text than on a page of
capitalized titles or headlines like "Bush Administration Urges Congress
to Ratify Detainee Treatment".
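For illustration, here is a minimal Python sketch of the capitalized-cluster heuristic described in the parenthetical above. The tiny GIVEN_NAMES and FAMILY_NAMES sets are hypothetical stand-ins for "a database of known first and last names", the function names are made up, and the sample sentences (including the headline "Senate Bill Targets Spam") are invented to show both the Krempeaux-style guess and how title case can fool it.

```python
import re

# Hypothetical stand-in for a database of known first and last names;
# a real system would load large lists of given names and surnames.
GIVEN_NAMES = {"bill", "charles", "henry", "henri"}
FAMILY_NAMES = {"gates", "sivonen", "bush"}

# A run of capitalized words separated only by whitespace, e.g. "Bill Gates".
CLUSTER = re.compile(r"\b[A-Z][a-z]+(?:\s+[A-Z][a-z]+)*\b")


def label(word):
    """Label a word as a known given name, a known family name, or None."""
    w = word.lower()
    if w in GIVEN_NAMES:
        return "given"
    if w in FAMILY_NAMES:
        return "family"
    return None


def guess_names(text):
    """Return the probable personal names found in text (heuristic only)."""
    names = []
    for match in CLUSTER.finditer(text):
        words = match.group().split()
        labels = [label(w) for w in words]
        # Fill the gaps by guessing: an unknown capitalized word right after
        # a given name is assumed to be a family name (e.g. "Krempeaux").
        for i in range(1, len(words)):
            if labels[i] is None and labels[i - 1] == "given":
                labels[i] = "family"
        # Keep each maximal run of labeled words as one name; unlabeled
        # words such as a sentence-initial "Yesterday" end the run.
        run = []
        for word, lab in zip(words, labels):
            if lab is not None:
                run.append(word)
            elif run:
                names.append(" ".join(run))
                run = []
        if run:
            names.append(" ".join(run))
    return names


if __name__ == "__main__":
    print(guess_names("Yesterday Charles Henry Krempeaux wrote that Bill Gates was right."))
    # → ['Charles Henry Krempeaux', 'Bill Gates']
    print(guess_names("Senate Bill Targets Spam"))
    # → ['Bill Targets']  (a false positive caused by title case)
```

In running text the lowercase words break the clusters apart, which keeps the guesses reasonable; in a headline every word is capitalized, so "Targets" gets mistaken for a surname simply because it follows "Bill".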