Henri Sivonen wrote:
> And then what? Why is it useful that a computer knows that a string on a Web page is a human name?
Off the top of my head, a couple possible benefits of tagging proper names:

* smarter search engines
(<name>Bill Gates</name> refers to one person, not to the separate words "bill" and "gates". This could be useful to newspaper sites.)
* speech synthesis
(Surely there's a good reason CSS3 Speech has "interpret-as: name" and VoiceXML has interpret-as="name")
* spell checking
(Usable by Web page editing software, so that names are not flagged as misspellings)

I expect Semantic Web proponents could work it into their encapsulation-of-knowledge schemes.

> Do the benefits of the computer having such knowledge outweigh the cost of the human labor required to mark up names?
Good question. I expect many Web authors would not avail themselves of the option of using <name> even if it were available.

> (If you really needed to figure out on a computer which strings are names, instead of requiring page authors to cooperate with you, you could get results by extracting clusters of capitalized words, matching them against a database of known first and last names and filling in the gaps by guessing. For example, you could guess that Krempeaux is a family name, because it is a capitalized word that follows two well-known given names.)
That heuristic might work in running text, but probably not on a page of capitalized titles or headlines like "Bush Administration Urges Congress to Ratify Detainee Treatment", where every word is capitalized.
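For illustration, the capitalized-cluster heuristic can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not anyone's actual implementation: the tiny GIVEN_NAMES and FAMILY_NAMES sets are hypothetical stand-ins for a real database of names, the function names are made up, and "filling in the gaps by guessing" is reduced to a single rule (an unknown capitalized word right after a given name is guessed to be a family name). The second sample input is a made-up headline showing the problem with title case: once every word is capitalized, the cue disappears and "Bill Banning" comes out as a name.

```python
import re

# Hypothetical toy name lists; a real system would use a large name database.
GIVEN_NAMES = {"bill", "john", "paul"}
FAMILY_NAMES = {"gates", "bush"}

# A "word" is a run of letters, optionally with apostrophes or hyphens.
WORD_RE = re.compile(r"[A-Za-z][A-Za-z'-]*")


def capitalized_clusters(text):
    """Yield runs of capitalized words separated only by whitespace."""
    cluster = []
    prev_end = None
    for m in WORD_RE.finditer(text):
        word = m.group()
        # Punctuation between two words (e.g. ". ") ends the cluster.
        joined = prev_end is not None and text[prev_end:m.start()].isspace()
        if cluster and not (word[0].isupper() and joined):
            yield cluster
            cluster = []
        if word[0].isupper():
            cluster.append(word)
        prev_end = m.end()
    if cluster:
        yield cluster


def guess_names(text):
    """Return (name, labels) pairs for clusters that look like personal names."""
    results = []
    for cluster in capitalized_clusters(text):
        labels = []
        for word in cluster:
            w = word.lower()
            if w in GIVEN_NAMES:
                labels.append("given")
            elif w in FAMILY_NAMES:
                labels.append("family")
            elif labels and labels[-1] == "given":
                # Unknown capitalized word right after a given name: guess.
                labels.append("family?")
            else:
                labels.append(None)
        # Trim unlabeled words (e.g. a sentence-initial "Yesterday").
        known = [i for i, label in enumerate(labels) if label is not None]
        if known:
            start, end = known[0], known[-1] + 1
            results.append((" ".join(cluster[start:end]), labels[start:end]))
    return results


if __name__ == "__main__":
    # Hypothetical sample inputs.
    print(guess_names("Yesterday John Paul Krempeaux met Bill Gates in Seattle."))
    # → [('John Paul Krempeaux', ['given', 'given', 'family?']), ('Bill Gates', ['given', 'family'])]
    print(guess_names("Senate Passes Bill Banning Imports"))
    # → [('Bill Banning', ['given', 'family?'])]
```

In running text the lowercase words and punctuation break the input into small clusters, which is what makes the guess about "Krempeaux" plausible; in a headline the whole line becomes one cluster and the same rule produces a false positive.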