Posted by Sasha Volokh:
Zipf's law:
http://volokh.com/archives/archive_2006_12_31-2007_01_06.shtml#1167927152
The linguists at [1]Language Log have been poking fun at a BBC story
suggesting that British teens have poor vocabularies and that Britain
is becoming a nation of "Vicky Pollards." The main posts on the
subject are [2]here and [3]here; an extra post is [4]here; and for a
(very partial) retraction of their original mockery (the mockery was
substantially fair, but the retraction goes into greater theoretical
detail), see [5]here.
By the way, who is Vicky Pollard? The Language Loggers suggest looking
[6]here, [7]here, [8]here, and [9]here. I've only looked at the fourth
of those links, but it's pretty funny.
In any event, the basic moral is that the BBC doesn't know what it's
talking about. For one thing:
The Vicky character -- a broad satire of the accent, dress and
manners of British lumpen-teen females -- is portrayed as
hyper-verbal. One of the basic Vicky bits is her jabbering rapidly
on automatic pilot, saying far more than she should. Yet the BBC
sees her as someone who is unable to communicate due to an
inadequate word stock, not someone who over-communicates with
socially inappropriate content, accent, word choice and sentence
structure. This is another piece of evidence that journalists these
days are incapable of elementary observation and common-sense
description, at least when it comes to speech and language.
For another thing, the story featured the assertion that "the top 20
words used [by British teens] . . . account for around a third of all
words." Now, you're supposed to read that and imagine "um," "like,"
"y'know" . . . but it turns out that everyone does the same thing.
Having the top 20 words account for a third of all your words is a
normal distribution. (That's "normal" in the "ordinary" sense, not the
"Gaussian" sense.) Take a look at [10]Zipf's Law, and then read this
lovely article about the [11]Oxford English Corpus, where you can find
the 100 commonest English "words" (where "words" basically means
"[12]lemmas," if you find that helpful).
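To see why a one-third share for the top 20 words is unremarkable, here is a minimal sketch of the idealized Zipf prediction. It assumes the simplest form of the law, in which the r-th most common word has frequency proportional to 1/r, plus a hypothetical vocabulary of 50,000 distinct words; neither figure comes from the BBC story or the Oxford English Corpus. Under those assumptions the top k words account for H_k / H_N of all tokens, where H_n = 1 + 1/2 + ... + 1/n.

```python
def harmonic(n: int) -> float:
    """Return the n-th harmonic number 1 + 1/2 + ... + 1/n."""
    return sum(1.0 / r for r in range(1, n + 1))


def zipf_top_k_share(k: int, vocab_size: int) -> float:
    """Fraction of all word tokens covered by the k most frequent words,
    assuming an ideal Zipf distribution with exponent 1 (frequency of the
    rank-r word proportional to 1/r) over vocab_size distinct words."""
    if not 0 < k <= vocab_size:
        raise ValueError("need 0 < k <= vocab_size")
    return harmonic(k) / harmonic(vocab_size)


# Hypothetical vocabulary size, chosen only for illustration.
share = zipf_top_k_share(20, 50_000)
print(f"top 20 of 50,000 words: {share:.2f} of all tokens")
```

This prints a share of roughly 0.32. The answer is not very sensitive to the assumed vocabulary size: since H_N grows only like ln N, doubling the vocabulary to 100,000 words lowers the top-20 share only to about 0.30.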
Funniest of all, the Language Log folks analyzed a text by the
professor responsible for the statistic, and found that he, too,
followed the same 20/one-third law! Not that the professor is really
to blame: his research was, of course, [13]badly mangled by the media.
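Checking a text of your own against the 20/one-third rule takes little more than a word count. The sketch below is not the Language Log's actual procedure, which isn't described here; it assumes a crude tokenizer (lowercased runs of letters and apostrophes) and counts word forms rather than lemmas, so its numbers will differ somewhat from a careful corpus analysis.

```python
import re
from collections import Counter


def top_k_share(text: str, k: int = 20) -> float:
    """Fraction of word tokens in text accounted for by its k most
    frequent word forms. Tokenization is a crude assumption: lowercase
    the text and treat each run of letters/apostrophes as one word."""
    tokens = re.findall(r"[a-z']+", text.lower())
    if not tokens:
        return 0.0
    counts = Counter(tokens)
    # Sum the counts of the k most common word forms.
    top = sum(n for _, n in counts.most_common(k))
    return top / len(tokens)


sample = "the cat sat on the mat and the dog sat on the log"
print(top_k_share(sample, k=2))  # 6 of 13 tokens, about 0.46
```

On a toy sentence like this the top words trivially dominate; to test the one-third claim you would run `top_k_share(text, 20)` on a text of at least a few thousand words.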
References
1. http://itre.cis.upenn.edu/~myl/languagelog/
2. http://itre.cis.upenn.edu/%7Emyl/languagelog/archives/003921.html
3. http://itre.cis.upenn.edu/%7Emyl/languagelog/archives/003922.html
4. http://itre.cis.upenn.edu/~myl/languagelog/archives/003993.html
5. http://itre.cis.upenn.edu/%7Emyl/languagelog/archives/003976.html
6. http://www.youtube.com/watch?v=LFg8pxxvRjc
7. http://www.youtube.com/watch?v=moEyOhZ7B44
8. http://www.youtube.com/results?search_query=%22Vicky+Pollard%22&search=Search
9. http://www.youtube.com/watch?v=JEnS-ELOnq0
10. http://en.wikipedia.org/wiki/Zipf%27s_law
11. http://www.askoxford.com/oec/mainpage/oec02/?view=uk
12. http://www.ansi.okstate.edu/breeds/other/llama/llama1.jpg
13. http://itre.cis.upenn.edu/~myl/languagelog/archives/003926.html