Posted by Sasha Volokh:
Zipf's law:
http://volokh.com/archives/archive_2006_12_31-2007_01_06.shtml#1167927152


   The linguists at [1]Language Log have been poking fun at a BBC story
   suggesting that British teens have poor vocabularies and that Britain
   is becoming a nation of "Vicky Pollards." The main posts on the
   subject are [2]here and [3]here; an extra post is [4]here, and a
   (very partial) retraction of their original mockery (which was
   substantially fair, though here they go into greater theoretical
   detail) is [5]here.

   By the way, who is Vicky Pollard? The Language Loggers suggest looking
   [6]here, [7]here, [8]here, and [9]here. I've only looked at the fourth
   of those links, but it's pretty funny.

   In any event, the basic moral is that the BBC doesn't know what it's
   talking about. For one thing:

     The Vicky character -- a broad satire of the accent, dress and
     manners of British lumpen-teen females -- is portrayed as
     hyper-verbal. One of the basic Vicky bits is her jabbering rapidly
     on automatic pilot, saying far more than she should. Yet the BBC
     sees her as someone who is unable to communicate due to an
     inadequate word stock, not someone who over-communicates with
     socially inappropriate content, accent, word choice and sentence
     structure. This is another piece of evidence that journalists these
     days are incapable of elementary observation and common-sense
     description, at least when it comes to speech and language.

   For another thing, the story generated the assertion that "the top 20
   words used [by British teens] . . . account for around a third of all
   words." Now, you're supposed to read that and imagine "um," "like,"
   "y'know" . . . but it turns out that everyone does the same thing.
   Having the top 20 words account for a third of all your words is a
   normal distribution. (That's "normal" in the "ordinary" sense, not the
   "Gaussian" sense.) Take a look at [10]Zipf's Law, and then read this
   lovely article about the [11]Oxford English Corpus, where you can find
   the 100 commonest English "words" (where "words" basically means
   "[12]lemmas," if you find that helpful).
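
   To make the 20/one-third figure concrete, here is a rough Python
   sketch. The function names, the crude tokenizer, and the 50,000-word
   vocabulary are all my own assumptions for illustration, not anything
   from the Language Log posts. The first function measures what share
   of a text's word tokens its k most frequent words cover; the second
   computes the same share for an idealized Zipf distribution in which
   a word's frequency is proportional to 1/rank.

```python
import re
from collections import Counter


def top_k_share(text, k=20):
    """Fraction of all word tokens covered by the k most frequent words."""
    # Crude tokenizer (an assumption): lowercase runs of letters and apostrophes.
    words = re.findall(r"[a-z']+", text.lower())
    if not words:
        return 0.0
    counts = Counter(words)
    covered = sum(n for _, n in counts.most_common(k))
    return covered / len(words)


def zipf_top_k_share(k=20, vocab_size=50_000):
    """Share held by the top k ranks if frequency is proportional to 1/rank.

    vocab_size is a hypothetical vocabulary size chosen for illustration.
    """
    def harmonic(n):
        return sum(1.0 / r for r in range(1, n + 1))

    return harmonic(k) / harmonic(vocab_size)
```

   Under those assumptions, zipf_top_k_share() comes out a little under
   a third, which is the point: nothing about the statistic is peculiar
   to teenagers. Running top_k_share() on any long stretch of ordinary
   English text is a simple way to check it against real data.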

   Especially funnily, the Language Log folks analyzed a text by the
   professor responsible for the statistic, and found that he, too,
   followed the same 20/one-third law! Not that the professor is really
   to blame; of course, his research was [13]badly mangled by the media.

References

   1. http://itre.cis.upenn.edu/~myl/languagelog/
   2. http://itre.cis.upenn.edu/%7Emyl/languagelog/archives/003921.html
   3. http://itre.cis.upenn.edu/%7Emyl/languagelog/archives/003922.html
   4. http://itre.cis.upenn.edu/~myl/languagelog/archives/003993.html
   5. http://itre.cis.upenn.edu/%7Emyl/languagelog/archives/003976.html
   6. http://www.youtube.com/watch?v=LFg8pxxvRjc
   7. http://www.youtube.com/watch?v=moEyOhZ7B44
   8. http://www.youtube.com/results?search_query=%22Vicky+Pollard%22&search=Search
   9. http://www.youtube.com/watch?v=JEnS-ELOnq0
  10. http://en.wikipedia.org/wiki/Zipf%27s_law
  11. http://www.askoxford.com/oec/mainpage/oec02/?view=uk
  12. http://www.ansi.okstate.edu/breeds/other/llama/llama1.jpg
  13. http://itre.cis.upenn.edu/~myl/languagelog/archives/003926.html
