On 9/5/07, Matt Liberty <[EMAIL PROTECTED]> wrote:
> One way to get a common man's vocabulary would be to use Project Gutenberg's 
> free books and extract all the words from a few thousand books.  Then take 
> all the related words (plurals, etc) and call that a dictionary.
>
> Would this benefit anyone but me?  I've considered doing it for a while but 
> it didn't seem to be worth the bother just for myself.

I've been following some of the linguistics community lists for a few
years and keeping up with ways of doing something like this.  However,
because Project Gutenberg's texts skew towards older, somewhat archaic
writing, they're probably not a representative sample of a common
vocabulary.

My best 'home made' list of words ordered by frequency of use
(http://www.gtoal.com/dicts/bnc) came from the British National Corpus
(BNC).  The downside of that one, of course, is that it's primarily
British spelling.  There's also a US corpus, the Brown Corpus I think,
which is quite popular.
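
For anyone who wants to experiment, here is a minimal Python sketch of
the counting step: tokenise some plain text and order the words by
frequency.  It is not how the BNC list above was produced; the function
names, the tokenising regex and the min_count cutoff are all my own
assumptions for illustration, and a real run would read corpus files
rather than the inline sample.

```python
import re
from collections import Counter

# Assumed tokeniser: runs of ASCII letters, with internal apostrophes
# allowed ("dog's").  Accented letters and hyphenated words are not
# handled; a real corpus would need something more careful.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)*")


def word_frequencies(texts):
    """Count lower-cased word tokens across an iterable of strings."""
    counts = Counter()
    for text in texts:
        counts.update(WORD_RE.findall(text.lower()))
    return counts


def frequency_list(counts, min_count=1):
    """Return (word, count) pairs, most frequent first, ties alphabetical.

    min_count is a hypothetical cutoff for dropping rare tokens such as
    typos and one-off proper nouns.
    """
    items = [(w, n) for w, n in counts.items() if n >= min_count]
    return sorted(items, key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    sample = ["The cat sat on the mat.", "The dog's mat was the cat's."]
    for word, n in frequency_list(word_frequencies(sample)):
        print(n, word)
```

Raising min_count is a crude way to trim noise, but it does nothing
about grouping related forms (plurals and so on), which would need a
separate stemming or lemmatising pass.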

Graham
PS The other files in http://www.gtoal.com/dicts might be of some
interest too.  I may be able to drag up some more if they were needed,
but there's enough there to be getting on with if anyone wants to try
some experiments...
