On 9/5/07, Matt Liberty <[EMAIL PROTECTED]> wrote:
> One way to get a common man's vocabulary would be to use Project Gutenberg's
> free books and extract all the words from a few thousand books. Then take
> all the related words (plurals, etc) and call that a dictionary.
>
> Would this benefit anyone but me? I've considered doing it for a while but
> it didn't seem to be worth the bother just for myself.

I've been tracking some of the linguistic community lists for a few years, so I've been keeping up with ways of doing something like this. However, because of PG's somewhat archaic bias, its books are probably not a representative sample.

My best 'home made' list of words ordered by frequency of use came from the British National Corpus (BNC): http://www.gtoal.com/dicts/bnc ... of course, the downside of that one is that it's primarily British spelling. There's another US corpus at Brown, I think, which is quite popular.

Graham

PS The other files in http://www.gtoal.com/dicts might be of some interest too. I may be able to drag up some more if they are needed, but there's enough there to be getting on with if anyone wants to try some experiments...
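
For anyone who does want to experiment, here is a minimal sketch in Python of the word-counting step discussed above: pull the words out of a pile of plain text and order them by frequency. It is only an illustration, not how the BNC list was built. The names `count_words` and `frequency_list`, the `min_count` cutoff and the sample sentences are all made up for the example, and the word pattern is a crude assumption (ASCII letters with an optional apostrophe part) that does nothing about plurals or other related forms.

```python
import re
from collections import Counter

# Assumed definition of a "word": a run of ASCII letters, optionally
# followed by an apostrophe part (e.g. "dog's"). Real corpora need more care.
WORD_RE = re.compile(r"[a-z]+(?:'[a-z]+)?")


def count_words(texts):
    """Count lowercase word occurrences across an iterable of strings."""
    counts = Counter()
    for text in texts:
        counts.update(WORD_RE.findall(text.lower()))
    return counts


def frequency_list(counts, min_count=1):
    """Return (word, count) pairs, most frequent first, ties alphabetical."""
    ordered = sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(word, n) for word, n in ordered if n >= min_count]


if __name__ == "__main__":
    # Stand-in for the contents of a few downloaded books.
    sample_texts = [
        "The cat sat on the mat.",
        "The dog's bone; the cat's toy.",
    ]
    for word, n in frequency_list(count_words(sample_texts), min_count=2):
        print(word, n)
```

Raising `min_count` is a cheap way to drop one-off typos and scanning errors, at the cost of losing genuinely rare words.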
