>What an enterprising person (not me) would do is take the text of
>several books and create a frequency-of-occurrence list using Scott's
>algorithm, and then delete all words in the dictionary which don't
>have the necessary frequency.
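
For readers following along, the idea quoted above can be sketched roughly as follows. This is only a minimal Python sketch under stated assumptions: it uses plain word counting rather than Scott's algorithm (which is not shown in this message), and the function names, the threshold, and the sample texts are all made up for illustration.

```python
import re
from collections import Counter


def word_frequencies(texts):
    """Count how often each lowercased word occurs across all texts."""
    counts = Counter()
    for text in texts:
        # Simple tokenizer (an assumption): runs of letters and apostrophes.
        counts.update(re.findall(r"[a-z']+", text.lower()))
    return counts


def filter_dictionary(dictionary, counts, min_count):
    """Keep only the dictionary words seen at least min_count times."""
    return [word for word in dictionary if counts[word.lower()] >= min_count]


# Hypothetical stand-ins for "the text of several books" and a word list.
books = [
    "The cat sat on the mat. The cat slept.",
    "A cat and a dog sat together.",
]
dictionary = ["cat", "dog", "sat", "zebra", "the"]

counts = word_frequencies(books)
common_words = filter_dictionary(dictionary, counts, min_count=2)
print(common_words)
```

With a real corpus, the threshold would probably be better expressed as occurrences per million words than as a raw count, so that it does not depend on how much text was collected.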

Jim,

Excellent suggestion. In fact, it is exactly what the people behind the MRC database did, back in 1981. This work has been repeated by different teams more than 20 times ;-). The best resource in this area is Celex (English, Dutch, German), with a web interface at http://www.mpi.nl/world/celex/, but it is more complex to use than the trick I gave you.

Okay, I don't use my academic signature on this list. Okay, I avoid playing that card; sometimes I even play dumb. Seriously, you have one of the world experts in lexical databases on this list. Don't be an enterprising person... ask. I have 2GB of lexical databases on my computer, and more than 200 scripts to extract any information you can think of from them. This is my job. If you seem to need no more than a fairly short list of words, I provide no more than that. If you need a solution to any other problem, I either already have it or know where you can find what you need.

If you want to write an application for kids and limit the words you use to ones that are understood by kids of a given age, I can provide this (this is called Age of Acquisition). If you want to select only words that are easy or difficult to imagine, to design a pictionary-like game (with some items being easy or difficult to draw), I can provide this (this is called imageability or meaningfulness). If you need words that are part of the same semantic category (for instance clothing items), I can provide this (check out WordNet, http://wordnet.princeton.edu/). If you need a list of homophones, homographs, synonyms, etc., I have that on my computer.

If you want to write a rhyming book or an application to help kids learn reading (phonics method), I can provide you with the full lists of words which have a specific letter-sound relationship in them: http://www.psy.unsw.edu.au/Users/mlange/GPC/GPC_EN/GPC.html (click on one line on the left and you will get all the words that contain that grapheme-phoneme relationship -- a grapheme is a unit of spelling that matches a unit of sound, like ai in pain).

It's just that I am an academic. My job security and promotion prospects depend on the papers I publish. I am not supposed to spend any of my time sharing these resources, or even knowledge about these resources, more widely :-/. I mean, I am encouraged to by existing European funding etc., but good researchers know they need to avoid spending time on anything that doesn't lead to a publication in a well-ranked journal.

So, for about 4 years now, I have had a pile about a meter high of printouts of page 1 of any website that contains information about words. Because I am an academic, this remains in my office. Yes, I find it stupid too. I find it even more stupid when I read papers published by colleagues who would have done research of better quality if they had known about some of these resources.

Worse, as a brilliant academic I am just about to submit a big, thick paper which demonstrates that in my field we have spent the last 20 years providing solutions, sometimes simple, sometimes elaborate, to a wrongly specified problem. In short, we have been studying one-syllable words only; models that are efficient at reading one-syllable words are not efficient at reading two-syllable words. (It's more complex than speech synthesis: this is about integrating findings from patients with brain damage, accounting for various word properties like meaning, explaining the learning of reading, explaining second language acquisition, etc., etc.)

In short, a better analysis is needed, but this would require skills and tools that only about 10-20% of my colleagues have. Just a year of funding and what could be done!!!

Ok, I say goodbye to a promising career. That's fine by me. I do more for the advancement of my field and possibly of science than I would ever be able to do on the brilliant academic career path.

Nothing in the world is as soft and yielding as water,
Yet nothing can better overcome the hard and strong,
For they can neither control nor do away with it.

The soft overcomes the hard,
The yielding overcomes the strong;
Every person knows this,
But no one can practice it.

Who attends to the people would control the land and grain;
Who attends to the state would control the whole world;
Truth is easily hidden by rhetoric.

(From the Tao Te Ching)

-------------------------------------------------------------------------------
Marielle Lange (PhD), Psycholinguistics,
Lecturer in Psychology and Informatics
University of Edinburgh, UK

Homepage: http://homepages.inf.ed.ac.uk/mlange/
Lexicall project: http://lexicall.org
Revolution-education project: http://revolution.lexicall.org
_______________________________________________
use-revolution mailing list
http://lists.runrev.com/mailman/listinfo/use-revolution
