Re: Semi-automatic Index generation?

2008-07-31 Thread viktoras didziulis
Hi David, you might wish to discard the 1000 most frequently used words from your list: English: http://web1.d25.k12.id.us/home/curriculum/fuw.pdf German: http://german.about.com/library/blwfreq01.htm Another approach is statistical - take the whole text, sort words by their frequency
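
The statistical approach mentioned above starts by sorting the words of the whole text by frequency. A minimal Python sketch of that first step follows. The preview is cut off before it says what to do with the sorted list, so this only counts and ranks. The function name `words_by_frequency` and the sample sentence are illustrative assumptions, not from the thread.

```python
import re
from collections import Counter


def words_by_frequency(text):
    """Return (word, count) pairs, most frequent first.

    Hypothetical helper: words are runs of Unicode letters, lowercased,
    so a contraction such as "don't" is split into "don" and "t".
    """
    words = re.findall(r"[^\W\d_]+", text.lower())
    return Counter(words).most_common()


# Illustrative sample text (an assumption, not from the thread).
sample = "the cat sat on the mat and the dog sat too"
ranked = words_by_frequency(sample)
print(ranked[:2])  # [('the', 3), ('sat', 2)]
```

On real text the top of such a ranking is usually dominated by function words, which is presumably why the same message suggests discarding the most frequently used words.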

Re: Semi-automatic Index generation?

2008-07-31 Thread David Bovill
Thanks for the tips! 2008/7/31 viktoras didziulis [EMAIL PROTECTED] Hi David, you might wish to discard the 1000 most frequently used words from your list: English: http://web1.d25.k12.id.us/home/curriculum/fuw.pdf German: http://german.about.com/library/blwfreq01.htm Another approach is

Re: Semi-automatic Index generation?

2008-07-31 Thread Devin Asay
On Jul 31, 2008, at 2:12 AM, viktoras didziulis wrote: Hi David, you might wish to discard the 1000 most frequently used words from your list: English: http://web1.d25.k12.id.us/home/curriculum/fuw.pdf German: http://german.about.com/library/blwfreq01.htm Another approach is statistical -

Semi-automatic Index generation?

2008-07-30 Thread David Bovill
Is there a resource/index that anyone knows of for plain, uninteresting, dull words? I want to take arbitrary chunks of text and search for interesting words - that is, domain-specific words that might be useful as links to create dictionary entries. This would mean creating a list of words and
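
The idea in this question, keeping only the words of a text chunk that do not appear on a list of dull words, can be sketched in Python as below. This is a sketch under stated assumptions, not the poster's implementation: the function name `interesting_words` and the tiny `DULL` list are hypothetical stand-ins for whatever word list is eventually used.

```python
import re


def interesting_words(text, dull_words):
    """Return the distinct words of `text` not in `dull_words`,
    in order of first appearance (comparison is case-insensitive)."""
    dull = {w.lower() for w in dull_words}
    seen = set()
    result = []
    for word in re.findall(r"[^\W\d_]+", text.lower()):
        if word not in dull and word not in seen:
            seen.add(word)
            result.append(word)
    return result


# Hypothetical, deliberately tiny list of dull words for illustration.
DULL = ["the", "a", "of", "is", "and", "to", "in"]
candidates = interesting_words(
    "The mitochondria is the powerhouse of the cell", DULL
)
print(candidates)  # ['mitochondria', 'powerhouse', 'cell']
```

The surviving words are only candidates for index or dictionary entries. A person would still review them, which matches the "semi-automatic" framing of the subject line.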

Re: Semi-automatic Index generation?

2008-07-30 Thread Eric Chatonet
Hello David, On 30 Jul 08 at 16:08, David Bovill wrote: Is there a resource/index that anyone knows of for plain, uninteresting, dull words? I want to take arbitrary chunks of text and search for interesting words - that is, domain-specific words that might be useful as links to

Re: Semi-automatic Index generation?

2008-07-30 Thread David Bovill
Thanks Eric! 2008/7/30 Eric Chatonet [EMAIL PROTECTED] Hello David, On 30 Jul 08 at 16:08, David Bovill wrote: Is there a resource/index that anyone knows of for plain, uninteresting, dull words? I want to take arbitrary chunks of text and search for interesting words - that is