Probably want to nuke punctuation and capitalization before doing
the sort. I'm too braindead at the moment, but some perl incantation
might be the way to go, or if you're old school then awk would probably
work.
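For what it's worth, here is a rough sketch of that cleanup using tr rather than perl or awk, fed into the same sort | uniq -c | sort -rn pipeline quoted below. The function name word_freq is made up for this example. It assumes plain ASCII-ish text and treats anything that isn't a letter as a separator, so "don't" comes out as "don" and "t".

```shell
# word_freq: print each word in the given file with its count, most frequent first.
# The name is hypothetical, chosen only for this sketch.
word_freq() {
    tr '[:upper:]' '[:lower:]' < "$1" |   # fold everything to lowercase
        tr -cs '[:alpha:]' '\n' |         # turn each run of non-letters into one newline
        sed '/^$/d' |                     # drop an empty first line, if the file starts with a non-letter
        sort | uniq -c | sort -rn         # count identical words, largest count first
}
```

Call it as "word_freq somefile.txt". Words with the same count come out in no particularly useful order.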
Steve
Rich Puhek wrote:
Loren Wilton wrote:
I'm not a unix type, so how to do this isn't obvious to me, but it is
probably trivial.
Given a file with a few paragraphs of words (multiple words per line,
obviously) I want to generate a list of the individual words in descending
order of occurrence frequency. I'd like the frequency number with each word
too.
Can anyone give me a simple command line incantation to do that?
Thanks,
Loren
Something like the following should do what you want:
sed -e 's/ /\n/g' <inputfile> | sort | uniq -c | sort -rn
In English: go through <inputfile>, replacing any spaces with a newline
(so each word is on its own line). Send that to sort. Send the sorted
output to the uniq command, and have uniq count the number of
occurrences. Finally, send the output of uniq to the sort command, and
have it sort by frequency.
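The first sort matters because uniq only collapses identical lines that are adjacent. A small sketch of the difference, using a made-up two-line sample and tr in place of sed to do the splitting (tr ' ' '\n' behaves the same way across systems):

```shell
# Sample input, invented for this illustration.
sample='b a b
a b c'

# Without the first sort: no two identical words are adjacent here,
# so uniq -c reports every word separately with a count of 1.
printf '%s\n' "$sample" | tr ' ' '\n' | uniq -c

# With the sort: identical words end up next to each other,
# so each word gets a single line with its full count.
printf '%s\n' "$sample" | tr ' ' '\n' | sort | uniq -c | sort -rn
```

In the second run, b comes out on top with a count of 3, followed by a with 2 and c with 1.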
Not extremely trivial, but a good study in how commands like sed, uniq,
and sort can be pretty powerful.
--Rich