Re: OT: word frequency analysis

Steve Prior 17 Jan 2005 05:16:26 -0000

You'll probably want to nuke punctuation and capitalization before doing
the sort; otherwise "The", "the" and "the," are counted as three different
words.  I'm too braindead at the moment, but some Perl incantation
might be the way to go, or if you're old school then awk would probably
work.
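For the awk route, a minimal sketch follows. The function name wordfreq_awk is hypothetical, and a "word" is assumed to mean a run of ASCII letters, so an apostrophe splits "don't" into "don" and "t". Counting happens in an awk associative array, so no sort is needed before counting; a single sort -rn at the end puts the most frequent words first.

```shell
# wordfreq_awk (hypothetical name): read text on stdin and print
# "count word" lines, most frequent first, after folding case and
# treating every non-letter as a word separator.
wordfreq_awk() {
    awk '
        {
            line = tolower($0)              # fold capitals so "The" and "the" match
            gsub(/[^a-z]+/, " ", line)      # every run of non-letters becomes a space
            n = split(line, words, " ")     # single-space separator: split on runs of blanks
            for (i = 1; i <= n; i++)
                count[words[i]]++           # tally each word in an associative array
        }
        END {
            for (w in count)
                print count[w], w           # iteration order here is unspecified
        }
    ' | sort -rn                            # so sort numerically, highest count first
}

# Usage: wordfreq_awk < inputfile
```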

Steve

Rich Puhek wrote:

Loren Wilton wrote:
I'm not a Unix type, so how to do this isn't obvious to me, but it is
probably trivial.
Given a file with a few paragraphs of words (multiple words per line, obviously), I want to generate a list of the individual words in descending order of occurrence frequency. I'd like the frequency count next to each word too.
Can anyone give me a simple command line incantation to do that?
Thanks,
        Loren
Something like the following should do what you want:
sed -e 's/ /\n/g' <inputfile> | sort | uniq -c | sort -rn
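In English: go through <inputfile>, replacing each space with a newline so that each word ends up on its own line (the \n in the replacement is a GNU sed feature; a POSIX sed needs a backslash followed by a literal newline instead). Send that to sort, which puts identical words next to each other. Send the sorted output to uniq -c, which collapses each run of identical lines into one and prefixes it with the count. Finally, send the output of uniq to sort -rn, which sorts numerically by that count in reverse, so the most frequent words come first.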
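Replacing single spaces has one weak spot: two spaces in a row or a leading space produce an empty "word", and a tab leaves two words glued together. A minimal sketch that keeps the same sort | uniq -c | sort -rn structure but swaps in tr for the splitting step, which also avoids depending on GNU sed's \n. The function name splitcount is hypothetical, and case and punctuation are left untouched, as in the sed version.

```shell
# splitcount (hypothetical name): one word per line, splitting on any
# run of spaces, tabs or newlines, then count; case and punctuation kept.
splitcount() {
    tr -s '[:space:]' '\n' |   # squeeze each whitespace run into a single newline
      grep -v '^$' |           # leading whitespace would otherwise leave one empty line
      sort |                   # bring identical words together...
      uniq -c |                # ...since uniq -c only counts adjacent duplicates
      sort -rn                 # highest count first
}

# Usage: splitcount < inputfile
```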

Not entirely trivial, but a good study in how powerful small commands like sed, uniq, and sort can be when chained together.

--Rich
