Re: OT: word frequency analysis

Rich Puhek 17 Jan 2005 05:06:03 -0000

Loren Wilton wrote:

I'm not a unix type, so how to do this isn't obvious to me, but it is
probably trivial.

Given a file with a few paragraphs of words (multiple words per line,
obviously) I want to generate a list of the individual words in descending
order of occurrence frequency.  I'd like the frequency number with each word
too.

Can anyone give me a simple command line incantation to do that?

Thanks,
        Loren


Something like the following should do what you want:

sed -e 's/ /\n/g' <inputfile> | sort | uniq -c | sort -rn

In English: go through <inputfile>, replacing every space with a newline (so each word is on its own line). Send that to sort, so identical words end up next to each other. Send the sorted output to the uniq command, and have uniq count the number of occurrences of each word (-c). Finally, send the output of uniq to the sort command again, and have it sort numerically in reverse (-rn), so the most frequent words come first.
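For what it's worth, here is a rough sketch of the same pipeline wrapped in a shell function. It is only a variant under a few assumptions, not the one true way: the name wordfreq is made up for illustration, tr is swapped in for sed (a newline escape in a sed replacement is not supported by every sed), punctuation is treated as a word separator, and case is folded so "The" and "the" count together. One known rough edge: a word like "don't" gets split at the apostrophe.

```shell
# wordfreq FILE: print "count word" lines, most frequent first.
# (wordfreq is a hypothetical helper name, not a standard command.)
wordfreq() {
    # 1. turn every run of non-letters into a single newline
    # 2. fold upper case to lower case
    # 3. drop the empty line left behind if the file starts with a non-letter
    # 4. sort so identical words are adjacent, count them, sort by count
    tr -cs '[:alpha:]' '\n' < "$1" \
        | tr '[:upper:]' '[:lower:]' \
        | sed '/^$/d' \
        | sort \
        | uniq -c \
        | sort -rn
}
```

Usage would be something like "wordfreq myfile.txt | head -n 20" to see the twenty most common words.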

Not extremely trivial, but a good study in how commands like sed, uniq, and sort can be pretty powerful.

--Rich
