I'm looking for a way to get common word groups within documents. That is, what are the most frequent two-word, three-word, ... n-word groups within the index?

I was messing with indexing adjacent words together (sorry about the earlier commit). Is this a reasonable approach? Any other ideas for pulling out common phrases? Any simple post-processing?
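
To make the adjacent-words idea concrete, here is a rough sketch in Python of what I mean, done as post-processing over raw text rather than inside the index. It assumes the documents are available as plain strings and that tokenizing by lowercasing and keeping runs of letters, digits, and apostrophes is good enough. The name `top_word_groups` and the sample documents are made up for illustration and are not part of any indexing library.

```python
import re
from collections import Counter

def top_word_groups(docs, n=2, k=10):
    """Return the k most frequent runs of n adjacent words across docs."""
    counts = Counter()
    for text in docs:
        # Assumed tokenization: lowercase, keep runs of letters, digits, apostrophes.
        words = re.findall(r"[a-z0-9']+", text.lower())
        # Slide a window of n words over the document and count each group.
        for i in range(len(words) - n + 1):
            counts[" ".join(words[i:i + n])] += 1
    return counts.most_common(k)

# Hypothetical sample documents, for illustration only.
docs = [
    "the quick brown fox jumps",
    "a quick brown fox sleeps",
    "the quick brown dog barks",
]
print(top_word_groups(docs, n=2, k=1))
```

Word groups never span document boundaries here because each document is windowed separately. Holding every group in memory will probably not scale to a large collection, so this is only meant to show the counting step.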

ryan
