Hi Isabel,

I guess this task depends very much on the domain you want to apply it to. For instance, discussion about products like Mahout should be fairly centralized (this mailing list) and mostly frequented by advanced users. So I don't expect many duplicates (and I didn't find many in my evaluations either). And even if a significant number exist, how would you identify them? I assume that the more complex the topics, the more ways there are to express them. That's why I focus first on good categorization and a reasonable selection of entries (tough enough in some cases), and not so much on the "frequent" in FAQ.

But for the commercial sector (e.g. consumer electronics) I think this could work. Given a very large database of inquiries (mail support, call center logs, ...), hierarchical clustering, and fine-grained settings, there should be clusters (or rather groups/topics) of near duplicates at the bottom level, and you simply order them by size.
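
To make the idea concrete, here is a rough sketch in Python rather than Mahout. It is only an illustration under several assumptions: the inquiries are invented, the word-set Jaccard similarity and the 0.6 threshold are arbitrary choices, and the function names are hypothetical. It links every pair of inquiries whose similarity reaches the threshold and takes the connected groups, which corresponds to cutting a single-linkage hierarchy at that level, and then orders the groups by size.

```python
import re
from collections import defaultdict


def tokens(text):
    """Lowercase word set used as a crude representation of an inquiry."""
    return frozenset(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (0.0 when both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def near_duplicate_groups(inquiries, threshold=0.6):
    """Group inquiries connected by pairwise similarity >= threshold.

    This matches cutting a single-linkage hierarchy at `threshold`.
    Returns the groups sorted by size, largest first.
    """
    sets = [tokens(t) for t in inquiries]
    parent = list(range(len(inquiries)))  # union-find forest

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    # Quadratic pairwise comparison: fine for a sketch, too slow for a
    # very large inquiry database.
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            if jaccard(sets[i], sets[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = defaultdict(list)
    for i, text in enumerate(inquiries):
        groups[find(i)].append(text)
    return sorted(groups.values(), key=len, reverse=True)


# Invented example inquiries (assumption, not real support data).
inquiries = [
    "How do I reset my TV remote?",
    "how do i reset my tv remote",
    "How do I reset the TV remote?",
    "Battery drains too fast on my phone",
    "battery drains too fast on my phone!",
    "Where can I download the manual?",
]
top = near_duplicate_groups(inquiries)
for group in top:
    print(len(group), group[0])
```

The largest groups would then be the candidates for a "Top 10". With real data the hard part remains the similarity measure, since complex questions can be phrased in many different ways.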

Stefan

On 23.02.2011 14:08, Isabel Drost wrote:
On Wed, 23 Feb 11 Sean Owen wrote:

Nice, very interesting to see and read!
Very interesting indeed. Wondering whether a "Top 10" of the
most frequently asked questions could be created that way as well.

Isabel
