Re: Automatically extracted Mahout FAQs

Stefan Henß Wed, 23 Feb 2011 23:58:30 -0800

Hi Isabel,

I guess this task depends very much on the domain to which you want to apply it. For instance, discussion about products like Mahout should be quite concentrated (this mailing list) and frequented mostly by advanced users. So I don't expect many duplicates (and I didn't find many in my evaluations either). And even if a significant number exist, how would you identify them? I assume that the more complex the topics, the more ways there are to express them. That's why I focus first on good categorization and a reasonable selection of entries (tough enough in some cases), and not so much on the "frequent" in FAQ.

But for the commercial sector (e.g. consumer electronics) I think this could work. Given a very large database of inquiries (mail support, call center logs, ...), hierarchical clustering, and fine-grained settings, there should be clusters (or rather groups/topics) of near duplicates at the bottom level, and you simply order them by size.
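
The grouping-and-ranking idea described above can be illustrated with a minimal sketch. It is not the method used for the Mahout FAQ extraction. It assumes a plain word-set representation, Jaccard similarity, and a hypothetical threshold of 0.6. Linking every pair above the threshold and taking connected components gives the same groups as cutting a single-linkage hierarchical clustering at that level. The function names and the sample inquiries are made up for illustration.

```python
import re
from collections import defaultdict


def tokens(text):
    """Lowercased word set: a crude stand-in for a real text representation."""
    return frozenset(re.findall(r"[a-z0-9]+", text.lower()))


def jaccard(a, b):
    """Jaccard similarity of two token sets (1.0 means identical sets)."""
    if not a and not b:
        return 1.0
    return len(a & b) / len(a | b)


def near_duplicate_groups(inquiries, threshold=0.6):
    """Group inquiries linked by similarity >= threshold, largest group first.

    The threshold value is an assumption; a real system would tune it.
    """
    sets = [tokens(q) for q in inquiries]
    parent = list(range(len(inquiries)))

    def find(i):
        # Union-find root lookup with path halving.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    # Pairwise comparison is O(n^2); fine for a sketch, not for large logs.
    for i in range(len(sets)):
        for j in range(i + 1, len(sets)):
            if jaccard(sets[i], sets[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = defaultdict(list)
    for i, q in enumerate(inquiries):
        groups[find(i)].append(q)
    return sorted(groups.values(), key=len, reverse=True)


# Hypothetical support inquiries, invented for this example.
sample = [
    "How do I reset my password?",
    "how do i reset my password",
    "How can I reset my password?",
    "Where is my order?",
    "Where is my order",
    "Does the TV support HDMI?",
]

for group in near_duplicate_groups(sample):
    print(len(group), group[0])
```

Ordering the groups by size puts the most frequently repeated question at the top, which is the "Top 10" style ranking discussed in this thread. For a very large inquiry database the pairwise loop would need to be replaced by a scalable clustering job.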


Stefan

On 23.02.2011 14:08, Isabel Drost wrote:

On Wed, 23 Feb 11 Sean Owen wrote:

Nice, very interesting to see and read!

Very interesting indeed. Wondering whether a "Top 10" of the
most frequently asked questions could be created that way as well.

Isabel
