Re: Automatically extracted Mahout FAQs

Sean Owen Wed, 23 Feb 2011 00:29:28 -0800

Nice, very interesting to see and read!


On Wed, Feb 23, 2011 at 5:15 AM, Stefan Henß
<[email protected]> wrote:
> Hi everybody,
>
> I'm currently doing research for my bachelor thesis on how to automatically
> extract FAQs from unstructured data.
>
> For this I've built a system that automatically performs the following
> steps:
> - Load thousands of conversations from forums and mailing lists (ignoring
> any categories already present there).
> - Build a categorization based solely on the conversations' texts (by
> clustering).
> - Pick the best-modelled categories, each as the basis for one FAQ.
> - For each question (the first entry in a conversation), find the best
> reply among its answers.
> - Select the most relevant and well-formatted question/answer pairs for
> each FAQ.
>
> Most of the steps rely almost completely on the data from the
> categorization step, which is obtained using the latent Dirichlet
> allocation (LDA) model.
>
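
To make the "find the best reply" step concrete, here is a minimal sketch of one way it could work. It is an illustration under assumptions, not the method actually used: it assumes every post already has an LDA topic distribution (a list of probabilities), and it assumes the best reply is the one whose distribution is most similar to the question's by cosine similarity. The names `cosine` and `best_reply` are hypothetical.

```python
import math


def cosine(p, q):
    """Cosine similarity between two equal-length topic vectors."""
    dot = sum(a * b for a, b in zip(p, q))
    norm = math.sqrt(sum(a * a for a in p)) * math.sqrt(sum(b * b for b in q))
    return dot / norm if norm else 0.0


def best_reply(question_topics, reply_topics):
    """Return the index of the reply closest to the question in topic space.

    Assumption: similarity of topic mixtures is a usable proxy for
    "best reply"; the original message does not say how replies are ranked.
    Returns None when there are no replies.
    """
    if not reply_topics:
        return None
    return max(
        range(len(reply_topics)),
        key=lambda i: cosine(question_topics, reply_topics[i]),
    )


# Made-up topic mixtures over three topics, for illustration only.
question = [0.7, 0.2, 0.1]
replies = [
    [0.1, 0.1, 0.8],
    [0.6, 0.3, 0.1],
    [0.3, 0.4, 0.3],
]
chosen = best_reply(question, replies)
```

A real system would likely combine this with other signals such as reply length or formatting, which the message mentions only for the final selection step.
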
> For the evaluation, I'd like to ask you to have a look at one or two
> FAQs and maybe comment on how well the questions match the FAQ's
> title, how relevant they are, etc.
>
>
> Here's the direct link to the Mahout FAQs: http://faqcluster.com/mahout-data
>
> (There are some other interesting FAQs as well at http://faqcluster.com/)
>
>
> Thanks for your help
>
> Stefan
>
