Re: Mahout for Keyword Extraction

vineet yadav Thu, 03 Feb 2011 05:38:02 -0800

Hi Joyce,
Mahout uses clustering algorithms to extract top terms or topics from
document sets. It basically offers three types of algorithm for keyword
extraction:
1) Collocation extraction:
https://cwiki.apache.org/confluence/display/MAHOUT/Collocations
2) Clustering algorithms: it supports clustering algorithms like k-means,
fuzzy k-means, canopy etc.
3) Latent Dirichlet Allocation:
https://cwiki.apache.org/confluence/display/MAHOUT/Latent+Dirichlet+Allocation
Mahout uses simple unsupervised (clustering) algorithms for keyword
extraction, whereas I think OpenCalais uses supervised and deep semantic
approaches. I think you are looking for a supervised (classification)
algorithm for keyphrase extraction. I suggest looking at kea (
http://www.nzdl.org/Kea/download.html) and maui-indexer (
http://code.google.com/p/maui-indexer/).
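To give a rough feel for the collocation idea in item 1, here is a minimal
plain-Java sketch that only counts adjacent word pairs. It is not Mahout's
API or its actual scoring: the class name BigramCounter, the method
frequentBigrams and the minCount threshold are made up for illustration, and
as far as I know Mahout's collocation job ranks n-grams with a
log-likelihood ratio rather than raw counts.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical illustration only; not part of Mahout.
public class BigramCounter {

    // Counts adjacent word pairs (bigrams) in the text and keeps
    // only those that occur at least minCount times.
    public static Map<String, Integer> frequentBigrams(String text, int minCount) {
        // Naive tokenizer: lowercase, split on anything that is not a-z or 0-9.
        String[] tokens = text.toLowerCase().split("[^a-z0-9]+");
        Map<String, Integer> counts = new HashMap<>();
        String prev = null;
        for (String tok : tokens) {
            if (tok.isEmpty()) {
                continue; // split() can yield an empty leading token
            }
            if (prev != null) {
                counts.merge(prev + " " + tok, 1, Integer::sum);
            }
            prev = tok;
        }
        // Drop rare pairs; what remains are candidate collocations.
        counts.values().removeIf(c -> c < minCount);
        return counts;
    }

    public static void main(String[] args) {
        String text = "New York is big. I love New York. New York never sleeps.";
        System.out.println(frequentBigrams(text, 2)); // prints {new york=3}
    }
}
```

Raw counts like this favour pairs of common words, which is why a real
collocation extractor scores pairs statistically instead.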
Thanks
Vineet Yadav


On Thu, Feb 3, 2011 at 6:51 PM, Joyce Babu <[email protected]> wrote:

> Hi,
>
> I am new to Java and Machine Learning concept. I was searching for a method
> to extract keywords (like names of people, organization, places etc) from
> news stories sorted by relevance. I found several web services like
> OpenCalais that provide similar service, but they don't detect most of my
> terms. I have a list of approved keywords, and only need to detect from that
> list.
>
> I found out about Machine Learning and got interested in the concept. I
> read somewhere that the classification feature of mahout can be used for
> detecting keywords by classifying terms as keywords and non-keywords. I have
> been trying to learn mahout for the past 30 hours, but haven't reached
> anywhere. It is not useful to waste time trying to learn, if mahout is not
> the tool to solve my problem.
>
> Can someone provide details on using mahout for term extraction? Is it
> possible to do this with little to medium knowledge in Java? Is it an
> overkill to use mahout for this? Should I go for an NLP solution?
>
> Thanks,
> Joyce
>
>
>
