Hi Salman

> I have got news documents (around 3000 and continuously increasing)
> containing news about companies, investment, stocks, economy, quarterly
> income etc. My goal is to have the news sorted in such a way that I know
> which news correspond to which company. e.g for the news item "Apple
> launches new iphone", I need to associate the company Apple with it. A
> particular news item/document only contains 'title' and 'description' so I
> have to analyze the text in order to find out which company the news
> refers to. It could be multiple companies too.
>

If this is the problem you are trying to solve, I would suggest a different
solution. Since you want to classify by company only, it is better to use an
NER (named entity recognition) system to identify the company names in each
document, and then use those names to map the articles to companies.
This would be a simple and effective solution.
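
A rough sketch of the mapping step, in Python. Note that it does not use a
real NER system: as a stand-in, it looks names up in a hand-made list. The
COMPANIES list, the tag_companies function and the sample item are all made
up for illustration; a trained NER model would supply the names instead.

```python
import re

# Hypothetical list of known company names. A real NER system would
# extract these from the text instead of relying on a fixed list.
COMPANIES = ["Apple", "Microsoft", "Google"]

def tag_companies(title, description, companies=COMPANIES):
    """Return the known company names mentioned in a news item, in list order."""
    text = f"{title} {description}"
    found = []
    for name in companies:
        # Whole-word, case-sensitive match, so "Apple" does not match "Applebee".
        if re.search(rf"\b{re.escape(name)}\b", text):
            found.append(name)
    return found

# Made-up news item with only the two fields described in the question.
item = {
    "title": "Apple launches new iphone",
    "description": "Analysts compare it with Google's latest phone.",
}
print(tag_companies(item["title"], item["description"]))
```

Because the function returns a list, an item that mentions several companies
is associated with all of them.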


> You can see how confused I am about how to address this issue. Another
> thing that concerns me is that if it's possible to make a system this
> intelligent, that if the news says 'iphone sales at a record high' without
> using the word 'Apple', the system can classify it as a news related to
> apple?
>

This is hard to achieve. You may need to spend a lot of time creating the
training set, and even then the chance of a classification-based system
getting this right is low.

But if you go with an NER-based solution, you could customize the NER to
identify such entities (in this case "iPhone") and then map them to Apple.
This is achievable at low risk.
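
To illustrate the product-to-company mapping, here is a small Python sketch.
It assumes you maintain an alias table by hand; the ALIASES table and the
companies_from_aliases function are hypothetical names, and the token lookup
only stands in for a customized NER.

```python
import re

# Hypothetical alias table: product or brand name (lowercase) -> company.
ALIASES = {
    "iphone": "Apple",
    "ipad": "Apple",
    "xbox": "Microsoft",
}

def companies_from_aliases(text, aliases=ALIASES):
    """Map product mentions in the text to companies via a lowercase lookup."""
    # Split into lowercase alphanumeric tokens so "iPhone" and "iphone" both match.
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    # A set removes duplicates; sorting gives a stable order.
    return sorted({aliases[t] for t in tokens if t in aliases})

print(companies_from_aliases("iphone sales at a record high"))
```

The table has to be kept up to date as new products appear, but that is much
less work than building a training set.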

Just a thought.
I would not recommend Mahout for such a problem.

-- 
*Biju*
