Ian Holsman wrote:
Hi.
I have just started to dig into the documentation and examples that UIMA
provides, but am not really up to speed with the information retrieval (IR)
way of doing things.
There are two things I really want to be able to do:
- What tagthe.net does: extract key information from a text document. For
example,
http://tagthe.net/api?url=http://news.aol.com/entertainment/tv/articles/_a/sopranos-premiere-draws-a-smaller-mob/20070411064509990001
extracts the key points from
http://news.aol.com/entertainment/tv/articles/_a/sopranos-premiere-draws-a-smaller-mob/20070411064509990001
- Keyword density analysis, which might give a clue as to which keywords
Google's or Yahoo's search would associate with the page.
I'm fairly certain that UIMA's entity extraction can handle the first
part, but I'm unsure whether it can do the second, or whether UIMA is
the right tool for the job at all.
regards
Ian
--
Ian Holsman
Hi Ian,
UIMA would certainly be the right platform to run something like tagthe.net.
I think UIMA could help with your second point, as you could at least
use it for keyword extraction. However, note that a very important
factor for the ranking of search results (which I assume is what you're
interested in) is the number and nature of links that point *to* a page,
something that no analysis of the page itself will give you.
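For the density part itself, here is a rough sketch in plain Python rather than a UIMA annotator, assuming the page text has already been extracted from the HTML. It counts each non-stopword term and divides by the total number of words on the page, which is the usual meaning of "keyword density". The function name keyword_density and the tiny STOPWORDS set are hypothetical placeholders, not part of UIMA or any library; a real analysis would use a fuller stopword list and probably handle multi-word phrases too.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list (an assumption for this sketch).
STOPWORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "it", "for", "on", "with"}


def keyword_density(text, top_n=10):
    """Return (keyword, count, density) tuples for the top_n most frequent
    non-stopword terms, where density = count / total words in the text."""
    # Lowercase and split into simple word tokens (letters, digits, apostrophes).
    words = re.findall(r"[a-z0-9']+", text.lower())
    total = len(words)
    if total == 0:
        return []
    # Count only non-stopwords, but keep all words in the denominator.
    counts = Counter(w for w in words if w not in STOPWORDS)
    return [(w, c, c / total) for w, c in counts.most_common(top_n)]
```

For example, keyword_density("mob mob boss the mob") reports "mob" with a density of 0.6 and "boss" with 0.2. As noted above, this only describes the page itself; it says nothing about the inbound links that weigh heavily in search ranking.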
--Thilo