jagarlamudi jagdeesh wrote:
Hi all,
      I am building a personal IR system for a
desktop.
I have completed creating the inverted index (backward
index).
      Now I am working on the retrieval part. I have no
idea how to generate the context in which the query
words occur (as a summary of the document, similar
to what search engines show). Any ideas in this direction
would be appreciated.

What you are describing are KWIC (keyword in context) fragments, sometimes called "snippets". They are not really a summary proper, because a document's summary (either original or generated) may not contain your search keywords. A common way to generate KWIC fragments is simply to take a roughly fixed number of surrounding tokens on each side of each keyword occurrence.
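
As a rough illustration of that fixed-window idea, here is a minimal sketch in Python. The function name kwic_snippets and its parameters (window, max_snippets) are made up for this example and are not taken from Nutch. It assumes the original document text is still available and matches raw lowercased tokens, so a system that indexes root words would need to apply the same stemmer to the tokens before comparing.

```python
import re

def kwic_snippets(text, keywords, window=5, max_snippets=3):
    """Return up to max_snippets fragments, each with `window` tokens
    on either side of a keyword hit (hypothetical helper)."""
    tokens = re.findall(r"\w+", text)           # naive word tokenizer
    wanted = {k.lower() for k in keywords}      # case-insensitive match
    snippets = []
    last_end = 0                                # end of the previous window
    for i, tok in enumerate(tokens):
        # Skip non-keywords and hits already covered by the last fragment.
        if tok.lower() not in wanted or i < last_end:
            continue
        start = max(0, i - window)
        end = min(len(tokens), i + window + 1)
        fragment = " ".join(tokens[start:end])
        # Mark fragments that were cut out of the middle of the text.
        prefix = "... " if start > 0 else ""
        suffix = " ..." if end < len(tokens) else ""
        snippets.append(prefix + fragment + suffix)
        last_end = end
        if len(snippets) == max_snippets:
            break
    return snippets

if __name__ == "__main__":
    doc = "the quick brown fox jumps over the lazy dog"
    for s in kwic_snippets(doc, ["fox"], window=2):
        print(s)
```

Joining the tokens with single spaces drops the original punctuation; slicing the source text by character offsets would preserve it, at the cost of tracking offsets during tokenization.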



In my inverted index, I am storing the root word, the document in which it (the root word) occurs, the position at which it occurs, and the word ids of the previous and next words at that position.

Take a look at the net.nutch.search.Summarizer.
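
If you would rather rebuild the context directly from the index layout described above (root word, document, position, previous and next word ids), one possibility is to follow the neighbour links outward from the hit. The sketch below is only a guess at how such an index might be modelled: the dictionary keyed by (word_id, doc_id, position), the vocab mapping, and the function name context_from_index are all assumptions made for illustration, not a description of the actual index or of the Nutch Summarizer.

```python
def context_from_index(index, vocab, word_id, doc_id, position, window=2):
    """Rebuild up to `window` words on each side of a hit by following
    the stored previous/next word ids (hypothetical index layout).

    index: {(word_id, doc_id, position): (prev_id, next_id)}
    vocab: {word_id: root_word}
    """
    left, right = [], []

    # Walk left: each step looks up the neighbour's own posting.
    wid, pos = word_id, position
    for _ in range(window):
        entry = index.get((wid, doc_id, pos))
        if entry is None or entry[0] is None:
            break                      # start of document or gap in index
        wid, pos = entry[0], pos - 1
        left.append(vocab[wid])

    # Walk right in the same way.
    wid, pos = word_id, position
    for _ in range(window):
        entry = index.get((wid, doc_id, pos))
        if entry is None or entry[1] is None:
            break                      # end of document or gap in index
        wid, pos = entry[1], pos + 1
        right.append(vocab[wid])

    return " ".join(left[::-1] + [vocab[word_id]] + right)
```

Two caveats with this approach: the fragment is made of root words rather than the words as written, and the chain stops at any word that was not indexed (for example a dropped stop word). Working from the stored document text avoids both problems.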


--
Best regards,
Andrzej Bialecki

-------------------------------------------------
Software Architect, System Integration Specialist
CEN/ISSS EC Workshop, ECIMF project chair
EU FP6 E-Commerce Expert/Evaluator
-------------------------------------------------
FreeBSD developer (http://www.freebsd.org)



_______________________________________________
Nutch-developers mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
