[Nutch-general] prioritizing adjacent pages

Lars Aronsson Mon, 01 Nov 2004 10:37:53 -0800

I have a website with digitized literature organized as simple, static 
HTML files, one for each page or chapter, in directories for each 
volume, in larger directories for each work, i.e. URLs like
http://runeberg.org/$WORK/$VOLUME/$CHAPTER.html


Every page has a search box which today leads to a Google search with
a "site:" qualifier.  Using my own Nutch search engine could be an 
alternative.  Google adds the value of its PageRank, but what I 
want is a search that prioritizes hits in adjacent chapters, volumes 
or works, relative to the page from which the user started the search.

On http://runeberg.org/hagberg/a/0210.html is the opening of Act 2,
Scene 1 of Hamlet by Shakespeare, the Swedish translation by Hagberg.
If I start there and search for Athens, I would like to see hits in "A
Midsummer Night's Dream" by the same author, which is set in the Greek
capital, e.g. http://runeberg.org/hagberg/a/0003.html , rather than
hits elsewhere on the same website.
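
To make the wish concrete, here is a minimal Python sketch of such a
re-ranking step, applied after the search engine has returned its hits.
It is not Nutch code and uses no Nutch API.  The function names are
made up for illustration, and the two hit URLs marked "hypothetical"
are invented; only the $WORK/$VOLUME/$CHAPTER layout comes from the
description above.

```python
from urllib.parse import urlparse


def path_segments(url):
    """Split a URL into [host, work, volume, chapter, ...]."""
    parsed = urlparse(url)
    return [parsed.netloc] + [s for s in parsed.path.split("/") if s]


def shared_levels(start_url, hit_url):
    """Count the leading segments (site, work, volume, ...) both URLs share."""
    count = 0
    for a, b in zip(path_segments(start_url), path_segments(hit_url)):
        if a != b:
            break
        count += 1
    return count


def rerank(start_url, hits):
    """Stable sort: hits sharing more levels with the start page come first."""
    return sorted(hits, key=lambda url: -shared_levels(start_url, url))


start = "http://runeberg.org/hagberg/a/0210.html"
hits = [
    "http://runeberg.org/otherwork/1/0001.html",  # hypothetical URL
    "http://runeberg.org/hagberg/b/0100.html",    # hypothetical URL
    "http://runeberg.org/hagberg/a/0003.html",
]
ranked = rerank(start, hits)
```

Because the sort is stable, hits that share the same number of levels
keep whatever order the search engine gave them, so this only adds a
locality preference on top of the normal relevance ranking.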

I guess instead of Google's "site:runeberg.org" I would like
"near:runeberg.org/hagberg/a/0210.html".  The distance metric could be
either the number of leading characters the two URLs have in common
(more shared characters meaning closer) or the link distance (e.g.
two clicks away).
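
The two candidate metrics could be sketched in Python roughly as
follows.  This is only an illustration under stated assumptions: the
function names are invented, the link graph is assumed to be available
as a plain dictionary mapping each URL to the URLs it links to, and the
page names in the toy graph are hypothetical.  Note that the first
function is really a similarity (larger means closer), while the second
is a true distance (smaller means closer).

```python
import os.path
from collections import deque


def common_prefix_length(url_a, url_b):
    """Number of leading characters the two URL strings share."""
    return len(os.path.commonprefix([url_a, url_b]))


def link_distance(links, start, target):
    """Fewest clicks from start to target via breadth-first search.

    links is an adjacency dict {url: [linked urls]} (an assumed format).
    Returns None if target cannot be reached from start.
    """
    if start == target:
        return 0
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        page, dist = queue.popleft()
        for nxt in links.get(page, ()):
            if nxt == target:
                return dist + 1
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    return None


# Toy link graph with hypothetical page names.
toy_links = {
    "chapter1": ["index", "chapter2"],
    "index": ["chapter1", "chapter2", "chapter3"],
    "chapter2": ["chapter3"],
}
```

For example, link_distance(toy_links, "chapter1", "chapter3") follows
chapter1 -> index -> chapter3 and counts two clicks.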

Can Nutch do this?  Can any other search engine?


-- 
  Lars Aronsson ([EMAIL PROTECTED])
  Project Runeberg - free Nordic literature - http://runeberg.org/

