I have a website of digitized literature organized as simple, static HTML files: one file per page or chapter, grouped into a directory per volume, which in turn sits in a directory per work. The URLs look like http://runeberg.org/$WORK/$VOLUME/$CHAPTER.html

Every page has a search box, which today leads to a Google search with a "site:" qualifier. Running my own Nutch search engine could be an alternative. Google adds the value of its PageRank, but what I want is a search that prioritizes hits in adjacent chapters, volumes, or works, relative to the page where the user started the search.

For example, http://runeberg.org/hagberg/a/0210.html is the opening of Act 2, Scene 1 of Hamlet by Shakespeare, in the Swedish translation by Hagberg. If I start there and search for Athens, I would like to see hits in "A Midsummer Night's Dream" by the same author, which is set in the Greek capital, e.g. http://runeberg.org/hagberg/a/0003.html, rather than hits elsewhere on the same website.

So instead of Google's "site:runeberg.org" I would like something like "near:runeberg.org/hagberg/a/0210.html". The distance metric could be either the number of leading characters the two URLs have in common or the link distance (e.g. two clicks away).

Can Nutch do this? Can any other search engine?

-- 
Lars Aronsson ([EMAIL PROTECTED])
Project Runeberg - free Nordic literature - http://runeberg.org/

_______________________________________________
Nutch-general mailing list
[EMAIL PROTECTED]
https://lists.sourceforge.net/lists/listinfo/nutch-general
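
To make the URL-based distance concrete, here is a minimal sketch in Python of how hits might be re-ranked by closeness to the starting page. It is only an illustration under stated assumptions: it compares whole path segments (work, volume, chapter) rather than individual characters, the function names `path_segments`, `shared_depth`, and `rerank_near` are made up for this example, and it is not an existing feature of Nutch or any other search engine.

```python
from urllib.parse import urlparse


def path_segments(url):
    """Split a URL path into non-empty segments, e.g. ['hagberg', 'a', '0210.html']."""
    return [segment for segment in urlparse(url).path.split("/") if segment]


def shared_depth(origin, candidate):
    """Count the leading path segments that two URLs on the same host share."""
    if urlparse(origin).netloc != urlparse(candidate).netloc:
        return 0  # different hosts are treated as maximally distant
    depth = 0
    for a, b in zip(path_segments(origin), path_segments(candidate)):
        if a != b:
            break
        depth += 1
    return depth


def rerank_near(origin, hits):
    """Return hits ordered so that those sharing more leading segments come first.

    sorted() is stable, so hits with equal depth keep the engine's original order.
    """
    return sorted(hits, key=lambda url: -shared_depth(origin, url))


if __name__ == "__main__":
    origin = "http://runeberg.org/hagberg/a/0210.html"
    hits = [
        "http://runeberg.org/otherwork/1/0001.html",  # hypothetical URL
        "http://runeberg.org/hagberg/b/0001.html",    # hypothetical URL
        "http://runeberg.org/hagberg/a/0003.html",
    ]
    for url in rerank_near(origin, hits):
        print(shared_depth(origin, url), url)
```

Applied as a post-processing step on the result list, this would move hits from the same volume ahead of hits from the same work, and those ahead of the rest of the site, while leaving the engine's own relevance order intact within each group. A link-distance metric would instead need the site's link graph.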
