On Wed, 2005-04-20 at 09:55 -0700, Doug Cutting wrote:
> Jason Tang wrote:
> > Do anyone working on this issue [hiding file URLs when doing a remote
> > search]? If none, I will go on.
> > I suppose it is not hard to support "indexing locally and searching
> > remotely".
>
> A simple way to implement this would be to change the protocol-file
> plugin to handle http urls (add protocol-name="http" in plugin.xml),
> then modify FileResponse.java to optionally accept http urls and convert
> them to pathnames relative to some root directory. Does that make sense?
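
For reference, the URL-to-pathname conversion described above might look roughly like the following. This is only a sketch, not the actual FileResponse code: the class name HttpUrlToPath, the method toLocalPath, and the host intranet.example are all made up for illustration, and I am assuming the root directory would come from configuration.

```java
import java.net.URI;
import java.nio.file.Path;
import java.nio.file.Paths;

// Hypothetical helper; nothing here is taken from the Nutch source tree.
final class HttpUrlToPath {

    /**
     * Maps the path component of an http URL onto a file below root.
     * Throws if the URL is not http or if ".." segments would escape root.
     */
    static Path toLocalPath(Path root, String url) {
        URI uri = URI.create(url);
        if (!"http".equalsIgnoreCase(uri.getScheme())) {
            throw new IllegalArgumentException("not an http URL: " + url);
        }
        String urlPath = uri.getPath() == null ? "" : uri.getPath();
        // Drop leading slashes so the path is resolved relative to root.
        String relative = urlPath.replaceFirst("^/+", "");
        Path base = root.toAbsolutePath().normalize();
        Path resolved = base.resolve(relative).normalize();
        if (!resolved.startsWith(base)) {
            throw new IllegalArgumentException("URL escapes root: " + url);
        }
        return resolved;
    }

    public static void main(String[] args) {
        Path root = Paths.get("/srv/files");
        // On a Unix-like system this prints /srv/files/docs/a.txt
        System.out.println(toLocalPath(root, "http://intranet.example/docs/a.txt"));
    }
}
```

The check against the root matters because the URL path is attacker-controlled in a remote-search setup; without it, a URL containing ".." segments could name files outside the served tree.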

Modifying the JSP sounds simpler for any particular installation.

For more general use, there's probably a general need for
Nutch-visible-URL-to-externally-visible-URL translation at display
time too. For example, at one time we ran Nutch against an internal
web server with a mirror of a bunch of content that lived at some
externally-accessible URL; we wanted the search results to display the
externally-accessible URLs.

Last time I was doing filesystem indexing (with Nutch 0.5), I ran into
a bunch of minor problems:

- copying the entire filesystem into my segment directories was
  undesirable, but mandatory
- limits on file size and number of outgoing links per "page" weren't
  helpful
- if a directory name ended up in Nutch without a trailing slash
  (file:///home/kragen rather than file:///home/kragen/), the relative
  links from it were wrong.
- directories had links to "..", so three passes of crawling from
  /home/kragen/a/b/c would index everything three levels down from
  there, but also /home/kragen, /home/kragen/a/*, and
  /home/kragen/a/b/*/*, which wasn't what I wanted.

Also, Nutch was noticeably slower than Lucene, for whatever reason,
and that was more noticeable when the data was coming from a
300-megabit-per-second hard disk than a 1-megabit-per-second network
link.
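
The trailing-slash and ".." problems can be reproduced with plain java.net.URI, independent of Nutch. The sketch below is only an illustration of the behaviour and of one possible workaround; the class FileLinks and its methods asDirectory and staysUnderRoot are hypothetical names, not anything in Nutch.

```java
import java.net.URI;

// Hypothetical helper illustrating relative-link resolution for file: URLs.
final class FileLinks {

    /** Appends a trailing slash if missing, so relative links resolve inside the directory. */
    static URI asDirectory(URI dir) {
        String s = dir.toString();
        return s.endsWith("/") ? dir : URI.create(s + "/");
    }

    /** True if link, resolved against dir, still lies at or below root. */
    static boolean staysUnderRoot(URI root, URI dir, String link) {
        URI resolved = asDirectory(dir).resolve(link).normalize();
        return resolved.getPath().startsWith(asDirectory(root).getPath());
    }

    public static void main(String[] args) {
        URI noSlash = URI.create("file:///home/kragen");

        // Without the slash, "kragen" is treated as a file name and dropped.
        System.out.println(noSlash.resolve("notes.txt").getPath());
        // prints /home/notes.txt

        // With the slash, the link lands inside the directory.
        System.out.println(asDirectory(noSlash).resolve("notes.txt").getPath());
        // prints /home/kragen/notes.txt

        URI root = URI.create("file:///home/kragen/a/b/c");
        System.out.println(staysUnderRoot(root, root, "d/e.txt")); // prints true
        System.out.println(staysUnderRoot(root, root, ".."));      // prints false
    }
}
```

A crawler could apply a check like staysUnderRoot to each outgoing link of a directory listing and skip the ones that fail, which would keep ".." from pulling in the parent directories.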
