If you have pre-downloaded pages on the local file system that you'd like to index, the process is not as simple. You'd need to change the fetcher code so that it opens and loads a local file instead of opening an HTTP socket. Be careful not to simply switch the download protocol to file://: while that protocol is supported, you will get undesirable results for in-link and anchor text information. Substituting file://path/to/blah for http://blah will make Nutch believe that file://path/to/blah is the URL of the page, which is likely not what you want.
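To make the idea concrete, here is a rough sketch of loading content from disk while keeping the original http URL as the page's identity. It does not use the Nutch API: LocalPageLoader, Page, and the <mirrorRoot>/<host>/<path> directory layout are all assumptions made up for illustration, and the equivalent change in Nutch would have to be made inside the fetcher/protocol code itself.

```java
import java.io.IOException;
import java.net.URI;
import java.nio.file.Files;
import java.nio.file.Path;

// Hypothetical helper, NOT part of Nutch: reads a pre-downloaded copy of a page
// from disk but reports it under its original http URL, so that link and
// anchor-text bookkeeping would still refer to the real URL.
final class LocalPageLoader {

    // Simple holder pairing the original URL with the bytes read from disk.
    static final class Page {
        final String url;
        final byte[] content;

        Page(String url, byte[] content) {
            this.url = url;
            this.content = content;
        }
    }

    private final Path mirrorRoot;

    LocalPageLoader(Path mirrorRoot) {
        this.mirrorRoot = mirrorRoot.toAbsolutePath().normalize();
    }

    // Assumed layout: <mirrorRoot>/<host>/<path>, with index.html standing in
    // for directory URLs. Query strings are ignored in this sketch.
    Path localPathFor(String url) {
        URI uri = URI.create(url);
        String host = uri.getHost();
        if (host == null) {
            throw new IllegalArgumentException("URL has no host: " + url);
        }
        String path = uri.getPath();
        if (path == null || path.isEmpty()) {
            path = "/";
        }
        if (path.endsWith("/")) {
            path = path + "index.html";
        }
        Path resolved = mirrorRoot.resolve(host).resolve(path.substring(1)).normalize();
        // Refuse paths such as /../../etc/passwd that escape the mirror directory.
        if (!resolved.startsWith(mirrorRoot)) {
            throw new IllegalArgumentException("URL escapes mirror root: " + url);
        }
        return resolved;
    }

    // Load the bytes from the local copy; the returned Page keeps the http URL.
    Page load(String url) throws IOException {
        byte[] content = Files.readAllBytes(localPathFor(url));
        return new Page(url, content);
    }
}
```

With this shape, loader.load("http://example.com/docs/a.html") would read <mirrorRoot>/example.com/docs/a.html, and the rest of the pipeline would only ever see the http URL, never a file:// one.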
Cheers,
TAA

On 1/23/07, Sean Dean <[EMAIL PROTECTED]> wrote:
> What exactly are you looking to do?
>
> If you don't crawl for anything, then what data are you looking to index?
>
> You can certainly take some other person's Nutch segment (that they crawled)
> and then index it yourself, on your machines.
>
>
> ----- Original Message ----
> From: Scott Green <[EMAIL PROTECTED]>
> To: [email protected]
> Sent: Tuesday, January 23, 2007 12:08:31 PM
> Subject: Can I generate nutch index without crawling?
>
>
> Hi,
>
> I am now debugging nutch searcher and wondering can I generate nutch
> index without crawling? If yes, can you give me some hints? Thanks.

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
