Dominik, thank you so much for your answers, you have been very helpful. Just one more :)
If I understand correctly, the way to go about the whole process is:

1. fetch/parse - ndfs
2. decide how many segments (datasize?) you want on each searcher machine
3. invert, index, dedup, merge selected indexes to some ndfs folder
4. copy the ndfs folder to searcher machine

and follow this procedure after every fetch?

Thanks.

P.S. in the searcher log it clearly says it opens the linkdb for some reason:

[EMAIL PROTECTED] searcher]$ bin/nutch server 9004 /nutch/
060130 145747 10 parsing file:/home/nutchuser/searcher/conf/nutch-default.xml
060130 145747 10 parsing file:/home/nutchuser/searcher/conf/nutch-site.xml
060130 145747 10 opening merged index in /nutch/index
060130 145747 10 opening segments in /nutch/segments
060130 145747 10 opening linkdb in /nutch/linkdb
060130 145748 11 Server listener on port 9004: starting

On Mon, 2006-01-30 at 13:55 +0100, Dominik Friedrich wrote:
> Gal Nitzan wrote:
> > I have copied only the segments directory but the searcher returns 0
> > hits.
> >
> You have to put the index and segments dir into a directory named
> "crawl" and start tomcat from the directory that contains crawl. The
> nutch.war file contains a nutch-default.xml with
>
> <property>
>   <name>searcher.dir</name>
>   <value>crawl</value>
>   <description>
>   Path to root of crawl. This directory is searched (in
>   order) for either the file search-servers.txt, containing a list of
>   distributed search servers, or the directory "index" containing
>   merged indexes, or the directory "segments" containing segment
>   indexes.
>   </description>
> </property>
>
> > Do I need to copy the linkdb and the index folders as well?
> No, the linkdb contains an inverted link list (for each url all urls
> that point to it) and is only used to calculate the page score while
> indexing.
>
> best regards,
> Dominik
>

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
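
For anyone checking a searcher machine before starting Tomcat, the lookup order from the searcher.dir description quoted above can be sketched roughly as follows. This is only an illustration of that description, not Nutch's actual code; the function names resolve_search_source and missing_for_local_search and the returned labels are made up for this sketch.

```python
from pathlib import Path


def resolve_search_source(searcher_dir):
    """Return (kind, path) for the first entry found, or None.

    Mirrors the order given in the searcher.dir description:
    search-servers.txt, then "index", then "segments".
    The kind labels are hypothetical names for this sketch.
    """
    root = Path(searcher_dir)
    servers = root / "search-servers.txt"
    if servers.is_file():
        return ("distributed", servers)
    index = root / "index"
    if index.is_dir():
        return ("merged-index", index)
    segments = root / "segments"
    if segments.is_dir():
        return ("segment-indexes", segments)
    return None


def missing_for_local_search(searcher_dir):
    """List which of "index" and "segments" are absent.

    Follows the advice in the reply to place both directories
    under the crawl directory.
    """
    root = Path(searcher_dir)
    return [name for name in ("index", "segments") if not (root / name).is_dir()]
```

Calling missing_for_local_search("crawl") from the directory Tomcat is started in would return ["index"] for the setup described in the quoted question, where only the segments directory had been copied.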
