[Nutch-general] Re: Search setup

Gal Nitzan Mon, 30 Jan 2006 05:23:18 -0800

Dominik, thank you so much for your answers; you have been very helpful.

Just one more question :)


If I understand correctly, the whole process goes like this:

1. fetch/parse into NDFS
2. decide how many segments (by data size?) you want on each searcher machine
3. invert links, index, dedup, and merge the selected indexes into some NDFS folder
4. copy that NDFS folder to the searcher machine

And do I follow this procedure after every fetch?
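Step 2 is essentially a load-balancing question. A minimal sketch, assuming segment sizes are already known in bytes and that the goal is to balance total size per searcher (Nutch does not do this step for you in the setup described here): assign segments largest-first, each to the currently least-loaded searcher. The function name `assign_segments` and the segment names and sizes are hypothetical.

```python
import heapq


def assign_segments(segment_sizes, num_searchers):
    """Hypothetical helper: greedily spread segments over searcher machines.

    segment_sizes maps segment name -> size in bytes. Segments are taken
    largest first and each goes to the searcher with the smallest total so far.
    """
    # Min-heap of (total_bytes_assigned, searcher_index).
    heap = [(0, i) for i in range(num_searchers)]
    heapq.heapify(heap)
    assignment = {i: [] for i in range(num_searchers)}
    # Sort by size descending, then by name so the result is deterministic.
    for name, size in sorted(segment_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        total, idx = heapq.heappop(heap)
        assignment[idx].append(name)
        heapq.heappush(heap, (total + size, idx))
    return assignment


if __name__ == "__main__":
    sizes = {"seg-a": 700, "seg-b": 500, "seg-c": 400, "seg-d": 300}
    print(assign_segments(sizes, 2))
```

With the sample sizes this puts seg-a and seg-d (1000 bytes) on searcher 0 and seg-b and seg-c (900 bytes) on searcher 1. Largest-first greedy is a simple heuristic, not an optimal partition.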

Thanks.

P.S.

In the searcher log, it clearly says that it opens the linkdb for some reason:

[EMAIL PROTECTED] searcher]$ bin/nutch server 9004 /nutch/
060130 145747 10 parsing file:/home/nutchuser/searcher/conf/nutch-default.xml
060130 145747 10 parsing file:/home/nutchuser/searcher/conf/nutch-site.xml
060130 145747 10 opening merged index in /nutch/index
060130 145747 10 opening segments in /nutch/segments
060130 145747 10 opening linkdb in /nutch/linkdb
060130 145748 11 Server listener on port 9004: starting




On Mon, 2006-01-30 at 13:55 +0100, Dominik Friedrich wrote:
> Gal Nitzan wrote:
> > I have copied only the segments directory but the searcher returns 0
> > hits.
> >   
> You have to put the index and segments directories into a directory named
> "crawl" and start Tomcat from the directory that contains crawl. The
> nutch.war file contains a nutch-default.xml with
> 
> <property>
>   <name>searcher.dir</name>
>   <value>crawl</value>
>   <description>
>   Path to root of crawl.  This directory is searched (in
>   order) for either the file search-servers.txt, containing a list of
>   distributed search servers, or the directory "index" containing
>   merged indexes, or the directory "segments" containing segment
>   indexes.
>   </description>
> </property>
> 
> > Do I need to copy the linkdb and the index folders as well?
> No, the linkdb contains an inverted link list (for each URL, all URLs
> that point to it) and is only used to calculate the page score while
> indexing.
> 
> best regards,
> Dominik
> 
> 
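The searcher.dir description quoted above defines a lookup order: search-servers.txt first, then a merged "index" directory, then "segments". A minimal Python sketch of that order, for illustration only (`resolve_search_dir` is a hypothetical name; this is not Nutch's own Java code):

```python
from pathlib import Path


def resolve_search_dir(crawl_dir):
    """Hypothetical helper mirroring the searcher.dir lookup order.

    Returns (kind, path) for the first source found under crawl_dir,
    or ("none", None) if nothing usable is there.
    """
    root = Path(crawl_dir)  # a relative path resolves against the working directory
    servers = root / "search-servers.txt"
    if servers.is_file():
        return ("distributed", servers)   # list of distributed search servers
    if (root / "index").is_dir():
        return ("merged-index", root / "index")
    if (root / "segments").is_dir():
        return ("segment-indexes", root / "segments")
    return ("none", None)
```

Because the default value "crawl" is a relative path, it is resolved against the current working directory, which is presumably why Tomcat has to be started from the directory that contains crawl.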




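Dominik's description of the linkdb (for each URL, all URLs that point to it) is an inversion of the outlink graph. A toy in-memory sketch, assuming outlinks are given as a plain dict; Nutch builds the real linkdb from segment data, not like this, and `invert_links` is a hypothetical name:

```python
from collections import defaultdict


def invert_links(outlinks):
    """Hypothetical helper: turn {source_url: [target_url, ...]} into
    {target_url: sorted list of source_urls that link to it}."""
    inlinks = defaultdict(set)
    for src, targets in outlinks.items():
        for dst in targets:
            inlinks[dst].add(src)  # record that src points to dst
    return {dst: sorted(srcs) for dst, srcs in inlinks.items()}


if __name__ == "__main__":
    graph = {"http://a/": ["http://b/", "http://c/"], "http://b/": ["http://c/"]}
    print(invert_links(graph))
```

The number of inlinks per URL is one input a scoring step could use at indexing time; this sketch makes no claim about Nutch's actual scoring formula.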
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
