Hello nutch-general,

Thanks for your answer!

I am also interested in how Nutch users keep their databases fresh.
Is there a script that I can run from cron, give it the db path and
the segments path, and have it update the database?
I still do not fully understand which steps are needed just to
"freshen" my current database by hand... ;-/
As far as I understand, an update requires:
1) Generate a fetch list (bin/nutch generate <old_db_dir> <new_segment_dir>)
2) Fetch this list (bin/nutch fetch <new_segment_dir>)
   Now <new_segment_dir> holds the content of all links that needed
   updating.
But what is the better next step? Merge this new segment with the old
segment (mergesegs?), or can I do something like:
 "bin/nutch updatedb <old_db_dir> <new_segment_dir>" ?
 I have tried both mergesegs and updatedb, but I do not see any changes
 in the db after these actions.
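
 To make the question concrete, this is roughly the cron script I have
 in mind. It is only a sketch: the function name refresh_db, the NUTCH
 variable and the paths are my own placeholders, and it assumes that
 "generate" creates a new timestamp-named segment directory under the
 segments directory, so that the newest one sorts last. I am not sure
 this sequence is the right one.

```shell
#!/bin/sh
# Hypothetical refresh sketch. NUTCH and refresh_db are placeholder names,
# not part of Nutch itself.
NUTCH="${NUTCH:-bin/nutch}"

refresh_db() {
    db_dir="$1"        # existing web db directory
    segments_dir="$2"  # parent directory that holds the segments

    # 1) Generate a fetch list of the pages that are due for re-fetch.
    "$NUTCH" generate "$db_dir" "$segments_dir" || return 1

    # Assumption: generate created a new timestamp-named segment,
    # so the newest segment is the last one in sorted order.
    segment=$(ls -d "$segments_dir"/* | sort | tail -n 1)
    [ -n "$segment" ] || return 1

    # 2) Fetch the pages on that list into the new segment.
    "$NUTCH" fetch "$segment" || return 1

    # 3) Feed the fetched segment back into the db.
    "$NUTCH" updatedb "$db_dir" "$segment"
}
```

 A cron wrapper would then source this file and call
 refresh_db <old_db_dir> <segments_dir>. Whether a mergesegs step still
 has to follow is exactly what I do not know.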

 I have also noticed that if I just delete the old db used by the
 searcher and put a new db in its place, the searcher keeps using the
 old db until I restart Tomcat. If I make changes directly in the db
 that the searcher uses, will it pick them up?

Or how can I update the database without stopping the web search?

Thanks in advance!




-----------Original Message-----------

> To refetch you need to generate a fetch list of urls that need to be refetched.
>  See http://www.nutch.org/cgi-bin/twiki/view/Main/GenerateOptions
>  You can configure the timespan until a url needs to be refetched in the config 
>  file.
>  See http://www.nutch.org/conf/nutch-default.xml
>  "db.default.fetch.interval 30 The default number of days between re-fetches of 
>  a page.  "
>  Then you just fetch, and in the end you need to merge the segments.
>  
>  HTH
>  Stefan 
>  
>  
>  
>  Quoting NGS <[EMAIL PROTECTED]>:
>  
 >> Hello,
 >> 
 >>   I have run "bin/nutch crawl". Now I want to re-fetch the database. I
 >>   ran "bin/nutch updatedb db <segments directory>". After a few seconds
 >>   I see "Finishing Update". I think this is because the default re-fetch
 >>   time is set to 30 days. Then I ran "bin/nutch generate db
 >>   segments -adddays 31". Now all links should be marked as out-of-date?
 >>   I ran "bin/nutch updatedb db segments" again and saw that, once again,
 >>   nothing was really updated...
 >> 
 >>   Maybe I have skipped some steps?



-- 
Best regards,
 NGS                          mailto:[EMAIL PROTECTED]


