Hi,

I am trying to test a scenario in Nutch.

Scenario: I have a page P1 with content C1. I indexed it using bin/nutch .. and redeployed Nutch, and on searching I am able to find C1.
        
I then changed the content of the same page P1 from C1 to C1,C2 and recrawled the web application. But:
    If I search for C2, I do not get the page.
    If I search for C1, I get the page, but with the old content, i.e. C1 only.
                  
I assume the reason for this problem is db.default.fetch.interval, which is set to 30, the number of days after which a refetch happens. If I want to recrawl the site every hour, how can I do it? I am using nutch-0.9. I have also tried floating-point values like 15f .. .

Please give your inputs.
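
For reference, here is the arithmetic behind the interval I am after. This is only a sketch of the unit conversion, assuming the property is expressed in days as described above; the helper name hours_to_days is my own and is not part of Nutch, and I have not confirmed that Nutch accepts a fractional value for this property.

```python
# Assumption: db.default.fetch.interval is expressed in days (set to 30 here).
HOURS_PER_DAY = 24


def hours_to_days(hours: float) -> float:
    """Convert an interval in hours to the equivalent number of days."""
    return hours / HOURS_PER_DAY


# The refetch interval I want: one hour, expressed in days.
one_hour_in_days = hours_to_days(1)
print(f"1 hour = {one_hour_in_days:.4f} days")
```

So a one-hour refetch would need a value well below 1 day, which is why I tried non-integer values.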
Regards,
Rinesh
    