sandeep pujar wrote:
By incremental I meant that after a full crawl, subsequent
crawls should fetch only the changed pages.
The problem with fetching only changed pages is that you need to know
which pages have changed. Once you do, you can load just those pages
through an inject, generate, fetch cycle and then merge the resulting
crawldb and segments with the previously fetched results. The python
script performs this type of process, but for new unfetched links rather
than for changed pages. You may be able to modify it to fetch only
changed pages.
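
As a rough illustration of that cycle, here is a minimal sketch that only
builds the command lines for one incremental pass; it does not run anything.
It assumes you already have a directory of URL lists for the changed pages
(how you detect changes is not covered here). The function name, the
directory layout and the updatedb step are my own assumptions, not taken
from the wiki script, and the argument order for each command is as I
recall it from the bin/nutch usage messages, so check it against your
installed version. The segment name is created by generate (it is
timestamp-named), so a real script would have to look it up after that step.

```python
import posixpath

NUTCH = "bin/nutch"  # assumed location of the Nutch launcher script


def incremental_commands(changed_urls_dir, tmp_dir, segment_name,
                         main_crawldb, main_segments, merged_dir):
    """Return the commands for one incremental pass as argument lists.

    All parameters are hypothetical paths chosen for illustration.
    segment_name is the directory that the generate step creates.
    """
    tmp_crawldb = posixpath.join(tmp_dir, "crawldb")
    tmp_segments = posixpath.join(tmp_dir, "segments")
    segment = posixpath.join(tmp_segments, segment_name)
    return [
        # Load only the changed URLs into a fresh, temporary crawldb.
        [NUTCH, "inject", tmp_crawldb, changed_urls_dir],
        # Create a fetch list (a new segment) from that crawldb.
        [NUTCH, "generate", tmp_crawldb, tmp_segments],
        # Fetch the pages in the new segment.
        [NUTCH, "fetch", segment],
        # Assumption: record the fetch results in the temporary crawldb
        # before merging (this step is not mentioned in the thread).
        [NUTCH, "updatedb", tmp_crawldb, segment],
        # Merge the temporary crawldb with the existing one.
        [NUTCH, "mergedb", posixpath.join(merged_dir, "crawldb"),
         main_crawldb, tmp_crawldb],
        # Merge the new segment with the previously fetched segments.
        [NUTCH, "mergesegs", posixpath.join(merged_dir, "segments"),
         "-dir", main_segments, segment],
    ]
```

Each list could then be passed to subprocess.check_call in order, with the
segment name filled in by listing the temporary segments directory once
generate has finished.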
Dennis Kubes
I was not clear on how I could use the python automation script
for that. Is there something I am missing here?
--- Dennis Kubes <[EMAIL PROTECTED]> wrote:
You can use the python automation script found at:
http://wiki.apache.org/nutch/Automating_Fetches_with_Python
I almost have a new version ready. Will post it in the next
couple of days to the wiki.
Dennis Kubes
sandeep pujar wrote:
Greetings,
Are there ways we can initiate an incremental crawl/index
using Nutch?
I tried to look up wikis and other sources and did not
find much information.
Any ideas or pointers?
Thanks,
Sandeep