I have followed the tutorial at media-style.com and now have a
mapred installation of Nutch working. Thanks, Stefan :)
My question now is about the correct steps to continuously fetch and index. I
have read some people talking about mergesegs and updatedb; however,
Stefan's tutorial doesn't list these as steps. If you want to
continually fetch more and more levels from your crawldb and
update your index appropriately, what is the correct method for doing so?
Currently I am doing this (the actual commands are sketched below the list):
generate
fetch
invertlinks
index
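For reference, here is roughly what that cycle looks like as commands on my end. This is just a sketch of one pass; the crawl/crawldb, crawl/segments, crawl/linkdb and crawl/indexes paths are my own local layout, not anything prescribed by the tutorial, and I simply pick up whichever segment generate created last:

  # generate a fetch list from the crawldb into a new segment
  bin/nutch generate crawl/crawldb crawl/segments
  # pick up the segment that generate just created (segment names are timestamps)
  SEGMENT=`ls -d crawl/segments/* | tail -1`
  # fetch that segment
  bin/nutch fetch $SEGMENT
  # rebuild the linkdb over all segments
  bin/nutch invertlinks crawl/linkdb -dir crawl/segments
  # index the fetched segment against the crawldb and linkdb
  bin/nutch index crawl/indexes crawl/crawldb crawl/linkdb $SEGMENT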

The only problem I am having is that I can't seem to get any pages
past the index pages of the root domains I injected. I feel like I am
missing some important steps. Any input is appreciated.
Mike
