I have read the recent posts on this subject, but I want to confirm my understanding of what I am doing. I just completed my initial crawl of www.irs.gov. I now want to add some more tax-related URLs and fetch those, but without crawling irs.gov again. I created a newurls file and also put regexps in the URL filter. Are these the commands I should run?

    bin/nutch inject db newurls
    bin/nutch generate db segments
    bin/nutch fetch segments/<latest_segment>

Should I remove irs.gov from the filter so that it doesn't get crawled again, given that some of the new URLs link back to irs.gov?
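To make the filter question concrete, here is a minimal Python sketch of how I understand an ordered include/exclude regexp filter to behave. It is not Nutch code. It assumes rules are tried top to bottom, that the first matching rule decides, and that a URL matching no rule is dropped. The rule patterns and example.com are hypothetical placeholders, not my real filter entries.

```python
import re

# Hypothetical rules in the "+regex / -regex" style of a URL filter file.
# Assumption: rules are tried top to bottom and the first match decides.
RULES = [
    r"-^http://([a-z0-9-]+\.)*irs\.gov/",      # skip the site already crawled
    r"+^http://([a-z0-9-]+\.)*example\.com/",  # placeholder for a new tax site
    r"-.",                                     # reject everything else
]


def accepts(url, rules=RULES):
    """Return True if the first rule that matches url is a '+' rule."""
    for rule in rules:
        sign, pattern = rule[0], rule[1:]
        if re.search(pattern, url):
            return sign == "+"
    return False  # assumption: a URL matching no rule is dropped


if __name__ == "__main__":
    for url in ["http://www.irs.gov/forms/", "http://www.example.com/tax/"]:
        print(url, accepts(url))
```

If the filter really works this way, a link from one of the new sites back to irs.gov would be rejected by the first rule, while leaving irs.gov as an accepted pattern would let those links through.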
Richard Braman
mailto:[EMAIL PROTECTED]
561.748.4002 (voice)
http://www.taxcodesoftware.org
Free Open Source Tax Software
