Thomas,

It appears to me that this is exactly what I need. I can create a fetchlist of the URLs I need to crawl and then fetch them. I can essentially ignore the older entries unless they are modified.

Two questions:

First, will this re-fetch old documents already in the database? In my case, if a forum topic is updated it would be put into the flat URL list. Would it be refetched with this tool?

Second, can anyone point me in the direction of how to properly set this up? As I mentioned in another post, I'm lost when it comes to Java. I want to be able to compile this and use it, but the last thing I want to do is screw anything up.

Matt

----- Original Message -----
From: "TDLN" <[EMAIL PROTECTED]>
To: <[email protected]>; "Honda-Search Administrator" <[EMAIL PROTECTED]>
Sent: Sunday, June 25, 2006 3:02 AM
Subject: Re: Will pay for someone to help

> Matt,
>
> AFAIK Nutch does not support fetching arbitrary fetch lists out of the
> box.
>
> There is a tool in JIRA that supports this though:
> http://issues.apache.org/jira/browse/NUTCH-68.
>
> - Thomas
>
>
> On 6/25/06, Honda-Search Administrator <[EMAIL PROTECTED]> wrote:
>> I'm having a difficult time configuring nutch to behave the way I want
>> it to behave.
>>
>> In a nutshell here is my situation:
>>
>> I crawl a number of forums that relate to Hondas every night for posts.
>> The purpose of my website is to be a search engine for all of the
>> forums at once.
>>
>> I have a base set of URLs in the webDB right now. Every day I write a
>> file of URLs (that I place in urls/inject.txt) that I want nutch to
>> inject into the database to crawl. I do NOT want to recrawl other URLs.
>> I only want to crawl/recrawl the URLs in my list.
>>
>> Can you help me configure nutch (or help with the correct scripts,
>> crons, etc.) to do this? I've tried without success.
>>
>> I am running nutch 0.7.2 and am totally confused with what to do next.
>> It seems to me to be a simple fix, but I can't figure it out.
>>
>> As I mentioned I will pay if someone can set me up. I've run the crawl
>> a number of times now and I just keep on screwing things up.
>>
>> Matt

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
