Could you please give me an example of how to use it? I could not find a man page for the command. For instance, how do I pass in the URL text file, the segment directory, etc.:

bin/nutch freegen urls segments

(Is this correct? Here urls is the directory holding the URL files, and segments is the directory holding the fetchlists generated by this command.) And after that, how can I merge those newly fetched pages into the original crawldb? I am not sure whether updatedb would work, since the new URLs were not generated from the crawldb but from a URL text file. Thank you.

Jenny
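A sketch of the full round trip being asked about, assuming a Nutch build where bin/nutch freegen maps to org.apache.nutch.tools.FreeGenerator; the directory names urls, segments, and crawldb are placeholders, and the segment-path lookup is only illustrative:

```shell
# Generate a fetchlist from the plain-text URL files in urls/
# (one URL per line); a new timestamped segment appears under segments/.
bin/nutch freegen urls segments

# Pick up the newest segment (substitute the actual directory
# freegen created if this glob does not match your layout).
SEGMENT=segments/$(ls -t segments | head -1)

# Fetch the segment, then parse it (the separate parse step is only
# needed if the fetcher is not configured to parse while fetching).
bin/nutch fetch "$SEGMENT"
bin/nutch parse "$SEGMENT"

# Fold the fetched pages into the existing crawldb. updatedb adds
# entries for URLs that are new to the db and leaves existing
# entries alone, so the freegen'd URLs need not have come from
# the crawldb originally.
bin/nutch updatedb crawldb "$SEGMENT"
```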
Vishal Shah <[EMAIL PROTECTED]> wrote:

Hi Jenny, Eyal,

I usually do this with the FreeGenerator tool (org.apache.nutch.tools.FreeGenerator). I find it the most convenient way to generate a fetchlist containing a specific list of URLs to be fetched. You can run this tool with the following command from your nutch_home:

bin/nutch org.apache.nutch.tools.FreeGenerator

Regards,

-vishal.

-----Original Message-----
From: Jenny LIU [mailto:[EMAIL PROTECTED]
Sent: Monday, September 10, 2007 1:37 AM
To: [email protected]
Subject: how to generate seperate segment to have a small list of new urls to be fetched only

Once in a while I have a small list of URLs (all Internet, not Intranet) that needs to be added to the existing URL db, so they need to be injected into the db. How can I generate a separate segment with only those URLs, so that after fetching, only the new URLs are added to the db and the existing ones are left untouched? Right now I have to run the whole cycle (inject, generate, fetch, etc.) over the existing URLs as well just to get the new URLs into the db. Does anyone have any idea? Please advise. Thank you.

Jenny
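The bare command above prints a usage message; a sketch of a concrete invocation, assuming the tool takes an input directory and a segments directory as arguments, with optional -filter and -normalize flags (flag availability may differ by Nutch version):

```shell
# Assumed usage: FreeGenerator <inputDir> <segmentsDir> [-filter] [-normalize]
# inputDir holds plain-text files listing one URL per line; the tool
# writes a new timestamped segment under segmentsDir.  -filter applies
# the configured URL filters, -normalize the configured normalizers.
bin/nutch org.apache.nutch.tools.FreeGenerator urls segments -filter -normalize
```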
