Could you please give me an example of how to use it? I could not find a man
page for the command, e.g. how to specify the input url text file, the segment
directory, etc.
   
  bin/nutch freegen urls segments (is this correct? urls is the directory
holding the url files, and segments is the directory that will hold the
fetchlists generated by this command)
   
  And after that, how can I merge the newly fetched pages into the original
crawldb? (I am not sure whether updatedb would work, since the new urls were
not generated from the crawldb but came from a url text file.)
   
  Thank you.
   
  Jenny

Vishal Shah <[EMAIL PROTECTED]> wrote:
  Hi Jenny, Eyal,

I usually do this by using the FreeGenerator Tool
(org.apache.nutch.tools.FreeGenerator). I find this the most convenient way
to generate a fetchlist that contains a specific list of urls to be fetched.

You can run this tool with the following command from your Nutch home
directory:

bin/nutch org.apache.nutch.tools.FreeGenerator
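A sketch of the full workflow this implies, for Jenny's follow-up question. The directory names (urls, segments, crawldb) are assumptions matching the layout described in the question, not fixed names; run bin/nutch freegen with no arguments to see the exact usage string for your Nutch version:

```shell
# urls/ holds one or more plain-text files with one url per line;
# freegen writes a new timestamped segment under segments/.
bin/nutch freegen urls segments

# Pick up the segment that was just created (newest directory).
SEGMENT=segments/$(ls -t segments | head -1)

# Fetch and parse only that segment. (Depending on your fetcher
# configuration, parsing may happen during the fetch itself, in
# which case the separate parse step is unnecessary.)
bin/nutch fetch "$SEGMENT"
bin/nutch parse "$SEGMENT"

# Merge the fetched results back into the existing crawldb.
# updatedb reads the crawl data from the segment itself, so it
# should work even though these urls did not originate from the
# crawldb; existing entries are left untouched.
bin/nutch updatedb crawldb "$SEGMENT"
```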

Regards,

-vishal.

-----Original Message-----
From: Jenny LIU [mailto:[EMAIL PROTECTED] 
Sent: Monday, September 10, 2007 1:37 AM
To: [email protected]
Subject: how to generate separate segment to have a small list of new urls
to be fetched only

Once in a while I have a small list of urls (all Internet, not Intranet) that
needs to be added to the existing crawldb, so they need to be injected into
the db. How can I generate a separate segment with only those urls, so that
after fetching, the db has the new urls added and the existing ones are left
untouched? Right now I have to run the whole cycle (inject, generate, fetch,
etc.) over the existing urls as well just to get the new urls into the db.
Does anyone have any ideas? Please advise.

Thank you.

Jenny
