For example, I put the URL nabble.com in a seed file.
Then Nutch fetches and parses that URL, and from the parse I get nabble.com/user and
nabble.com/admin.
Then in the next fetch job all three URLs are fetched and parsed:
nabble.com
nabble.com/user
nabble.com/admin

And this process repeats until the crawl depth is reached. (The URLs are
fictitious.)
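
To make the cycle concrete, here is a rough, self-contained Java sketch of that depth-limited fetch/parse loop. It is only an illustration of the idea, not Nutch's actual inject/generate/fetch/parse/updatedb jobs; the nabble.com seed, the depth of 3, and the regex-based link extraction are placeholders I picked to match the fictitious example above.

    import java.net.URI;
    import java.net.http.HttpClient;
    import java.net.http.HttpRequest;
    import java.net.http.HttpResponse;
    import java.util.*;
    import java.util.regex.*;

    public class DepthCrawlSketch {
        // Crude href extractor; a real crawler (like Nutch) uses a proper HTML parser.
        private static final Pattern HREF = Pattern.compile("href=[\"'](http[^\"']+)[\"']");

        public static void main(String[] args) throws Exception {
            HttpClient client = HttpClient.newHttpClient();
            Set<String> crawlDb = new LinkedHashSet<>();   // all URLs seen so far
            Set<String> frontier = new LinkedHashSet<>();  // URLs to fetch in this round
            frontier.add("http://nabble.com/");            // seed (fictitious, as above)
            int depth = 3;                                 // roughly like the -depth crawl option

            for (int round = 1; round <= depth && !frontier.isEmpty(); round++) {
                Set<String> discovered = new LinkedHashSet<>();
                for (String url : frontier) {
                    if (!crawlDb.add(url)) continue;       // skip anything already fetched
                    try {
                        HttpResponse<String> resp = client.send(
                                HttpRequest.newBuilder(URI.create(url)).build(),
                                HttpResponse.BodyHandlers.ofString());
                        // Collect outlinks from the parsed page for the next round.
                        Matcher m = HREF.matcher(resp.body());
                        while (m.find()) discovered.add(m.group(1));
                    } catch (Exception e) {
                        System.err.println("fetch failed: " + url);
                    }
                }
                discovered.removeAll(crawlDb);             // only keep newly discovered URLs
                System.out.println("round " + round + ": fetched " + frontier.size()
                        + ", discovered " + discovered.size());
                frontier = discovered;                     // next round fetches the new outlinks
            }
        }
    }

In real Nutch the discovered outlinks go into the crawl database and each round is a separate generate/fetch/parse/update pass, but the overall shape of the loop is the same.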

I left Nutch running on Tuesday around 18:00h, and today I checked my
SQL Server database: the last record was from Wednesday at 10:40h. It is
still running on all the URLs fetched so far, around 3400 pages.

I didn't check Nutch yesterday because it was a holiday.




