Much clearer. I think the whole web-crawling section should be made clearer still:
Bootstrapping should be one option under a section labelled something like "Defining the URLs that you want to include in your fetch":

Option 1: Bootstrap from DMOZ.
Option 2: Make a text file with the URLs you want to crawl.

This section should also mention that if you want to limit crawling, you need to set up a filter.

Common filter questions: The tips should explain that the out-of-the-box filter skips pages whose URLs contain query-string parameters, and how to remove that part of the filter. I mistakenly put a '+' in front of that line instead of commenting it out. The '+' has the effect of overriding your other lines.

Common configuration issues: maximum content size, maximum retries. What else?

I will gladly post these changes to the Wiki, given some votes of confidence and other suggestions.

-----Original Message-----
From: Vanderdray, Jacob [mailto:[EMAIL PROTECTED]
Sent: Tuesday, March 07, 2006 1:52 PM
To: nutch-user@lucene.apache.org
Subject: Tutorial on the Wiki

I've changed the language a bit. If you're interested, take a look:
http://wiki.apache.org/nutch/NutchTutorial

Thanks,
Jake.
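To illustrate the '+' pitfall from the filter tips, here is a minimal Python sketch of how a regex URL filter of this kind evaluates its lines. It assumes the first-match-wins behaviour described in the comments of the stock Nutch filter file (the first matching pattern decides whether a URL is included or ignored, and a URL that matches no pattern is ignored). The rules below only approximate the stock crawl filter, with a hypothetical example.com standing in for MY.DOMAIN.NAME; the functions parse_rules and accepts are illustrative and are not Nutch APIs.

```python
import re

# Approximation of the stock crawl filter; example.com is a hypothetical domain.
STOCK = r"""
# skip file:, ftp:, and mailto: urls
-^(file|ftp|mailto):
# skip URLs containing characters typical of query strings
-[?*!@=]
# accept hosts in the (hypothetical) example.com domain
+^http://([a-z0-9]*\.)*example.com/
# skip everything else
-.
"""

# Correct way to allow query strings: comment the skip line out.
COMMENTED = STOCK.replace("-[?*!@=]", "# -[?*!@=]")
# The mistake: flipping '-' to '+' accepts ANY URL with those characters.
PLUS = STOCK.replace("-[?*!@=]", "+[?*!@=]")


def parse_rules(text):
    """Turn '+regex' / '-regex' lines into (include, pattern) pairs; skip comments and blanks."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with + or -: {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(rules, url):
    """The first rule whose pattern matches anywhere in the URL decides; no match means reject."""
    for include, pattern in rules:
        if pattern.search(url):
            return include
    return False


if __name__ == "__main__":
    urls = [
        "http://www.example.com/about.html",
        "http://www.example.com/page?id=1",
        "http://other.org/search?q=nutch",
    ]
    for name, text in [("stock", STOCK), ("commented", COMMENTED), ("plus", PLUS)]:
        rules = parse_rules(text)
        for url in urls:
            print(f"{name:9} {accepts(rules, url)!s:5} {url}")
```

With the '+' version, http://other.org/search?q=nutch is accepted even though it is outside the allowed domain, because the '+' line matches before the domain rule is ever consulted. Commenting the line out lets query-string URLs through only when they also pass the domain rule.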