Hi,

I would rather use the Wikipedia dumps than crawl the site!
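
A rough illustration of the dump-based approach, as a minimal sketch rather than a tested pipeline: the pages-articles dump is one large XML file made of <page> elements, so it can be streamed and filtered by term without crawling anything. The function name pages_mentioning is a placeholder of mine, and a plain case-insensitive substring match on the wikitext is only an assumption about what "contain the term" should mean.

```python
import xml.etree.ElementTree as ET


def local_name(tag):
    """Strip the XML namespace, e.g. '{http://...}page' -> 'page'."""
    return tag.rsplit("}", 1)[-1]


def pages_mentioning(dump_file, term):
    """Yield titles of pages whose wikitext contains `term` (case-insensitive).

    `dump_file` is a path or a binary file object holding MediaWiki export XML.
    """
    needle = term.lower()
    # iterparse streams the file, so the whole dump is never loaded at once.
    for _event, elem in ET.iterparse(dump_file, events=("end",)):
        if local_name(elem.tag) != "page":
            continue
        title, text = "", ""
        for child in elem.iter():
            name = local_name(child.tag)
            if name == "title":
                title = child.text or ""
            elif name == "text":
                text = child.text or ""
        if needle in text.lower():
            yield title
        # Drop the page's children once it has been inspected to limit memory use.
        elem.clear()
```

The dumps are usually distributed bz2-compressed, so passing bz2.open("enwiki-latest-pages-articles.xml.bz2", "rb") as dump_file should avoid unpacking the file first (the file name here is only an example).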
You should have a look at jwpl:
http://code.google.com/p/jwpl/

BR
Hannes

On Wed, May 4, 2011 at 5:20 PM, Kelvin <[email protected]> wrote:

> Hello,
>
> I would like to crawl wikipedia using Nutch, but as it is too large, I
> would only like to crawl pages that are related to a particular subject.
>
> For example, I would like to crawl for webpages of wikipedia that contain
> the term "Football". Is this possible using Nutch?
>
> Thank you for your kind help.

