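Hi Savannah,

You can control indexing with an indexing filter plugin. If you don't want a particular URL in the index, just return null from the filter for that document. Keep regex-urlfilter.txt open to all of abc.com so the crawl can still reach the car.abc.com pages; the URL filter decides what gets fetched, while the plugin decides what gets indexed.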
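As a rough illustration, the URL check such a plugin would make could look something like the sketch below. The class name UrlIndexGate and the method shouldIndex are made up for this example and are not part of Nutch; the host car.abc.com is taken from your question. Inside your indexing filter you would call a check like this and return null whenever it says no. The Nutch plugin wiring itself (plugin.xml, the filter class) is not shown.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not a Nutch class: decides whether a URL belongs in the index.
public class UrlIndexGate {
    // Assumption: the only host to index, taken from the example in the question.
    private static final String ALLOWED_HOST = "car.abc.com";

    /** Returns true only when the URL parses and its host equals ALLOWED_HOST. */
    public static boolean shouldIndex(String url) {
        try {
            String host = new URI(url).getHost();
            // getHost() is null for URLs without a scheme, e.g. "car.abc.com/toyota".
            return host != null && host.equalsIgnoreCase(ALLOWED_HOST);
        } catch (URISyntaxException e) {
            return false; // unparsable URLs are kept out of the index
        }
    }

    public static void main(String[] args) {
        System.out.println(shouldIndex("http://car.abc.com/toyota")); // true
        System.out.println(shouldIndex("http://www.abc.com/about"));  // false
    }
}
```

Comparing the parsed host rather than the start of the raw string avoids accidentally matching something like car.abc.com.example.org.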
Regards,
Arkadi

>-----Original Message-----
>From: Savannah Beckett [mailto:[email protected]]
>Sent: Friday, July 16, 2010 1:41 AM
>To: [email protected]
>Subject: How to Index Only Pages with Certain Urls?
>
>Hi,
> I want nutch to crawl abc.com, but I want to index only car.abc.com.
> car.abc.com links can in any levels in abc.com. So, basically, I want nutch to
>keep crawl abc.com normally, but index only pages that start as car.abc.com.
> e.g. car.abc.com/toyota...car.abc.com/honda...
>
>I set the regex-urlfilter.txt to include only car.abc.com and run the command
>"generate crawl/crawldb crawl/segments", but it just say "Generator: 0 records
>selected for fetching, exiting ..." . I guess car.abc.com links exist only in
>several levels deep.
>
>How to do this? I am using nutch 1.1 and solr 1.4.1
>Thanks.

