Hi,
I want Nutch to crawl abc.com, but index only car.abc.com.
Links to car.abc.com can appear at any level within abc.com. So, basically,
I want Nutch to keep crawling abc.com normally, but index only pages whose
URLs start with car.abc.com,
e.g. car.abc.com/toyota... car.abc.com/honda...
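To make the intent concrete, here is a minimal sketch in plain Java of the two separate decisions I am after. It does not use any Nutch API; the class name, method names, and both regular expressions are hypothetical and only illustrate the rule "follow everything on abc.com, index only car.abc.com".

```java
import java.util.regex.Pattern;

public class CrawlVsIndex {
    // Hypothetical crawl rule: follow any page on abc.com or one of its subdomains.
    static final Pattern CRAWL =
        Pattern.compile("^https?://([a-z0-9-]+\\.)*abc\\.com(/.*)?$");

    // Hypothetical index rule: index only pages hosted on car.abc.com.
    static final Pattern INDEX =
        Pattern.compile("^https?://car\\.abc\\.com(/.*)?$");

    static boolean shouldCrawl(String url) {
        return CRAWL.matcher(url).matches();
    }

    static boolean shouldIndex(String url) {
        return INDEX.matcher(url).matches();
    }

    public static void main(String[] args) {
        // Example URLs made up for illustration.
        String[] urls = {
            "http://abc.com/",
            "http://www.abc.com/news/today.html",
            "http://car.abc.com/toyota",
            "http://example.org/"
        };
        for (String url : urls) {
            System.out.println(url
                + " crawl=" + shouldCrawl(url)
                + " index=" + shouldIndex(url));
        }
    }
}
```

The point is that the crawl rule has to stay wide enough to reach the car.abc.com links, while the index rule is the narrow one.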
I set regex-urlfilter.txt to include only car.abc.com and ran the command
"generate crawl/crawldb crawl/segments", but it just says "Generator: 0 records
selected for fetching, exiting ...". I guess this is because car.abc.com links
exist only several levels deep, so the filter rejects the abc.com pages that
lead to them.
How can I do this? I am using Nutch 1.1 and Solr 1.4.1.
Thanks.