Hi,

I am using Nutch 1.0.

I am trying to crawl pages, but I want to further restrict which of them get
indexed. That is, pages containing a certain phrase in their URL or content
should not be indexed.
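
To make the requirement concrete, here is a rough sketch of the check I have in mind, written in plain Java with no Nutch classes at all. The class name PhraseFilterSketch, the method shouldIndex, and the example phrases and URLs are all hypothetical; this only illustrates the decision, not where it should be hooked into Nutch.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not part of Nutch: decides whether a page may be indexed.
class PhraseFilterSketch {

    // Returns false when the URL or the content contains any blocked phrase
    // (case-insensitive). Null URL or content is treated as empty text, so
    // the check itself never throws a NullPointerException.
    static boolean shouldIndex(String url, String content, List<String> blockedPhrases) {
        String u = (url == null) ? "" : url.toLowerCase(Locale.ROOT);
        String c = (content == null) ? "" : content.toLowerCase(Locale.ROOT);
        for (String phrase : blockedPhrases) {
            if (phrase == null || phrase.isEmpty()) {
                continue; // ignore empty entries, they would match everything
            }
            String p = phrase.toLowerCase(Locale.ROOT);
            if (u.contains(p) || c.contains(p)) {
                return false; // blocked phrase found, skip this page
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Example phrases and pages are made up for illustration.
        List<String> blocked = Arrays.asList("private", "do not index");
        System.out.println(shouldIndex("http://example.com/private/a.html", "hello", blocked));
        System.out.println(shouldIndex("http://example.com/a.html", "Please DO NOT INDEX this", blocked));
        System.out.println(shouldIndex("http://example.com/a.html", "hello", blocked));
    }
}
```

A page would be passed on to the index only when shouldIndex returns true; what I am missing is the right place in Nutch to apply such a check.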

I have tried skipping the output.collect() call in IndexerMapReduce.java at
line 162, but it gives a null pointer error.

Is there a way to do this? Could someone please give me the code for it?



With regards,

Tarun
-- 
View this message in context: 
http://www.nabble.com/how-to-filter-pages-before-indexing-tp24514463p24514463.html
Sent from the Nutch - User mailing list archive at Nabble.com.
