Hi, I am using Nutch 1.0.
I am trying to crawl pages but restrict which of them get indexed: pages that contain a certain phrase in their URL or content should not be indexed.

I have tried skipping the output.collect() call in IndexerMapReduce.java at line 162, but that gives a NullPointerException. Is there a way to do this? Could someone please give me code for it?

With regards,
Tarun
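
To make the filtering rule described above concrete, here is a minimal sketch of the check itself, kept independent of Nutch. The class name PhraseFilter, the method shouldIndex, and the sample phrases and URLs are all hypothetical and are not part of Nutch; where such a check would be called from (for example, just before the document is collected) is an assumption. It treats a missing URL or missing content as "no match" so the check itself cannot throw on null input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Locale;

// Hypothetical helper, not a Nutch class: decides whether a page may be indexed.
final class PhraseFilter {
    private final List<String> blockedPhrases = new ArrayList<String>();

    PhraseFilter(List<String> phrases) {
        // Store phrases lower-cased so matching is case-insensitive.
        for (String phrase : phrases) {
            if (phrase != null && !phrase.isEmpty()) {
                blockedPhrases.add(phrase.toLowerCase(Locale.ROOT));
            }
        }
    }

    // Returns false when the URL or the content contains any blocked phrase.
    boolean shouldIndex(String url, String content) {
        return !containsBlocked(url) && !containsBlocked(content);
    }

    private boolean containsBlocked(String text) {
        if (text == null) {
            return false; // nothing to match against, so nothing is blocked
        }
        String lower = text.toLowerCase(Locale.ROOT);
        for (String phrase : blockedPhrases) {
            if (lower.contains(phrase)) {
                return true;
            }
        }
        return false;
    }

    public static void main(String[] args) {
        // Sample phrases and URLs are made up for illustration.
        PhraseFilter filter = new PhraseFilter(Arrays.asList("private", "do not index"));
        System.out.println(filter.shouldIndex("http://example.com/private/a.html", "hello"));
        System.out.println(filter.shouldIndex("http://example.com/a.html", "Please DO NOT INDEX this"));
        System.out.println(filter.shouldIndex("http://example.com/a.html", null));
    }
}
```

In this sketch a page is skipped only when shouldIndex returns false; every other page would be passed on to the indexer unchanged.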