Hi all. I'm trying to set up a crawl that indexes only image documents (jpg, gif, png, ico, bmp), but unfortunately some HTML pages end up in my index too. I have used the suffix-urlfilter.txt plugin to exclude .html, .php and .xml, but some HTML pages have no extension, and these are being inserted into my Solr index. I have also tried denying everything in regex-urlfilter.txt and permitting only these image types, but then Nutch says there are no documents to fetch. I'm using Nutch 1.4 and Solr 3.6. Can anybody help me or point me in the right direction to crawl only the documents I want? Thanks in advance.
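
To make the two behaviours concrete, here is a rough Python sketch of the filtering logic as I understand it. It is not Nutch code and does not use Nutch's filter syntax; the URLs and function names are made up for illustration. It only shows that a suffix blacklist lets extensionless pages through, while an image-only whitelist also rejects the start page itself, which may be why nothing is left to fetch.

```python
import re

# Hypothetical URLs, for illustration only (not from my real crawl).
URLS = [
    "http://example.com/gallery",       # HTML page with no extension
    "http://example.com/index.html",    # HTML page with an extension
    "http://example.com/img/logo.png",  # image I actually want
]

# Roughly what my suffix filter does: reject these extensions, accept the rest.
BLOCKED_SUFFIXES = (".html", ".php", ".xml")

# Roughly what my image-only regex filter does: accept only image extensions.
IMAGE_ONLY = re.compile(r"\.(jpg|gif|png|ico|bmp)$", re.IGNORECASE)


def suffix_blacklist_accepts(url: str) -> bool:
    """Accept every URL that does not end in a blocked suffix."""
    return not url.lower().endswith(BLOCKED_SUFFIXES)


def image_whitelist_accepts(url: str) -> bool:
    """Accept only URLs that end in one of the image extensions."""
    return IMAGE_ONLY.search(url) is not None


if __name__ == "__main__":
    for url in URLS:
        print(url,
              "| blacklist:", suffix_blacklist_accepts(url),
              "| whitelist:", image_whitelist_accepts(url))
```

If this picture is right, the HTML pages have to be fetched so their image links can be discovered, and the restriction to images would have to happen at indexing time rather than in the URL filters. That is my assumption, not something I have confirmed.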

