How to crawl only image documents with Nutch?

Eyeris Rodriguez Rueda Fri, 18 Jan 2013 10:43:54 -0800

Hi all.

I'm trying to crawl image documents only (jpg, gif, png, ico, bmp), but
unfortunately some HTML pages are ending up in my index too. I have used the
suffix URL filter (suffix-urlfilter.txt) to reject .html, .php and .xml, but
some HTML pages have no extension, and these are still being inserted into my
Solr index.
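The sketch below illustrates why a deny-list suffix filter lets extensionless HTML pages through. It is a simplified model, not Nutch's own SuffixURLFilter code. The case-insensitive matching and the example.com URLs are assumptions made for illustration.

```python
# Simplified model of a deny-list suffix filter (an assumption, not
# Nutch's actual SuffixURLFilter code): URLs ending in a listed suffix
# are rejected, and every other URL is accepted.
DENIED_SUFFIXES = (".html", ".php", ".xml")


def passes_suffix_filter(url: str) -> bool:
    """Return True if the URL does not end with a denied suffix."""
    # Case-insensitive comparison is an assumption of this sketch.
    return not url.lower().endswith(DENIED_SUFFIXES)


if __name__ == "__main__":
    # Hypothetical URLs for illustration only.
    for url in [
        "http://example.com/index.html",
        "http://example.com/view.php",
        "http://example.com/news/latest",  # HTML page with no extension
        "http://example.com/img/logo.png",
    ]:
        verdict = "accepted" if passes_suffix_filter(url) else "rejected"
        print(url, "->", verdict)
```

A deny-list only rejects what it names. An HTML page served from a path like `/news/latest` matches none of the listed suffixes, so it passes the filter and can be indexed like any other page.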
I have also tried rejecting everything in regex-urlfilter.txt and permitting
only these image types, but then Nutch says there are no documents to fetch.
I'm using Nutch 1.4 and Solr 3.6.
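A regex-urlfilter.txt along the lines described might look like the sketch below. The exact patterns are an assumption, because the original rules were not posted. The rules are tried from top to bottom, and the first one that matches decides. Under these rules the HTML seed pages are rejected. Image URLs are normally discovered as outlinks of fetched HTML pages, so the crawl may end up with nothing to fetch.

```
# regex-urlfilter.txt (sketch of the rules described above; patterns assumed)
# Rules are tried top to bottom; the first matching rule decides.
# Accept URLs ending in one of the image extensions (case-insensitive).
+(?i)\.(jpg|gif|png|ico|bmp)$
# Reject everything else, which also rejects the HTML seed pages.
-.
```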
Can anybody help me or point me in the right direction for crawling only the
documents I want?
Thanks in advance.


