Crawling Question

Michael Kelleher Fri, 18 Nov 2011 11:21:39 -0800

How do people handle binary documents and images? The default regex URL filter has:


# skip image and other suffixes we can't yet parse
-\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP)$



but I would want to pass some of this content along to Solr for indexing.

Is anyone else doing this kind of thing?
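
To make concrete which URLs the rule above excludes, here is a minimal sketch in Python. It assumes the rule is applied as an unanchored search against the full URL, and that the leading "-" only marks the line as an exclusion rule and is not part of the pattern. The function name and the example.com URLs are hypothetical, for illustration only.

```python
import re

# The suffix pattern from the filter rule, without the leading "-"
# (assumed here to be the exclusion marker, not part of the regex).
SKIP_SUFFIXES = re.compile(
    r"\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls"
    r"|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP)$"
)


def is_skipped(url: str) -> bool:
    """Return True if the suffix rule would exclude this URL."""
    return SKIP_SUFFIXES.search(url) is not None


if __name__ == "__main__":
    # Hypothetical URLs, not taken from any real crawl.
    for url in [
        "http://example.com/slides.ppt",
        "http://example.com/budget.xls",
        "http://example.com/logo.PNG",
        "http://example.com/report.pdf",
        "http://example.com/page.html",
    ]:
        print(url, "skipped" if is_skipped(url) else "kept")
```

Under these assumptions, the .ppt and .xls URLs are excluded by this rule while .pdf and .html are not, so dropping an extension from the alternation would let those URLs past this particular rule. Whether the fetched content can then be parsed and indexed is a separate question.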
