Hi,

Well, for starters, parse-tika in Nutch trunk will parse your metadata and send it to Solr for the formats listed here:

http://tika.apache.org/0.10/formats.html

If there are additional formats you wish to get metadata from, then I suggest that you look at writing your own implementation that extends this.

HTH

On Fri, Nov 18, 2011 at 6:19 PM, Michael Kelleher <[email protected]> wrote:

> How do people handle binary documents and images? The "default" regex
> filter has:
>
> # skip image and other suffixes we can't yet parse
> -\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP)$
>
> but some of this content I would want to pass along to Solr for indexing.
>
> Is anyone else doing this kind of thing?

--
Lewis
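To make the filter question concrete, here is a minimal Python sketch (not Nutch code) of how a rule like the quoted one decides which URLs get fetched. It assumes the documented regex-urlfilter.txt semantics: rules are tried in order, the first pattern found in the URL decides ("+" include, "-" exclude), and a URL that matches nothing is rejected. The catch-all `+.` rule and the `TRIMMED_SKIP` pattern, which drops the image and Office suffixes, are hypothetical illustrations, not a recommended configuration.

```python
import re

# The suffix-skipping rule quoted in the message (the leading "-" means exclude).
DEFAULT_SKIP = (r"\.(gif|GIF|jpg|JPG|png|PNG|ico|ICO|css|sit|eps|wmf|zip|ppt|"
                r"mpg|xls|gz|rpm|tgz|mov|MOV|exe|jpeg|JPEG|bmp|BMP)$")

# Hypothetical trimmed rule: image and Office suffixes removed so those
# documents are fetched and handed on to a parser.
TRIMMED_SKIP = r"\.(ico|ICO|css|sit|eps|wmf|zip|mpg|gz|rpm|tgz|mov|MOV|exe)$"


def accepts(url, rules):
    """Return True if the first rule whose pattern is found in `url` is '+'.

    Rules are (sign, pattern) pairs tried in order; the first match decides,
    and a URL that matches no rule is rejected.
    """
    for sign, pattern in rules:
        if re.search(pattern, url):
            return sign == "+"
    return False


# Each rule set ends with a catch-all "+." that accepts anything not excluded.
default_rules = [("-", DEFAULT_SKIP), ("+", r".")]
trimmed_rules = [("-", TRIMMED_SKIP), ("+", r".")]

if __name__ == "__main__":
    for url in ["http://example.com/photo.jpg",
                "http://example.com/report.xls",
                "http://example.com/index.html"]:
        print(url, accepts(url, default_rules), accepts(url, trimmed_rules))
```

Under the default rule the `.jpg` and `.xls` URLs are rejected; under the trimmed rule they pass. Letting a suffix through the URL filter only means Nutch will fetch the document; whether any metadata reaches Solr still depends on a parser such as parse-tika supporting that format.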

