Hello, I want to search articles via Solr. I need to find only specific file types such as doc, docx, and pdf; I don't need any HTML pages. The search results should therefore consist only of doc, docx, and pdf files.
I'm using Nutch to crawl web pages and sending Nutch's data to Solr for indexing. One approach to searching on specific file types is to put the file extension into the index. However, I have no idea what schema Nutch uses when indexing into Solr, whether it creates a specific field for the file extension, or how we can modify the Nutch indexer to create such a field ourselves.
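
To make the goal concrete, here is a rough sketch of the kind of query I have in mind. It assumes the index already contains a field holding the file extension; the field name `extension` and the helper `build_search_params` are hypothetical placeholders, not something I know Nutch or Solr to provide. It only builds the request parameters, using Solr's standard `q` and `fq` (filter query) parameters.

```python
from urllib.parse import urlencode


def build_search_params(user_query, extensions, field="extension"):
    """Build Solr request parameters restricted to the given file extensions.

    `field` is a hypothetical field name: it must match whatever field the
    indexer actually writes the file extension into.
    """
    # Filter query of the form  extension:(doc OR docx OR pdf)
    filter_query = "{}:({})".format(field, " OR ".join(extensions))
    return {"q": user_query, "fq": filter_query, "wt": "json"}


params = build_search_params("solr indexing", ["doc", "docx", "pdf"])
# URL-encoded form, ready to append to a Solr /select URL
query_string = urlencode(params)
print(query_string)
```

Since `fq` narrows the result set without influencing the relevance score, HTML pages would simply be excluded as long as their extension is not in the list.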

