Hi Ali, I am not entirely sure, but I do not think you can determine the content type before parsing, and I think URL filtering is performed before parsing.
My suggestion is to implement a scoring filter or an indexing filter that returns a null Nutch document based on the content type.

Regards,
Ameer
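
P.S. In case it helps, here is a rough sketch of the content-type check that such an indexing filter could delegate to. It is plain Java and does not use the Nutch API at all; the class name `ContentTypeGate`, the method names, and the blocked types are all made up for illustration. In a real plugin you would call something like this from your filter method and return null when it says the document should be dropped. How you read the content type from the parse metadata is left out here, since I am not sure of the exact key in your version.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not part of Nutch: decides whether a document
// should be kept based on its content type.
final class ContentTypeGate {

    // Strips parameters such as "; charset=UTF-8" and lowercases the MIME type.
    static String normalize(String contentType) {
        if (contentType == null) {
            return "";
        }
        int semi = contentType.indexOf(';');
        String base = semi >= 0 ? contentType.substring(0, semi) : contentType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    // Returns false when the normalized type is in the blocked set.
    // A missing content type normalizes to "" and is therefore kept,
    // unless "" itself is listed as blocked.
    static boolean shouldIndex(String contentType, Set<String> blockedTypes) {
        return !blockedTypes.contains(normalize(contentType));
    }

    public static void main(String[] args) {
        // Assumed example values; pick whatever types you want to exclude.
        Set<String> blocked = Set.of("application/pdf", "image/png");
        System.out.println(shouldIndex("text/html; charset=UTF-8", blocked));
        System.out.println(shouldIndex("Application/PDF", blocked));
    }
}
```

Inside the indexing filter the idea would be: if `shouldIndex(...)` is false, return null instead of the document, otherwise return the document unchanged.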

