Hi Ali,

I am not entirely sure, but I do not think you can determine the content
type before parsing. As far as I know, URL filtering is performed before
parsing, so a regex URL filter only sees the URL and not the content type.

My suggestion is to implement a scoring or an indexing filter that returns
a null Nutch document based on content type.
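
To make the idea concrete, here is a rough sketch of only the decision
logic, written without any Nutch dependencies so it can be run on its own.
The class name ContentTypeGate, the blocked-type set, and the generic
filter method are all made up for illustration and are not part of Nutch.
In a real plugin I would expect this check to sit inside the filter method
of an indexing filter, with the content type read from the parse metadata,
but please check the exact signatures against your Nutch version.

```java
import java.util.Locale;
import java.util.Set;

// Hypothetical helper, not a Nutch class: decides whether a document
// should be dropped based on its content type.
final class ContentTypeGate {
    // Assumed configuration: MIME types that should not be indexed.
    private final Set<String> blockedTypes;

    ContentTypeGate(Set<String> blockedTypes) {
        this.blockedTypes = blockedTypes;
    }

    // Strip parameters such as "; charset=UTF-8" and lower-case the rest.
    static String normalize(String contentType) {
        if (contentType == null) {
            return "";
        }
        int semi = contentType.indexOf(';');
        String base = semi >= 0 ? contentType.substring(0, semi) : contentType;
        return base.trim().toLowerCase(Locale.ROOT);
    }

    // Mirrors the "return null to skip the document" idea: a blocked
    // content type yields null, anything else passes through unchanged.
    <D> D filter(D doc, String contentType) {
        return blockedTypes.contains(normalize(contentType)) ? null : doc;
    }
}
```

For example, a gate built with Set.of("application/pdf") would return null
for "application/pdf; charset=UTF-8" and pass a "text/html" document
through. A missing content type is passed through as well; that is a
choice I made for the sketch, and you may prefer to drop those instead.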

Regards
Ameer



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Different-regex-urlfilter-for-different-file-types-in-nutch-tp4153586p4155988.html
Sent from the Nutch - User mailing list archive at Nabble.com.