Re: Different regex-urlfilter for different file types in nutch

atawfik Sat, 30 Aug 2014 10:57:06 -0700

Hi Ali,

I am not entirely sure, but I do not think you can determine the content
type before parsing. As far as I know, URL filtering is performed before
parsing, so a URL filter only gets to see the URL itself.
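To illustrate why a URL filter cannot act on content type, here is a minimal,
hypothetical Java model of regex-urlfilter-style rules ("+regex" accepts,
"-regex" rejects, the first matching rule decides, and a URL matching no rule
is rejected). The class name RegexUrlFilterSketch and its methods are made up
for illustration; this is not the Nutch plugin's actual code. The point is
that the only input is the URL string, so an extension rule cannot catch a
URL like "getImage?id=7" that happens to serve an image.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

/**
 * Hypothetical, simplified model of regex-urlfilter rule evaluation.
 * Each rule is "+regex" (accept) or "-regex" (reject); the first rule whose
 * regex is found in the URL decides, and a URL matching no rule is rejected.
 * The only input is the URL string: no content type is available here.
 */
class RegexUrlFilterSketch {
    private final List<Boolean> accepts = new ArrayList<>();
    private final List<Pattern> patterns = new ArrayList<>();

    RegexUrlFilterSketch(List<String> rules) {
        for (String rule : rules) {
            rule = rule.trim();
            if (rule.isEmpty() || rule.startsWith("#")) continue; // skip blanks and comments
            char sign = rule.charAt(0);
            if (sign != '+' && sign != '-') {
                throw new IllegalArgumentException("Rule must start with + or -: " + rule);
            }
            accepts.add(sign == '+');
            patterns.add(Pattern.compile(rule.substring(1)));
        }
    }

    /** Returns the URL if accepted, or null if it is filtered out. */
    String filter(String url) {
        for (int i = 0; i < patterns.size(); i++) {
            if (patterns.get(i).matcher(url).find()) {
                return accepts.get(i) ? url : null; // first matching rule wins
            }
        }
        return null; // no rule matched: reject
    }

    public static void main(String[] args) {
        RegexUrlFilterSketch f = new RegexUrlFilterSketch(List.of(
                "# skip some binary extensions",
                "-\\.(gif|jpg|png|zip)$",
                "+."));
        System.out.println(f.filter("http://example.com/report.pdf"));   // accepted
        System.out.println(f.filter("http://example.com/logo.png"));     // null
        // No extension, so this is accepted even if the server returns an image:
        System.out.println(f.filter("http://example.com/getImage?id=7"));
    }
}
```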


My suggestion is to implement a scoring or an indexing filter that returns
a null Nutch document based on the content type.
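A rough sketch of the indexing-filter idea follows. In Nutch 1.x, indexing
filters implement org.apache.nutch.indexer.IndexingFilter, whose filter(...)
method returns a NutchDocument, and returning null should cause the document
to be skipped; please check the exact interface for your Nutch version. To
keep the example self-contained, the document below is a plain field map
rather than a NutchDocument, and the class name
ContentTypeIndexingFilterSketch and the allow-list are hypothetical.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;
import java.util.Set;

/**
 * Hypothetical stand-in for a Nutch indexing filter that drops documents
 * by content type. The document is modeled as a plain field map instead of
 * NutchDocument, so this compiles without Nutch on the classpath.
 */
class ContentTypeIndexingFilterSketch {
    // Assumed allow-list; a real plugin would read this from configuration.
    private final Set<String> allowedTypes;

    ContentTypeIndexingFilterSketch(Set<String> allowedTypes) {
        this.allowedTypes = allowedTypes;
    }

    /**
     * Returns the document unchanged if its content type is allowed, or null
     * to signal that it should be skipped (the null-return idea from the post).
     */
    Map<String, String> filter(Map<String, String> doc, String contentType) {
        if (contentType == null) return null; // unknown type: skip
        // Strip parameters such as "; charset=UTF-8" and normalize case.
        String mime = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        return allowedTypes.contains(mime) ? doc : null;
    }

    public static void main(String[] args) {
        ContentTypeIndexingFilterSketch f =
                new ContentTypeIndexingFilterSketch(Set.of("application/pdf", "text/html"));
        Map<String, String> doc = new HashMap<>();
        doc.put("url", "http://example.com/getImage?id=7");
        System.out.println(f.filter(doc, "text/html; charset=UTF-8") != null); // true
        System.out.println(f.filter(doc, "image/png") != null);                // false
    }
}
```

Unlike the URL filter, this runs after fetching and parsing, when the content
type is actually known, so it can catch documents whose URLs give no hint.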

Regards
Ameer



--
View this message in context: 
http://lucene.472066.n3.nabble.com/Different-regex-urlfilter-for-different-file-types-in-nutch-tp4153586p4155988.html
Sent from the Nutch - User mailing list archive at Nabble.com.
