Yes, at the URL filter stage you cannot determine the file type the way the parser does. But I think there are two ways to determine the type of a file. One is to look at the suffix of the URL; the other is to send a HEAD request and read the Content-Type of the resource, but this method takes longer than the first.
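As a rough illustration of those two methods, here is a minimal sketch in plain Java. It does not use the Nutch URLFilter API. The class name `UrlTypeGuesser`, the method names, the suffix table and the 5-second timeouts are all assumptions made up for this example, so treat it as a starting point rather than a finished plugin.

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.URI;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper, not part of Nutch.
final class UrlTypeGuesser {

    // Assumed suffix-to-category table; extend it for your own crawl.
    private static final Map<String, String> SUFFIX_TO_TYPE = Map.of(
        "html", "html", "htm", "html",
        "jpg", "image", "jpeg", "image", "png", "image", "gif", "image");

    /** Method 1: guess the type from the suffix of the URL path (no network access). */
    static String guessBySuffix(String url) {
        String path;
        try {
            path = URI.create(url).getPath();
        } catch (IllegalArgumentException e) {
            return "unknown"; // not a syntactically valid URI
        }
        if (path == null) {
            return "unknown";
        }
        int slash = path.lastIndexOf('/');
        int dot = path.lastIndexOf('.');
        // The dot must be in the last path segment and must not be the final character.
        if (dot <= slash || dot == path.length() - 1) {
            return "unknown";
        }
        String suffix = path.substring(dot + 1).toLowerCase(Locale.ROOT);
        return SUFFIX_TO_TYPE.getOrDefault(suffix, "unknown");
    }

    /** Method 2: send a HEAD request and return the Content-Type header (may be null). */
    static String contentTypeViaHead(String url) throws IOException {
        HttpURLConnection conn =
            (HttpURLConnection) URI.create(url).toURL().openConnection();
        try {
            conn.setRequestMethod("HEAD"); // headers only, no response body
            conn.setConnectTimeout(5000);  // assumed timeouts, in milliseconds
            conn.setReadTimeout(5000);
            return conn.getContentType();  // connects implicitly
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) {
        System.out.println(guessBySuffix("http://example.com/photos/cat.JPG?size=large"));
        System.out.println(guessBySuffix("http://example.com/index.html"));
    }
}
```

The suffix check costs nothing but fails for URLs without a file extension. The HEAD request is more reliable but costs one network round trip per URL, which is why it is much slower inside a URL filter.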
But one thing confuses me: how do you decide which regex-urlfilter file applies to a given URL? I assume you store the regexes for each file type in a separate file. In that case, before you can apply those regexes, you already have to know which URL belongs to which regex-urlfilter file. :) So that is the real question.

On Mon, Aug 25, 2014 at 10:27 PM, Ali Nazemian <[email protected]> wrote:

> Hi,
> Do you have any idea about how can I determine file type in RegexUrlFilter?
> file type is distinguishable at parse time not at url filter extension
> point. For example you can manage to use different parser for different
> mimetype in parse-plugins.xml. But how can I manage same behavior at url
> filter extension point?
>
> Best regards.
>
>
> On Tue, Aug 19, 2014 at 6:48 AM, feng lu <[email protected]> wrote:
>
> > Hi
> >
> > Do you want to set different type of rules to different type of files? I
> > find regex-urlfilter plugin did not provide this feature and other
> > *-urlfilter plugins also did not provide this feature.
> >
> > Maybe you can add a interface like
> >
> > protected Reader[] getRulesReaders(Configuration conf) throws IOException
> >
> > to get multi-readers for all configure files in RegexURLFilterBase class.
> >
> >
> > On Tue, Aug 19, 2014 at 1:42 AM, Ali Nazemian <[email protected]>
> > wrote:
> >
> > > Dear all,
> > > Hi,
> > > I use nutch 1.8 for crawl some web sites. For this purpose I want to change
> > > nutch in a way that different regex-urlfilter file loads for different
> > > types of file. For example one for html files and another for image files.
> > > (jpg/jpeg, ... ) Does nutch consider such situation? Or I should change
> > > some line of codes? (probably regex-urlfilter plugin)
> > > Best regards.
> > >
> > > --
> > > A.Nazemian
> > >
> >
> >
> >
> > --
> > Don't Grow Old, Grow Up... :-)
> >
>
>
>
> --
> A.Nazemian
>

--
Don't Grow Old, Grow Up... :-)

