Hi,
Do you have any idea how I can determine the file type in RegexURLFilter?
The file type is known at parse time, not at the URL filter extension
point. For example, you can assign a different parser to each MIME type
in parse-plugins.xml. But how can I get the same behavior at the URL
filter extension point?
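
One workaround I can think of, as a rough sketch only: since the real MIME
type is not available at the URL filter stage, guess a coarse type from the
URL extension and keep a separate rule list per guessed type. TypedUrlFilter,
addRules and guessType are hypothetical names, not Nutch APIs. The class only
imitates the URL filter convention of returning the URL to accept it and null
to reject it, and the "image"/"html" split and the extension list are my own
assumptions.

```java
import java.net.URI;
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Nutch): one rule list per guessed file type. */
public class TypedUrlFilter {

    /** One rule in regex-urlfilter.txt style: "+regex" accepts, "-regex" rejects. */
    private static final class Rule {
        final boolean accept;
        final Pattern pattern;

        Rule(String line) {
            this.accept = line.charAt(0) == '+';
            this.pattern = Pattern.compile(line.substring(1));
        }
    }

    private final Map<String, List<Rule>> rulesByType = new HashMap<>();

    /** Registers rules for a guessed type such as "image" or "html". */
    public void addRules(String type, String... lines) {
        List<Rule> rules = rulesByType.get(type);
        if (rules == null) {
            rules = new ArrayList<>();
            rulesByType.put(type, rules);
        }
        for (String line : lines) {
            rules.add(new Rule(line));
        }
    }

    /** Guesses a coarse type from the path extension; the real MIME type is unknown here. */
    public static String guessType(String url) {
        String path;
        try {
            path = URI.create(url).getPath();
        } catch (IllegalArgumentException e) {
            return "html"; // assumption: treat unparsable URLs as ordinary pages
        }
        if (path == null) {
            return "html";
        }
        String lower = path.toLowerCase(Locale.ROOT);
        if (lower.matches(".*\\.(jpe?g|png|gif)$")) {
            return "image";
        }
        return "html";
    }

    /** Returns the URL to keep it, or null to drop it; the first matching rule decides. */
    public String filter(String url) {
        List<Rule> rules = rulesByType.get(guessType(url));
        if (rules == null) {
            return null;
        }
        for (Rule rule : rules) {
            if (rule.pattern.matcher(url).find()) {
                return rule.accept ? url : null;
            }
        }
        return null; // no rule matched
    }
}
```

The obvious limitation is that this only looks at the extension, so a URL
without one (or with a misleading one) ends up in the "html" rule list even
if the server later returns an image.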

Best regards.


On Tue, Aug 19, 2014 at 6:48 AM, feng lu <[email protected]> wrote:

> Hi
>
> Do you want to apply different sets of rules to different types of files?
> As far as I can see, the regex-urlfilter plugin does not provide this
> feature, and neither do the other *-urlfilter plugins.
>
> Maybe you can add a method like
>
> protected Reader[] getRulesReaders(Configuration conf) throws IOException
>
> to the RegexURLFilterBase class to get one reader per configured rules file.
>
>
> On Tue, Aug 19, 2014 at 1:42 AM, Ali Nazemian <[email protected]>
> wrote:
>
> > Dear all,
> > Hi,
> > I use Nutch 1.8 to crawl some web sites. For this purpose I want to
> > change Nutch so that a different regex-urlfilter file is loaded for
> > different types of files, for example one for HTML files and another for
> > image files (jpg/jpeg, ...). Does Nutch support this? Or do I have to
> > change some lines of code (probably in the regex-urlfilter plugin)?
> > Best regards.
> >
> > --
> > A.Nazemian
> >
>
>
>
> --
> Don't Grow Old, Grow Up... :-)
>



-- 
A.Nazemian
