I understand conf/regex-urlfilter.txt; I can put domain names into the URL
patterns.
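
For illustration, a minimal sketch of what such domain-specific rules might
look like in conf/regex-urlfilter.txt. The hosts example.com and example.org
and the /blog/ and /private/ paths are hypothetical. The first matching rule
decides, and a URL that matches no rule is dropped:

```
# accept only the blog section of example.com (hypothetical site)
+^https?://www\.example\.com/blog/
# skip a private area of example.org, accept the rest of that host
-^https?://www\.example\.org/private/
+^https?://www\.example\.org/
# reject everything else
-.
```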

But what about meta tags? What if I want to parse out different meta tags
for different sites?

On Wed, Nov 14, 2012 at 1:33 AM, Sourajit Basak <[email protected]> wrote:

> 1) To parse and index custom meta tags, enable and configure the
> "parse-metatags" plugin.
>
> 2) There are several URL filters, such as the regex-based one. For regex,
> the patterns are specified in conf/regex-urlfilter.txt.
>
> On Wed, Nov 14, 2012 at 1:33 PM, Tejas Patil <[email protected]> wrote:
>
> > When defining URL patterns, include the domain name in them so that you
> > get site/domain-specific rules. I don't know about configuring meta tags.
> >
> > Thanks,
> > Tejas
> >
> >
> > On Tue, Nov 13, 2012 at 11:34 PM, Joe Zhang <[email protected]> wrote:
> >
> > > How can I enforce site-specific crawling policies, i.e., different URL
> > > patterns, meta tags, etc., for different websites to be crawled? I got
> > > the sense that multiple instances of Nutch are needed. Is that correct?
> > > If yes, how?
> > >
> >
>
