You probably need to customize the parse-metatags plugin.
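Roughly, such a plugin pulls name/content pairs out of a page's <meta> tags and keeps the names you configure. The Python below is only a stdlib sketch of that idea, not the plugin's code (Nutch plugins are Java and are configured through nutch-site.xml). The function name extract_metatags and the convention that wanted=None means "keep every tag" are assumptions made for illustration.

```python
from html.parser import HTMLParser


class MetaTagCollector(HTMLParser):
    """Collects <meta name="..." content="..."> pairs, keyed by lower-cased name."""

    def __init__(self, wanted=None):
        super().__init__()
        # Hypothetical convention: None keeps every tag; otherwise a set of names.
        self.wanted = {n.lower() for n in wanted} if wanted is not None else None
        self.tags = {}

    def handle_starttag(self, tag, attrs):
        # Also called for self-closing <meta ... /> via handle_startendtag.
        if tag != "meta":
            return
        attrs = dict(attrs)
        name, content = attrs.get("name"), attrs.get("content")
        if not name or content is None:
            return  # e.g. <meta charset="utf-8"> has no name/content pair
        name = name.lower()
        if self.wanted is None or name in self.wanted:
            self.tags[name] = content


def extract_metatags(html, wanted=None):
    """Hypothetical helper: return the selected meta tags of an HTML string."""
    collector = MetaTagCollector(wanted)
    collector.feed(html)
    collector.close()
    return collector.tags
```

For example, extract_metatags(page_html, ["description", "keywords"]) keeps just those two tags, while extract_metatags(page_html) keeps them all.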

I think you should go ahead and include all possible metatags, and handle
missing metatags on the Solr side.
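One way to apply this: parse every meta tag for every site, then decide per site which ones reach the index, leaving out tags a page does not have. The sketch below shows only that selection step. The SITE_METATAGS table, the .example host names, and running this before documents are sent to Solr are all hypothetical, not Nutch or Solr APIs. It also assumes the Solr fields involved are not marked required, so a missing tag simply leaves that field empty for the document.

```python
from urllib.parse import urlparse

# Hypothetical per-site policy: which meta tag names to keep for each host.
SITE_METATAGS = {
    "site-a.example": ["description", "keywords"],
    "news.site-b.example": ["description", "author", "news_keywords"],
}
# Hypothetical fallback for hosts not listed above.
DEFAULT_METATAGS = ["description"]


def select_metatags(url, parsed):
    """Keep only the tags configured for the URL's host.

    Tags the page does not carry are simply omitted, so the matching
    (non-required) Solr field stays empty for that document.
    """
    host = (urlparse(url).hostname or "").lower()
    if host.startswith("www."):
        host = host[len("www."):]  # treat www.site-a.example like site-a.example
    wanted = SITE_METATAGS.get(host, DEFAULT_METATAGS)
    return {name: parsed[name] for name in wanted if name in parsed}
```

A design note: keeping the per-site choice in a table like this means every site runs through the same crawl and parse configuration, and only the final field selection differs.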

On Thu, Nov 15, 2012 at 12:22 AM, Joe Zhang <[email protected]> wrote:

> I understand conf/regex-urlfilter.txt; I can put domain names into the URL
> patterns.
>
> But what about meta tags? What if I want to parse out different meta tags
> for different sites?
>
> On Wed, Nov 14, 2012 at 1:33 AM, Sourajit Basak <[email protected]> wrote:
>
> > 1) To parse and index custom meta tags, enable and configure the
> > "parse-metatags" plugin.
> >
> > 2) There are several URL filters, e.g. a regex-based one. For the regex
> > filter, the patterns are specified in conf/regex-urlfilter.txt.
> >
> > On Wed, Nov 14, 2012 at 1:33 PM, Tejas Patil <[email protected]> wrote:
> >
> > > When defining URL patterns, include the domain name so that you get
> > > site/domain-specific rules. I don't know about configuring meta tags.
> > >
> > > Thanks,
> > > Tejas
> > >
> > >
> > > On Tue, Nov 13, 2012 at 11:34 PM, Joe Zhang <[email protected]> wrote:
> > >
> > > > How can I enforce site-specific crawling policies, i.e., different URL
> > > > patterns, meta tags, etc. for different websites to be crawled? I got
> > > > the sense that multiple instances of Nutch are needed. Is that correct?
> > > > If so, how?
> > > >
> > >
> >
>
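On the URL side of the question: as the comments in the stock conf/regex-urlfilter.txt describe, each non-comment line is a regex prefixed with "+" (include) or "-" (exclude), the first matching pattern decides, and a URL that matches nothing is ignored. The Python sketch below only emulates those semantics to show how domain-specific patterns yield per-site policies. It is not Nutch code, the hosts are made-up .example names, and treating a match as a search anywhere in the URL (hence the explicit ^ anchors) is an assumption about the Java matcher. The stock file typically ends with a catch-all "+." line. That line is left out here so that only the listed sites pass.

```python
import re

# Rules in the style of conf/regex-urlfilter.txt (hypothetical hosts).
RULES_TEXT = r"""
# site A: skip drafts, but accept other article pages
-^https?://(www\.)?site-a\.example/articles/drafts/
+^https?://(www\.)?site-a\.example/articles/
# site B: only the news section
+^https?://news\.site-b\.example/news/
"""


def parse_rules(text):
    """Turn '+regex' / '-regex' lines into (accept, compiled_pattern) pairs."""
    rules = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        sign, pattern = line[0], line[1:]
        if sign not in "+-":
            raise ValueError(f"rule must start with '+' or '-': {line!r}")
        rules.append((sign == "+", re.compile(pattern)))
    return rules


def accepts(url, rules):
    """First matching rule wins; a URL that matches no rule is rejected."""
    for accept, pattern in rules:
        if pattern.search(url):
            return accept
    return False
```

With these rules, https://site-a.example/articles/drafts/7 is rejected even though the later "+" rule also matches, because the "-" rule comes first.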
