Well, these are all details. The bigger question is: how do I separate the
crawling policy of site A from that of site B?
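
One approach I can imagine (unverified) is to run one crawl per site, each
with its own conf directory, since bin/nutch honors the NUTCH_CONF_DIR
environment variable. A rough sketch with made-up paths:

  # one conf dir per site, each holding its own regex-urlfilter.txt
  # and nutch-site.xml with site-specific settings
  export NUTCH_CONF_DIR=/opt/nutch/conf-siteA
  bin/nutch crawl urls/siteA -dir crawl-siteA -depth 3 -topN 1000

  export NUTCH_CONF_DIR=/opt/nutch/conf-siteB
  bin/nutch crawl urls/siteB -dir crawl-siteB -depth 3 -topN 1000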

On Thu, Nov 15, 2012 at 7:41 AM, Sourajit Basak <[email protected]> wrote:

> You probably need to customize the parse-metatags plugin.
>
> I think you can go ahead and include all possible metatags, and take care
> of missing metatags in Solr.
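>
> Something along these lines in conf/nutch-site.xml should work (a
> sketch; the two indexed fields are just examples):
>
>   <property>
>     <name>metatags.names</name>
>     <value>*</value>
>     <description>Have parse-metatags extract every metatag found.
>     </description>
>   </property>
>   <property>
>     <name>index.parse.md</name>
>     <value>metatag.description,metatag.keywords</value>
>     <description>Parse metadata copied into the index by the
>     index-metadata plugin; parse-metatags stores tags under the
>     "metatag." prefix.</description>
>   </property>
>
> On the Solr side, leaving the matching fields non-required in schema.xml
> lets documents index fine even when a given metatag is missing.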
>
> On Thu, Nov 15, 2012 at 12:22 AM, Joe Zhang <[email protected]> wrote:
>
> > I understand conf/regex-urlfilter.txt; I can put domain names into the
> > URL patterns.
> >
> > But what about meta tags? What if I want to parse out different meta tags
> > for different sites?
> >
> > On Wed, Nov 14, 2012 at 1:33 AM, Sourajit Basak <[email protected]> wrote:
> >
> > > 1) For parsing & indexing custom meta tags, enable and configure the
> > > "parse-metatags" plugin (sketch below).
> > >
> > > 2) There are several URL filters, e.g. regex-based ones. For regex,
> > > the patterns are specified via conf/regex-urlfilter.txt.
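> > >
> > > For (1), the plugin is enabled via plugin.includes in
> > > conf/nutch-site.xml; a minimal sketch (your existing value will
> > > differ; index-metadata is added so the parsed tags reach the index):
> > >
> > >   <property>
> > >     <name>plugin.includes</name>
> > >     <value>protocol-http|urlfilter-regex|parse-(html|metatags)|index-(basic|anchor|metadata)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
> > >   </property>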
> > >
> > > On Wed, Nov 14, 2012 at 1:33 PM, Tejas Patil <[email protected]> wrote:
> > >
> > > > While defining URL patterns, include the domain name so that you get
> > > > site/domain-specific rules. I don't know about configuring meta tags.
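> > > >
> > > > For example, something like this in conf/regex-urlfilter.txt (the
> > > > domains are placeholders; rules are applied in order, first match
> > > > wins):
> > > >
> > > >   # crawl only the news section of site A
> > > >   +^https?://(www\.)?siteA\.com/news/
> > > >   # crawl all of site B
> > > >   +^https?://(www\.)?siteB\.org/
> > > >   # reject everything else
> > > >   -.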
> > > >
> > > > Thanks,
> > > > Tejas
> > > >
> > > >
> > > > On Tue, Nov 13, 2012 at 11:34 PM, Joe Zhang <[email protected]> wrote:
> > > >
> > > > > How to enforce site-specific crawling policies, i.e., different URL
> > > > > patterns, meta tags, etc. for different websites to be crawled? I got
> > > > > the sense that multiple instances of Nutch are needed? Is that
> > > > > correct? If yes, how?
> > > > >
> > > >
> > >
> >
>
