Well, these are all details. The bigger question is: how do I separate the crawling policy of site A from that of site B?
On Thu, Nov 15, 2012 at 7:41 AM, Sourajit Basak <[email protected]> wrote:

> You probably need to customize the parse-metatags plugin.
>
> I think you should go ahead and include all possible metatags, and take
> care of missing metatags in Solr.
>
> On Thu, Nov 15, 2012 at 12:22 AM, Joe Zhang <[email protected]> wrote:
>
> > I understand conf/regex-urlfilter.txt; I can put domain names into the
> > URL patterns.
> >
> > But what about meta tags? What if I want to parse out different meta
> > tags for different sites?
> >
> > On Wed, Nov 14, 2012 at 1:33 AM, Sourajit Basak <[email protected]> wrote:
> >
> > > 1) For parsing & indexing customized meta tags, enable & configure the
> > > plugin "parse-metatags".
> > >
> > > 2) There are several URL filters, e.g. regex-based ones. For regex, the
> > > patterns are specified via conf/regex-urlfilter.txt.
> > >
> > > On Wed, Nov 14, 2012 at 1:33 PM, Tejas Patil <[email protected]> wrote:
> > >
> > > > While defining URL patterns, have the domain name in them so that you
> > > > get site/domain-specific rules. I don't know about configuring meta
> > > > tags.
> > > >
> > > > Thanks,
> > > > Tejas
> > > >
> > > > On Tue, Nov 13, 2012 at 11:34 PM, Joe Zhang <[email protected]> wrote:
> > > >
> > > > > How do I enforce site-specific crawling policies, i.e., different
> > > > > URL patterns, meta tags, etc. for different websites to be crawled?
> > > > > I got the sense that multiple instances of Nutch are needed. Is
> > > > > that correct? If yes, how?
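For reference, a minimal sketch of what site-specific rules in conf/regex-urlfilter.txt could look like; the domains and path patterns below are hypothetical. Nutch tries the rules top to bottom and the first match decides, with + meaning fetch and - meaning skip:

    # skip non-web schemes
    -^(file|ftp|mailto):
    # site A: only pages under /articles/ (hypothetical domain)
    +^https?://www\.site-a\.com/articles/
    # site B: only dated blog posts (hypothetical domain)
    +^https?://www\.site-b\.org/blog/[0-9]{4}/
    # reject everything else
    -.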
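On the meta tag side, a sketch of conf/nutch-site.xml with parse-metatags enabled, assuming the standard properties from the Nutch wiki (metatags.names and index.parse.md) and an abbreviated plugin.includes. The index-metadata plugin is what carries the parsed tags through to the indexer, so missing tags can then be handled on the Solr side as Sourajit suggests:

    <property>
      <name>plugin.includes</name>
      <!-- abbreviated; keep your existing list and add parse-metatags
           and index-metadata -->
      <value>protocol-http|urlfilter-regex|parse-(html|metatags)|index-(basic|anchor|metadata)|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
    </property>

    <property>
      <name>metatags.names</name>
      <!-- metatags to extract; list everything any of your sites might carry -->
      <value>description,keywords,author</value>
    </property>

    <property>
      <name>index.parse.md</name>
      <!-- parsed metatags to expose as index fields -->
      <value>metatag.description,metatag.keywords,metatag.author</value>
    </property>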
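As for separating site A's policy from site B's, the thread leaves that open. One possible approach, an assumption here rather than something anyone in the thread confirms, is to run a separate crawl per site, each with its own conf directory, since both regex-urlfilter.txt and the metatag properties live under conf. A rough shell sketch with hypothetical paths:

    # one conf dir per site, sharing a single Nutch install
    export NUTCH_CONF_DIR=/opt/nutch/conf-site-a   # picked up by bin/nutch
    bin/nutch crawl urls/site-a -dir crawl/site-a -depth 3 -topN 1000

    export NUTCH_CONF_DIR=/opt/nutch/conf-site-b
    bin/nutch crawl urls/site-b -dir crawl/site-b -depth 3 -topN 1000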

