Re: Nutch 1.1 crawls fewer links than 1.0

jeff Sun, 18 Jul 2010 10:24:12 -0700

Faruk, do you know how Tika overrides the other parsers? I found that in
parse-plugins.xml many of the previously working parsers have been
commented out. Does Tika override each of the parsers on the fly, or is
there some configuration that Nutch checks first before picking up Tika?
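
One way to see which mappings are still active after the commenting is to
parse the file and list what survives. The following is only a minimal
sketch: the SAMPLE fragment is made up for illustration (it is not the
parse-plugins.xml shipped with 1.1), and active_parsers is a hypothetical
helper name. It assumes the usual layout of mimeType elements containing
plugin elements, and relies on Python's ElementTree skipping XML comments
when parsing.

```python
import xml.etree.ElementTree as ET


def active_parsers(xml_text):
    """Map each mimeType name to the plugin ids that are not commented out."""
    # ElementTree's default parser drops XML comments, so any
    # commented-out <mimeType> block simply does not appear in the tree.
    root = ET.fromstring(xml_text)
    mapping = {}
    for mime in root.iter("mimeType"):
        mapping[mime.get("name")] = [p.get("id") for p in mime.findall("plugin")]
    return mapping


# Hypothetical fragment for illustration only; not the real shipped file.
SAMPLE = """<parse-plugins>
  <mimeType name="*">
    <plugin id="parse-tika" />
  </mimeType>
  <!--
  <mimeType name="application/pdf">
    <plugin id="parse-pdf" />
  </mimeType>
  -->
  <mimeType name="text/html">
    <plugin id="parse-html" />
  </mimeType>
</parse-plugins>"""

if __name__ == "__main__":
    for mime, plugins in active_parsers(SAMPLE).items():
        print(mime, "->", ", ".join(plugins))
```

To run it against a real install, read conf/parse-plugins.xml into a string
and pass that to active_parsers instead of SAMPLE.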


Thanks


On Sun, 2010-07-18 at 04:21 -0700, Faruk Berksöz wrote:
> There is an open issue
> (NUTCH-817<https://issues.apache.org/jira/browse/NUTCH-817>)
> that may be related to your problem.
> 
> 2010/7/16 jeff-4 [via Lucene] <[email protected]>
> 
> > I did check. Nutch 1.0 crawled over 300 links, while Nutch 1.1 crawled only 2.
> >
> > On Fri, 2010-07-16 at 14:21 +0800, xiao yang wrote:
> >
> > > You can use “bin/nutch readdb crawl/crawldb -stats” to check the
> > > number of pages they crawled.
> > >
> > > > On Fri, Jul 16, 2010 at 2:07 PM, jeff <[hidden email]> wrote:
> > > > Hi,
> > > >
> > > > > I am testing Nutch 1.1 with exactly the same configuration that I
> > > > > tested on Nutch 1.0. Nutch 1.0 took a few hours to crawl the
> > > > > bestbuy site, while 1.1 takes only 2-3 minutes. Has anyone had a
> > > > > similar experience, or does anyone know why?
> > > >
> > > > Thanks.
> > > >
> > > >
> >
> >
> >
> >
> > ------------------------------
> >  View message @
> > http://lucene.472066.n3.nabble.com/Nutch-1-1-crawls-fewer-links-than-1-0-tp971589p971632.html
> > To unsubscribe from Nutch, click here< (link removed) >.
> >
> >
> >
>
