Thanks Faruk.

So I wonder why the 1.1 release uses Tika, which does not seem to be stable
at the moment. Any ideas?


On Sun, Jul 18, 2010 at 7:21 AM, Faruk Berksöz <[email protected]> wrote:

>
> There is an open issue
> (NUTCH-817<https://issues.apache.org/jira/browse/NUTCH-817>)
> that may be related to your problem.
>
> 2010/7/16 jeff-4 [via Lucene] <[email protected]>
> >
>
> > I did check. Nutch 1.0 crawled over 300 links, while Nutch 1.1 crawled only 2.
> >
> > On Fri, 2010-07-16 at 14:21 +0800, xiao yang wrote:
> >
> > > You can use “bin/nutch readdb crawl/crawldb -stats” to check the
> > > number of pages they crawled.
> > >
> > > On Fri, Jul 16, 2010 at 2:07 PM, jeff <[hidden email]> wrote:
> > > > Hi,
> > > >
> > > > I am testing Nutch 1.1 with exactly the same configuration that I
> > > > used with Nutch 1.0. Nutch 1.0 took a few hours to crawl the bestbuy
> > > > site, while 1.1 takes only 2-3 minutes. Has anyone had a similar
> > > > experience, or does anyone know why?
> > > >
> > > > Thanks.
> > > >
> > > >
> >
> >
> >
> >
> > ------------------------------
> > View message @ http://lucene.472066.n3.nabble.com/Nutch-1-1-crawls-fewer-links-than-1-0-tp971589p971632.html
>
> --
> View this message in context:
> http://lucene.472066.n3.nabble.com/Nutch-1-1-crawls-fewer-links-than-1-0-tp971589p976259.html
> Sent from the Nutch - User mailing list archive at Nabble.com.
>
