Re: Please help: Tomcat problem, Paginating with optimization (Like goggle)

Piotr Kosiorowski Tue, 24 May 2005 02:18:35 -0700

Hi Ferenc,

'bin/nutch segread -list' reports the number of entries in the fetcher
output, so if the data is not corrupted it should report the total
number of entries generated during fetchlist generation. Luke, on the
other hand, reports the number of documents in the Lucene index, so it
includes only pages that were processed correctly. It does not count
pages that were not fetched because of errors, pages that were not
parsed successfully, etc. This is also the number returned when you
search for "http", because only correctly indexed pages are
searchable.
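
To make the arithmetic concrete, here is a small sketch using the figures
from your message. It assumes that the "1G memory / 1 million pages" rule
of thumb scales linearly and that the searchable (indexed) count is the
one to size against; both are assumptions for illustration, and the
function names are made up for this example.

```python
# Figures quoted in the message below, all for the same segment.
fetchlist_entries = 500_000   # reported by 'bin/nutch segread -list'
indexed_docs = 420_105        # reported by Luke and by a search for "http"
docs_after_undelete = 438_000 # reported by Luke after using undelete

# Assumption: the 1 GB per 1 million pages rule of thumb scales linearly.
GB_PER_MILLION_PAGES = 1.0


def not_indexed(entries: int, indexed: int) -> int:
    """Entries in the fetcher output that never became searchable documents."""
    return entries - indexed


def estimated_memory_gb(pages: int, gb_per_million: float = GB_PER_MILLION_PAGES) -> float:
    """Rough memory estimate for a search backend serving `pages` documents."""
    return pages / 1_000_000 * gb_per_million


missing = not_indexed(fetchlist_entries, indexed_docs)
marked_deleted = docs_after_undelete - indexed_docs

print(f"entries not indexed (fetch or parse failures etc.): {missing}")
print(f"documents marked deleted in the index: {marked_deleted}")
print(f"estimated memory for searchable pages: {estimated_memory_gb(indexed_docs):.2f} GB")
```

Under these assumptions, sizing by the fetchlist count instead would
overestimate by the pages that were never indexed.
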
Regards
Piotr


On 5/24/05, [EMAIL PROTECTED] <[EMAIL PROTECTED]> wrote:
> Dear Chirag and Byron,
> 
> Thanks for the suggestion, but I don't have any problems with other
> applications under Tomcat. The problem occurs only with Nutch.
> There is a free version of Resin; is it truly better than Tomcat?
> 
> Dear Chirag, you wrote to give the backend 1G of memory per 1 million
> pages.
> How do I calculate the number of pages in the segments?
> If I use the 'bin/nutch segread -list' tool, it says there are 500000
> pages in a segment.
> If I use the 'lukeall.jar' tool, it says there are 420105 records in
> that segment.
> If I use the 'lukeall.jar' undelete function, there are 438000 records
> in the same segment.
> If I use the web search engine and search for 'http', it reports the
> same number as 'lukeall.jar'.
> 
> Which number should I use to calculate pages per backend?
> 
> I think my solution for 'paginating' is better than the others reported.
> Any comments?
> 
> Thanks, Ferenc
>
