[EMAIL PROTECTED] wrote:
Dear List,
How can I determine how many real (indexed, not deleted) pages are in a
segment?
I think that if we have several backends, we need to balance the segments
between them.
First I tried using the number of fetched pages, but that does not give a
real balance.
I used the lukeall.jar tool on my WinXP client, but graphical interfaces
can't run on the servers.
You can use two tools:
1. nutch segread -list : this will give you the total number of records
in a segment. Note, however, that this also includes pages which failed
to be fetched or parsed.
2. You can use LuCli (in lucene/contrib), a command-line frontend to
Lucene.
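For the balancing part of the question: once you have a count of live
(non-deleted) documents per segment (Lucene's IndexReader.numDocs() reports
exactly that, unlike maxDoc(), which also counts deleted slots), one simple
way to spread segments across backends is a greedy "largest segment to the
least-loaded backend" heuristic. This is a minimal sketch, not something
Nutch does for you; the function name and the segment names/counts in the
test are hypothetical.

```python
import heapq


def balance_segments(segment_docs, n_backends):
    """Greedily assign segments to backends, largest segment first.

    segment_docs: dict mapping a (hypothetical) segment name to its number
    of live, i.e. indexed and not deleted, documents.
    Returns a list with one (total_docs, [segment names]) pair per backend.
    """
    # Min-heap of (current load, backend index): the top is always the
    # least-loaded backend.
    heap = [(0, i) for i in range(n_backends)]
    assignment = [[] for _ in range(n_backends)]
    loads = [0] * n_backends

    # Place the biggest segments first; this usually balances better than
    # assigning them in arbitrary order.
    ordered = sorted(segment_docs.items(), key=lambda kv: kv[1], reverse=True)
    for name, docs in ordered:
        load, i = heapq.heappop(heap)
        assignment[i].append(name)
        loads[i] = load + docs
        heapq.heappush(heap, (loads[i], i))

    return list(zip(loads, assignment))
```

Largest-first greedy assignment is not guaranteed to find the optimal split,
but it is cheap and usually close enough when there are many segments.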
--
Best regards,
Andrzej Bialecki <><
Information Retrieval, Semantic Web
Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general