Gal Nitzan wrote:
Andrzej Bialecki wrote:
Hi all,
Well, I still get a very slow mergesegs:
    050917 043332 - data in segment index/segments/20050916014401 is corrupt, using only 128115 entries.
This is a common and recurring problem. What's worse is that an
unfixed segment like this will destroy the performance of the search,
too, not just the backend pre-processing.
I propose to modify MapFile.Reader so that it refuses to open such a
file and throws an Exception, unless a force=true flag is given.
Tools that want to ignore this can do so, but all other tools will be
able to make a conscious decision whether to fix it first or to use
it as is.
If there are no objections, I will change it in the trunk/ in a couple
of days.
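A minimal Java sketch of the proposed default, assuming the reader already knows how many entries the index lists and how many it can actually read from the data file. `ForcedOpenSketch`, `CorruptSegmentException` and the entry counts (10 and 7) are hypothetical names and numbers for illustration, not the real Nutch `MapFile.Reader` API:

```java
import java.io.IOException;

/**
 * Hypothetical sketch of the proposed "refuse unless forced" open.
 * The class, exception and entry counts are illustrative stand-ins,
 * not the real Nutch MapFile.Reader API.
 */
class ForcedOpenSketch {

    /** Signals that a segment's data is shorter than its index claims. */
    static class CorruptSegmentException extends IOException {
        CorruptSegmentException(String message) {
            super(message);
        }
    }

    /**
     * Returns how many entries the caller may use.
     *
     * @param indexedEntries  entries the index says should exist (assumed known)
     * @param readableEntries entries actually readable from the data file (assumed known)
     * @param force           true = old behaviour: silently use only the readable entries
     */
    static long open(String segment, long indexedEntries, long readableEntries,
                     boolean force) throws CorruptSegmentException {
        boolean corrupt = readableEntries < indexedEntries;
        if (corrupt && !force) {
            // New default: make the caller decide whether to fix the segment first.
            throw new CorruptSegmentException("data in segment " + segment
                    + " is corrupt: " + readableEntries + " of " + indexedEntries
                    + " entries readable");
        }
        // Clean segment, or a corrupt segment opened with force=true.
        return corrupt ? readableEntries : indexedEntries;
    }

    public static void main(String[] args) {
        try {
            open("example-segment", 10, 7, false);
        } catch (CorruptSegmentException e) {
            System.out.println("refused: " + e.getMessage());
        }
        try {
            System.out.println("forced: " + open("example-segment", 10, 7, true));
        } catch (CorruptSegmentException e) {
            throw new AssertionError("force=true must not throw", e);
        }
    }
}
```

With force=false a corrupt segment is refused outright; with force=true the caller gets today's behaviour of quietly using only the readable entries.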
Hi,
I think it would be very confusing to old and new users alike:
throwing an exception when segment corruption is actually trivial and
can be fixed easily (now that I know how to do that :-)...
You missed my point: I proposed that we change the API. On the surface,
command-line tools would behave as they do now, with the benefit that segment
corruption would be fixed automatically by those tools that require
clean segments, unless _prevented_ by a cmd-line switch. So this is
just about improving the default behaviour, not about complaining even louder
than now.
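The command-line default described here could look roughly like the following sketch. The `-noFix` switch, the `Segment` model and the `fix()` repair step are all hypothetical illustrations, not existing Nutch options or classes:

```java
import java.util.Arrays;

/**
 * Hypothetical sketch of the command-line default: a tool that needs a clean
 * segment repairs it automatically unless the user passes "-noFix"
 * (an invented switch name). Segment and fix() are illustrative stand-ins,
 * not Nutch classes.
 */
class AutoFixDefaultSketch {

    /** Minimal model of a segment: entries listed in the index vs. entries readable from the data. */
    static final class Segment {
        long indexedEntries;
        final long readableEntries;

        Segment(long indexedEntries, long readableEntries) {
            this.indexedEntries = indexedEntries;
            this.readableEntries = readableEntries;
        }

        boolean isCorrupt() {
            return readableEntries < indexedEntries;
        }
    }

    /** Hypothetical repair: make the index list only the entries that can actually be read. */
    static void fix(Segment s) {
        s.indexedEntries = s.readableEntries;
    }

    /** Decides what the tool does before processing; returns a short description of the action. */
    static String prepare(Segment s, String[] args) {
        boolean noFix = Arrays.asList(args).contains("-noFix");
        if (!s.isCorrupt()) {
            return "clean";
        }
        if (noFix) {
            // User explicitly opted out: use the segment as is (the force=true path).
            return "used as-is";
        }
        fix(s); // Default: repair automatically, without complaining any louder than today.
        return "fixed";
    }

    public static void main(String[] args) {
        Segment s = new Segment(10, 7);
        System.out.println(prepare(s, args));
    }
}
```

Run with no arguments it prints `fixed`; run with `-noFix` it prints `used as-is`, leaving the segment untouched.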
Instead, I would like to suggest building a FAQ for Nutch.
I would like to volunteer to build at least the skeleton for it.
As a new Nutch user I have run into many problems, and apart from this
list there was not much information anywhere else. So I have all the
answers fresh in my mind, and with some help from the rest of the
nutch-users it can be done without too much of a hassle.
Besides, many people on this list contribute in their free time, and I would
be happy to contribute to the success of this project too.
This is always welcome, and there is already a place where we collect
such info. Please see the Nutch Wiki, and feel free to enhance or add
new content there.
--
Best regards,
Andrzej Bialecki <><
___. ___ ___ ___ _ _ __________________________________
[__ || __|__/|__||\/| Information Retrieval, Semantic Web
___|||__|| \| || | Embedded Unix, System Integration
http://www.sigram.com Contact: info at sigram dot com
_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general