Andrzej Bialecki wrote:
Gal Nitzan wrote:
Andrzej Bialecki wrote:

Hi all,

Well, I still get a very slow mergesegs:



>050917 043332 - data in segment index/segments/20050916014401 is corrupt, using only 128115 entries.


This is a common and recurring problem. What's worse is that an unfixed segment like this will destroy the performance of the search, too, not just the backend pre-processing.

I propose to modify MapFile.Reader so that it refuses to open such a file and throws an Exception, unless a force=true flag is given. Tools that want to ignore the corruption can do so, but all other tools will be able to make a conscious decision whether to fix the segment first or to use it as is.

If there are no objections, I will change it in the trunk/ in a couple of days.
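
The proposal above can be sketched roughly as follows. This is only an illustration, not Nutch's actual MapFile.Reader: the class GuardedReader, its constructor parameters, and the idea of detecting corruption by comparing an expected entry count with a readable entry count are all assumptions made for the example.

```java
import java.io.IOException;

/**
 * Hypothetical stand-in for a reader over an indexed data file.
 * This is NOT Nutch's MapFile.Reader; all names here are assumptions.
 */
final class GuardedReader {
    private final long usableEntries;

    /**
     * @param expectedEntries number of entries the file should contain (assumed known)
     * @param readableEntries number of entries that can actually be read
     * @param force           if true, open a corrupt file anyway
     */
    GuardedReader(long expectedEntries, long readableEntries, boolean force)
            throws IOException {
        // Assumption: corruption shows up as fewer readable entries than expected.
        boolean corrupt = readableEntries < expectedEntries;
        if (corrupt && !force) {
            // Default behaviour in this sketch: refuse, so the caller must decide.
            throw new IOException("data is corrupt: only " + readableEntries
                    + " of " + expectedEntries + " entries are usable");
        }
        // With force=true, fall back to the readable entries only.
        this.usableEntries = Math.min(expectedEntries, readableEntries);
    }

    long usableEntries() {
        return usableEntries;
    }
}
```

In this sketch, a tool that wants to keep using a truncated segment passes force=true, while every other caller gets the exception and can choose to fix the segment before retrying.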

Hi,

I think it would be very confusing to old users as well as new users. Throwing an exception seems excessive when a segment corruption is actually trivial and can be fixed easily (now that I know how to do that :-)...

You missed my point - I proposed that we change the API. On the surface, command-line tools would behave as they do now, with the benefit that segment corruption would be fixed automatically by those tools that require clean segments - unless _prevented_ by a cmd-line switch. So, this is just to improve the default behaviour, not to complain even louder than now.
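
The default behaviour described above might look something like the following at the command-line level. It is a sketch under stated assumptions: the class FixPolicy, the "-nofix" switch, and the choice to fall back to the truncated segment when fixing is prevented are invented for illustration and are not existing Nutch options.

```java
import java.util.Arrays;

/** Hypothetical decision logic for a tool that requires clean segments. */
final class FixPolicy {
    enum Action { USE_AS_IS, FIX_THEN_USE, USE_TRUNCATED }

    /** Invented switch name for this example; not a real Nutch flag. */
    static final String NO_FIX_FLAG = "-nofix";

    static Action decide(String[] args, boolean segmentCorrupt) {
        if (!segmentCorrupt) {
            return Action.USE_AS_IS;
        }
        // Fixing is the default; the user has to opt out explicitly.
        boolean fixPrevented = Arrays.asList(args).contains(NO_FIX_FLAG);
        // Assumption: when fixing is prevented, the tool proceeds with the
        // readable part of the segment, as the tools do today.
        return fixPrevented ? Action.USE_TRUNCATED : Action.FIX_THEN_USE;
    }
}
```

The point of the design is that users see the same commands as before, while the library-level refusal forces each tool author to pick one of these outcomes deliberately.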


Instead I would like to suggest building a FAQ for Nutch.

I would like to volunteer to build at least the skeleton for it.

As a new user of Nutch I have run into so many problems, and apart from this list there was not much information elsewhere. So, I have all the answers fresh in my mind, and with some help from the rest of the nutch-users it can be done without too much of a hassle.

Besides, many people on this list contribute in their free time, and I would be happy to contribute to the success of this project as well.

This is always welcome, and there is already a place where we collect such info. Please see the Nutch Wiki, and feel free to enhance or add new content there.

You are right, I did miss your point. And now that I understand :-) I think it is a very good idea.

Yes, I found the FAQ hiding in the wiki and I have started working on it.

Gal


_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
