Hi all:

I have run into the following issue with "*nutch-site.xml*" while using
nutch trunk:
When listing multiple scoring filters in the property
"*scoring.filter.order*", it is vital that no spaces/newlines/tabs are
placed in front of the first value. E.g.:
This is fine:
<value>org.apache.nutch.scoring.opic.OPICScoringFilter myFilter</value>

Either of these will generate an exception:
<value> org.apache.nutch.scoring.opic.OPICScoringFilter myFilter</value>
<value>
org.apache.nutch.scoring.opic.OPICScoringFilter
myFilter
</value>

The reason is: in *org.apache.nutch.scoring.ScoringFilters*, the statement
(on line 59) "orderedFilters = order.split("\\s+");" splits the
aforementioned string. Leading whitespace produces an empty string as the
first array element, which then results in a ClassNotFound /
NullPointer exception.
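
To illustrate the split behaviour outside of Nutch, here is a minimal standalone sketch. The class and method names (SplitDemo, naiveSplit, safeSplit) are made up for this example, and the trim-first variant is only one possible fix, not necessarily what a Nutch patch would do:

```java
import java.util.Arrays;

public class SplitDemo {
    // Mirrors the statement quoted above: split on runs of whitespace.
    // Leading whitespace yields an empty first element.
    static String[] naiveSplit(String order) {
        return order.split("\\s+");
    }

    // Hypothetical fix: trim the value before splitting, and treat an
    // all-whitespace value as "no filters configured".
    static String[] safeSplit(String order) {
        String trimmed = order.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        // Same shape as the multi-line <value> element shown earlier.
        String order = "\norg.apache.nutch.scoring.opic.OPICScoringFilter\nmyFilter\n";

        // prints [, org.apache.nutch.scoring.opic.OPICScoringFilter, myFilter]
        System.out.println(Arrays.toString(naiveSplit(order)));

        // prints [org.apache.nutch.scoring.opic.OPICScoringFilter, myFilter]
        System.out.println(Arrays.toString(safeSplit(order)));
    }
}
```

Note that trailing whitespace is harmless here, because String.split drops trailing empty strings; only the leading whitespace produces the extra element.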


It can be easily fixed of course, but what concerns me is that I suspect
other properties will have the same problem (i.e., the value content must
immediately follow the *<value>* tag). This is not robust.

Any thoughts?

Regards
Andy
