Hi Andy,

This is really useful, and I can imagine it was a right pain to debug.
I think it is up to you whether to open tickets... as you say,
this sounds like a common aspect of the way config is read from
nutch-site.xml.

If you come across any obvious ones which you think you could patch
(and you have time :)) then by all means patch them, the contribs
would be great.
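
For what it's worth, here is a minimal sketch of what such a patch might
look like for the scoring.filter.order case. It is plain Java with no
Nutch dependencies. The class and method names (ValueSplitSketch,
splitRaw, splitTrimmed) are made up for illustration and are not taken
from the Nutch source; the only thing borrowed from your report is the
split("\\s+") call. Treat it as a sketch under those assumptions rather
than the actual fix.

```java
import java.util.Arrays;

public class ValueSplitSketch {

    // Same split as the statement quoted from ScoringFilters, with no trimming.
    public static String[] splitRaw(String value) {
        return value.split("\\s+");
    }

    // Hypothetical fix: trim first, and return an empty array for a blank value.
    public static String[] splitTrimmed(String value) {
        String trimmed = (value == null) ? "" : value.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    public static void main(String[] args) {
        // A <value> body that an editor has wrapped onto several lines.
        String value = "\n  org.apache.nutch.scoring.opic.OPICScoringFilter\n  myFilter\n";

        // Leading whitespace yields an empty first element:
        // [, org.apache.nutch.scoring.opic.OPICScoringFilter, myFilter]
        System.out.println(Arrays.toString(splitRaw(value)));

        // After trimming, only the two filter names remain:
        // [org.apache.nutch.scoring.opic.OPICScoringFilter, myFilter]
        System.out.println(Arrays.toString(splitTrimmed(value)));
    }
}
```

Trimming the value before it is used would presumably also help with the
trailing newline you saw in plugin.includes, though I have not checked
how that property is parsed.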

Thanks

Lewis

On Tue, Jun 12, 2012 at 7:25 AM, Andy Xue <[email protected]> wrote:
> Hi all:
>
> As I suspected, this vulnerability affects more properties than the
> ones I described in NUTCH-1385.
> For instance, the property "plugin.includes":
>
>      <value>plugin_1|plugin_2</value>
> This is fine, it will load both plugins.
>
>      <value>plugin_1|plugin_2
>      </value>
> This is not fine since (I guess) the program will try to find a plugin
> named "plugin_2\n" (maybe not precise, but you get the idea).
>
> I've been debugging this for hours and finally found it. The cause
> is that my editor automatically formats long lines by splitting them into
> multiple lines.
>
> So the rule here is: no matter how long a property value is, do not spread
> it across multiple lines. Otherwise something unexpected will happen.
>
> At this point, I'm not sure whether I should submit another ticket because
> I don't know exactly which properties are affected by this problem. Just a
> heads up for all of you who might encounter the same problem in the future.
>
> Regards
> Andy
>
>
> On 9 June 2012 11:42, Andy Xue <[email protected]> wrote:
>
>> Hi Lewis:
>>
>> Sorry for the delay. Sure, I'll open a ticket in a bit.
>>
>> Regards
>> Andy
>>
>>
>>
>> On 7 June 2012 21:28, Lewis John Mcgibbney <[email protected]> wrote:
>>
>>> Hi Andy,
>>> Even opening a ticket and getting it logged would be great.
>>> Thanks
>>> Lewis
>>>
>>> On Wed, Jun 6, 2012 at 3:53 AM, Andy Xue <[email protected]> wrote:
>>> > Hi Lewis:
>>> >
>>> > I'll try to find time to do it. Thanks for the reply.
>>> >
>>> > Regards
>>> > Andy
>>> >
>>> >
>>> >
>>> > On 31 May 2012 20:37, Lewis John Mcgibbney <[email protected]> wrote:
>>> >
>>> >> Hi Andy,
>>> >>
>>> >> This is a good catch and I would suggest you open an issue on the Jira
>>> >> and submit a patch for the few instances where this actually
>>> >> occurs... e.g. I think there are currently 4 such instances in
>>> >> nutch-default which concern the ordering of such tools. Admittedly
>>> >> though I haven't dug down into the code to see if it is consistent as
>>> >> you assume...
>>> >>
>>> >> If you begin by investigating (and patching if necessary) these parts
>>> >> then this would make a nice patch. As you are using trunk, I wouldn't
>>> >> imagine it would take you too long.
>>> >>
>>> >> Thanks very much
>>> >>
>>> >> Lewis
>>> >>
>>> >> On Thu, May 31, 2012 at 2:34 AM, Andy Xue <[email protected]> wrote:
>>> >> > Hi all:
>>> >> >
>>> >> > The following situation has come to my attention regarding
>>> >> > "*nutch-site.xml*" when I'm using nutch trunk:
>>> >> > When listing multiple scoring filters in the property
>>> >> > "*scoring.filter.order*", it is vital that no spaces/newlines/tabs are
>>> >> > placed in front of the first value. E.g.:
>>> >> > This is fine:
>>> >> > <value>org.apache.nutch.scoring.opic.OPICScoringFilter myFilter</value>
>>> >> >
>>> >> > Either of these will generate an exception:
>>> >> > <value> org.apache.nutch.scoring.opic.OPICScoringFilter myFilter</value>
>>> >> > <value>
>>> >> > org.apache.nutch.scoring.opic.OPICScoringFilter
>>> >> > myFilter
>>> >> > </value>
>>> >> >
>>> >> > The reason is: in *org.apache.nutch.scoring.ScoringFilters*, a statement
>>> >> > (on line 59), "orderedFilters = order.split("\\s+");", tries to split the
>>> >> > aforementioned string. The leading whitespace produces an empty string as
>>> >> > the first array element, which results in a ClassNotFound / NullPointer
>>> >> > exception.
>>> >> >
>>> >> >
>>> >> > It can be easily fixed of course, but what concerns me is that I suspect
>>> >> > other properties will have the same problem (i.e., the value content must
>>> >> > immediately follow the *<value>* tag). This is not very robust.
>>> >> >
>>> >> > Any thoughts?
>>> >> >
>>> >> > Regards
>>> >> > Andy
>>> >>
>>> >>
>>> >>
>>> >> --
>>> >> Lewis
>>> >>
>>>
>>>
>>>
>>> --
>>> Lewis
>>>
>>
>>



-- 
Lewis
