Hi Andy,

This is really useful, and I can imagine it was a right pain to debug. Regarding opening tickets, I think it is up to you... as you say, this sounds like a common aspect of the way config is read from nutch-site.xml.

If you come across any obvious ones which you think you could patch (and you have time :)), then by all means patch them; the contributions would be great.

Thanks
Lewis

On Tue, Jun 12, 2012 at 7:25 AM, Andy Xue wrote:
> Hi all:
>
> As I suspected, this vulnerability affects more properties than the ones I described in NUTCH-1385.
> For instance, the property "plugin.includes":
>
> <value>plugin_1|plugin_2</value>
> This is fine; it will load both plugins.
>
> <value>plugin_1|plugin_2
> </value>
> This is not fine, since (I guess) the program will try to find a plugin named "plugin_2\n" (maybe not precise, but you get the idea).
>
> I spent hours debugging this before I finally found it. The cause is that my editor automatically formats long lines by splitting them into multiple lines.
>
> So the rule here is: no matter how long a property value is, do not spread it over multiple lines. Otherwise something unexpected will happen.
>
> At this point, I'm not sure whether I should submit another ticket, because I don't know exactly which properties are affected by this problem. Just a heads-up for all of you who might encounter the same problem in the future.
>
> Regards
> Andy
>
> On 9 June 2012 11:42, Andy Xue wrote:
>
>> Hi Lewis:
>>
>> Sorry for the delay. Sure, I'll open a ticket in a bit.
>>
>> Regards
>> Andy
>>
>> On 7 June 2012 21:28, Lewis John Mcgibbney wrote:
>>
>>> Hi Andy,
>>> Even opening a ticket and getting it logged would be great.
>>> Thanks
>>> Lewis
>>>
>>> On Wed, Jun 6, 2012 at 3:53 AM, Andy Xue wrote:
>>> > Hi Lewis:
>>> >
>>> > I'll try to find a time to do it. Thanks for the reply.
>>> >
>>> > Regards
>>> > Andy
>>> >
>>> > On 31 May 2012 20:37, Lewis John Mcgibbney wrote:
>>> >
>>> >> Hi Andy,
>>> >>
>>> >> This is a good catch, and I would suggest you open an issue on the Jira and submit a patch for the few instances where this actually occurs, e.g. I think there are currently 4 such instances in nutch-default which concern the ordering of such tools. Admittedly, though, I haven't dug down into the code to see if it is consistent as you assume...
>>> >>
>>> >> If you begin by investigating (and patching if necessary) these parts, then this would make a nice patch. As you are using trunk, I wouldn't imagine it would take you too long.
>>> >>
>>> >> Thanks very much
>>> >>
>>> >> Lewis
>>> >>
>>> >> On Thu, May 31, 2012 at 2:34 AM, Andy Xue wrote:
>>> >> > Hi all:
>>> >> >
>>> >> > The following situation has come to my attention regarding "*nutch-site.xml*" when I'm using Nutch trunk:
>>> >> > When listing multiple scoring filters in the property "*scoring.filter.order*", it is vital that no spaces/newlines/tabs are placed in front of the first value. E.g.:
>>> >> > This is fine:
>>> >> > <value>org.apache.nutch.scoring.opic.OPICScoringFilter myFilter</value>
>>> >> >
>>> >> > Either of these will generate an exception:
>>> >> > <value> org.apache.nutch.scoring.opic.OPICScoringFilter myFilter</value>
>>> >> > <value>
>>> >> > org.apache.nutch.scoring.opic.OPICScoringFilter
>>> >> > myFilter
>>> >> > </value>
>>> >> >
>>> >> > The reason is: in *org.apache.nutch.scoring.ScoringFilters*, a statement (on line 59), "orderedFilters = order.split("\\s+");", tries to split the aforementioned string. The leading whitespace produces an empty string as the first array element, which results in a ClassNotFound / NullPointer exception.
>>> >> >
>>> >> > It can be easily fixed, of course, but what concerns me is that I suspect other properties have the same problem (i.e., the value content must immediately follow the *<value>* tag). That is not robust.
>>> >> >
>>> >> > Any thoughts?
>>> >> >
>>> >> > Regards
>>> >> > Andy
>>> >>
>>> >> --
>>> >> Lewis
>>>
>>> --
>>> Lewis
>>
--
Lewis
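
For anyone who hits the same problem, here is a minimal, self-contained Java sketch that reproduces both failure modes from this thread outside Nutch. It is not Nutch code: the class and method names are hypothetical. `naiveSplit` mirrors the `order.split("\\s+")` statement quoted from ScoringFilters, and `naiveIncludes` assumes, for illustration only, that the `plugin.includes` value is compiled as a regular expression that must match a whole plugin id. The `robust*` variants show one possible workaround (trimming the value before using it); this is a suggestion, not the fix that was committed to Nutch.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

/**
 * Hypothetical stand-alone demo (not part of Nutch) of two whitespace pitfalls
 * in property values read from nutch-site.xml, plus a trim-based workaround.
 */
final class ConfigWhitespaceDemo {

    /** Same split as the statement quoted from ScoringFilters: order.split("\\s+"). */
    static String[] naiveSplit(String order) {
        // Leading whitespace is a positive-width match at index 0, so element 0 is "".
        return order.split("\\s+");
    }

    /** Workaround sketch: trim first so no empty leading token is produced. */
    static String[] robustSplit(String order) {
        String trimmed = order.trim();
        return trimmed.isEmpty() ? new String[0] : trimmed.split("\\s+");
    }

    /** Assumption: the value is used as a regex that must match the whole plugin id. */
    static boolean naiveIncludes(String includes, String pluginId) {
        return Pattern.compile(includes).matcher(pluginId).matches();
    }

    /** Workaround sketch: trim the value before compiling it as a regex. */
    static boolean robustIncludes(String includes, String pluginId) {
        return Pattern.compile(includes.trim()).matcher(pluginId).matches();
    }

    public static void main(String[] args) {
        // scoring.filter.order written with the value on its own indented lines.
        String order = "\n  org.apache.nutch.scoring.opic.OPICScoringFilter\n  myFilter\n";
        System.out.println(Arrays.toString(naiveSplit(order)));  // first element is ""
        System.out.println(Arrays.toString(robustSplit(order))); // two entries only

        // plugin.includes with a newline before </value>.
        String includes = "plugin_1|plugin_2\n";
        System.out.println(naiveIncludes(includes, "plugin_2"));  // false
        System.out.println(robustIncludes(includes, "plugin_2")); // true
    }
}
```

Note that `trim()` only removes whitespace at the ends of a value. A whitespace-separated list such as `scoring.filter.order` tolerates newlines in the middle, but a regex-style value like `plugin.includes` broken across lines in the middle would still fail, so keeping such values on a single line remains the safest habit.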

