One last thing here. The reason I was snooping around in the Generator code is that I wanted to add enough configuration to catch malformed URLs in the filtering and normalising stages. In Nutchgora this is now the case in the Generator Mapper. In trunk, however, it appears that we do check for malformed URLs in the reducer phase but do not throw a MalformedURLException when we find one, opting instead to throw a vanilla Exception.
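
To make the distinction concrete, here is a minimal sketch of the handling I have in mind: normalise first, then filter, and treat a MalformedURLException separately from any other Exception. It is not the Nutch code. It uses only java.net.URL, and the class UrlCheckSketch and the methods normalize, filter and classify are hypothetical stand-ins for the real normaliser and filter plugins.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Locale;

// Hypothetical sketch, not the actual Nutch Generator code.
class UrlCheckSketch {

    // Stand-in normaliser: parses the URL and lower-cases the host.
    // new URL(...) throws MalformedURLException on unparsable input.
    static String normalize(String url) throws MalformedURLException {
        URL u = new URL(url);
        String host = u.getHost().toLowerCase(Locale.ROOT);
        return new URL(u.getProtocol(), host, u.getPort(), u.getFile()).toString();
    }

    // Stand-in filter: keeps http/https URLs and returns null for anything else.
    static String filter(String url) throws MalformedURLException {
        String protocol = new URL(url).getProtocol();
        return (protocol.equals("http") || protocol.equals("https")) ? url : null;
    }

    // Normalise, then filter, reporting malformed URLs separately
    // from any other failure instead of lumping them into Exception.
    static String classify(String url) {
        try {
            String result = filter(normalize(url));
            return result == null ? "filtered out" : "kept " + result;
        } catch (MalformedURLException e) {
            return "malformed: " + e.getMessage();
        } catch (Exception e) {
            return "other failure: " + e;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("http://Example.COM/a"));
        System.out.println(classify("file:///tmp/x"));
        System.out.println(classify("not a url"));
    }
}
```

The point of the separate catch block is only that a malformed URL can be logged or counted on its own, rather than disappearing into a generic failure.
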

Thanks

On Mon, May 21, 2012 at 7:12 PM, Lewis John Mcgibbney <[email protected]> wrote:
> The normalise logic is the in trunk generator reducer...
>
> Can someone please explain the differences in logic to me.
>
> Thank you very much.
>
> Lewis
>
> On Mon, May 21, 2012 at 7:06 PM, Lewis John Mcgibbney
> <[email protected]> wrote:
>> Hi,
>>
>> When working on some patches for both trunk and Nutchgora branch I
>> ended up doing some code analysis of the generator mappers [0] & [1]
>> respectively. With specific reference to the code blocks in trunk
>> (lines 175 - 185) and Nutchgora branch (lines 57 - 74) where in trunk
>> we initially check if filter is true whereas in Nutchgora we check
>> whether normalize is true, then check whether filter is true before
>> proceeding to catch any nasties... it seems to me that there may be a
>> bug in trunk but I am not sure and would like someone to comment.
>>
>> Thanks
>>
>> Lewis
>>
>> [0]
>> https://svn.apache.org/viewvc/nutch/trunk/src/java/org/apache/nutch/crawl/Generator.java?view=markup
>> [1]
>> https://svn.apache.org/viewvc/nutch/branches/nutchgora/src/java/org/apache/nutch/crawl/GeneratorMapper.java?view=markup
>>
>> --
>> Lewis
>
>
>
> --
> Lewis

--
Lewis

