One last thing. The reason I was snooping around in the Generator code
is that I wanted to add sufficient configuration to catch malformed URLs
in the filtering and normalising stages. In Nutchgora this is now the
case in the Generator Mapper. In trunk, however, it currently appears
that we do check for malformed URLs in the reducer phase but do not
throw a MalformedURLException when we find one, instead opting to throw
a vanilla Exception...
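
To make the behaviour I am after concrete, here is a minimal sketch of
"normalise first, then filter, and treat a malformed URL as its own
case". It is not the code from either branch: the Normalizer and Filter
interfaces, the check() method and the two example plugins are
hypothetical stand-ins I made up for illustration, and only java.net.URL
and MalformedURLException are real JDK classes.

```java
import java.net.MalformedURLException;
import java.net.URL;
import java.util.Locale;

class UrlCheckSketch {

    /** Hypothetical stand-in for a normaliser chain; NOT the Nutch API. */
    interface Normalizer {
        String normalize(String url) throws MalformedURLException;
    }

    /** Hypothetical stand-in for a filter chain; returns null to reject a URL. */
    interface Filter {
        String filter(String url);
    }

    /**
     * Normalise first (if enabled), then filter (if enabled).
     * Returns the accepted URL, or null if it was rejected or malformed.
     */
    static String check(String url, boolean normalise, boolean filter,
                        Normalizer normalizer, Filter urlFilter) {
        try {
            if (normalise) {
                url = normalizer.normalize(url);
            }
            if (url != null && filter) {
                url = urlFilter.filter(url);
            }
            return url;
        } catch (MalformedURLException e) {
            // Specific catch: a malformed URL is reported as such and skipped,
            // rather than being folded into a generic Exception.
            System.err.println("Skipping malformed URL: " + e.getMessage());
            return null;
        }
    }

    /** Example normaliser (assumption): lower-cases the host part only. */
    static String lowerCaseHost(String url) throws MalformedURLException {
        URL u = new URL(url); // throws MalformedURLException if no valid protocol
        String host = u.getHost().toLowerCase(Locale.ROOT);
        return new URL(u.getProtocol(), host, u.getPort(), u.getFile()).toString();
    }

    /** Example filter (assumption): accepts only http and https URLs. */
    static String httpOnly(String url) {
        boolean ok = url.startsWith("http://") || url.startsWith("https://");
        return ok ? url : null;
    }

    public static void main(String[] args) {
        String[] inputs = { "http://Example.COM/a", "ftp://example.com/x", "not a url" };
        for (String in : inputs) {
            String out = check(in, true, true,
                    UrlCheckSketch::lowerCaseHost, UrlCheckSketch::httpOnly);
            System.out.println(in + " -> " + out);
        }
    }
}
```

With both flags set, the first input is normalised and accepted, the
second is rejected by the filter, and the third is skipped via the
MalformedURLException branch. Whether a malformed URL should be skipped
or rethrown is a design choice; the point is only that it is handled
separately from other failures.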

Thanks

On Mon, May 21, 2012 at 7:12 PM, Lewis John Mcgibbney
<[email protected]> wrote:
> The normalise logic is in the trunk generator reducer...
>
> Can someone please explain the differences in logic to me.
>
> Thank you very much.
>
> Lewis
>
> On Mon, May 21, 2012 at 7:06 PM, Lewis John Mcgibbney
> <[email protected]> wrote:
>> Hi,
>>
>> While working on some patches for both trunk and the Nutchgora branch I
>> ended up doing some code analysis of the generator mappers [0] and [1]
>> respectively. I am referring specifically to the code blocks in trunk
>> (lines 175 - 185) and the Nutchgora branch (lines 57 - 74). In trunk we
>> initially check whether filter is true, whereas in Nutchgora we first
>> check whether normalize is true and then whether filter is true before
>> proceeding to catch any nasties. It seems to me that there may be a
>> bug in trunk, but I am not sure and would like someone to comment.
>>
>> Thanks
>>
>> Lewis
>>
>> [0] 
>> https://svn.apache.org/viewvc/nutch/trunk/src/java/org/apache/nutch/crawl/Generator.java?view=markup
>> [1] 
>> https://svn.apache.org/viewvc/nutch/branches/nutchgora/src/java/org/apache/nutch/crawl/GeneratorMapper.java?view=markup
>>
>> --
>> Lewis
>
>
>
> --
> Lewis



-- 
Lewis
