Hello,
I started a bigger fetch using the latest HTMLParser code and got some NullPointerExceptions from the HTMLParser class (line 201):

        if (metaTags.getRefresh()) {
            status.setMinorCode(ParseStatus.SUCCESS_REDIRECT);
            status.setMessage(metaTags.getRefreshHref().toString()); // line 201
        }

Example of a problematic URL: http://calgary.foundlocally.com/Travel/Attr-CityWalks.htm
The problem is with the meta tags:
Meta tags for http://calgary.foundlocally.com/Travel/Attr-CityWalks.htm: base=null, noCache=false, noFollow=false, noIndex=false, refresh=true, refreshHref=null
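
One plausible way to end up with refresh=true and refreshHref=null is a delay-only tag such as <meta http-equiv="refresh" content="30">, which just reloads the same page after a timeout and carries no URL. This is an assumption, not verified against that page's source. The sketch below is NOT Nutch's parsing code; RefreshContent and refreshUrl are hypothetical names, used only to show how the "is there a refresh" flag and the "where does it point" value can disagree:

```java
import java.util.Locale;

// Hypothetical sketch, not Nutch code: extracts the target URL from a
// meta refresh content attribute like "5; url=http://example.com/".
class RefreshContent {

    // Returns the URL part of the content value, or null if there is none.
    static String refreshUrl(String content) {
        if (content == null) {
            return null;
        }
        int semi = content.indexOf(';');
        if (semi < 0) {
            // Delay only, e.g. "30": a timed reload with no target URL.
            return null;
        }
        String rest = content.substring(semi + 1).trim();
        // Accept "url=" in any case; other spellings are not handled here.
        if (rest.toLowerCase(Locale.ROOT).startsWith("url=")) {
            rest = rest.substring(4).trim();
        }
        return rest.isEmpty() ? null : rest;
    }

    public static void main(String[] args) {
        System.out.println(refreshUrl("30")); // prints "null"
        System.out.println(refreshUrl("0; url=http://example.com/next.htm")); // prints "http://example.com/next.htm"
    }
}
```

A real parser would also need to cope with quoted URLs, commas as separators and spaces around "=", which this sketch deliberately ignores.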

So refresh is true but refreshHref is null, and a null check is needed before invoking toString(). I am not sure what the right behavior is here, but this case should probably not be handled as a redirect.
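
A minimal sketch of such a guard, assuming getRefreshHref() returns a java.net.URL (the snippet above only shows toString() being called on it). RefreshGuard and redirectTarget are hypothetical names, not existing Nutch API; the idea is that the setMinorCode/setMessage calls would only run when this returns a non-null target, so a refresh without a URL is not reported as SUCCESS_REDIRECT:

```java
import java.net.MalformedURLException;
import java.net.URI;
import java.net.URL;

// Hypothetical helper, not part of Nutch: decides whether a parsed meta
// refresh should be reported as a redirect.
class RefreshGuard {

    // Returns the redirect target as a string, or null when there is no
    // usable target. Assumes refreshHref is a java.net.URL (may be null).
    static String redirectTarget(boolean refresh, URL refreshHref) {
        // refresh=true with refreshHref=null (the case in the log above) is
        // treated as "not a redirect" instead of calling toString() on null.
        if (!refresh || refreshHref == null) {
            return null;
        }
        return refreshHref.toString();
    }

    public static void main(String[] args) throws MalformedURLException {
        System.out.println(redirectTarget(true, null)); // prints "null"
        URL next = URI.create("http://example.com/next.htm").toURL();
        System.out.println(redirectTarget(true, next)); // prints "http://example.com/next.htm"
    }
}
```

Whether a target-less refresh should instead be logged, or treated as a plain successful parse, is left open, as in the question above.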
Regards
Piotr

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers