[ http://issues.apache.org/jira/browse/NUTCH-277?page=comments#action_12412706 ]
Stefan Neufeind commented on NUTCH-277:
---------------------------------------

The problem was reproducible with the URL set we had here. After switching from protocol-httpclient to protocol-http, the problem is gone and crawling works fine. Could there be a problem in the httpclient interface, maybe with redirects?

PS: Too bad we're missing HTTPS support for now, but it works for the moment ...

> Fetcher dies because of "max. redirects" (avoiding infinite loop)
> -----------------------------------------------------------------
>
>          Key: NUTCH-277
>          URL: http://issues.apache.org/jira/browse/NUTCH-277
>      Project: Nutch
>         Type: Bug
>   Components: fetcher
>     Versions: 0.8-dev
>  Environment: nightly-2006-05-20
>     Reporter: Stefan Neufeind
>     Priority: Critical
>
> Error in the logs is:
>
> 060521 213401 SEVERE Narrowly avoided an infinite loop in execute
> org.apache.commons.httpclient.RedirectException: Maximum redirects (100) exceeded
>         at org.apache.commons.httpclient.HttpMethodDirector.executeMethod(HttpMethodDirector.java:183)
>         at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:396)
>         at org.apache.commons.httpclient.HttpClient.executeMethod(HttpClient.java:324)
>         at org.apache.nutch.protocol.httpclient.HttpResponse.<init>(HttpResponse.java:87)
>         at org.apache.nutch.protocol.httpclient.Http.getResponse(Http.java:97)
>         at org.apache.nutch.protocol.http.api.RobotRulesParser.isAllowed(RobotRulesParser.java:394)
>         at org.apache.nutch.protocol.http.api.HttpBase.getProtocolOutput(HttpBase.java:173)
>         at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:135)
>
> This happens during normal crawling. Unfortunately I don't know how to
> track this down further. But it's problematic, since it actually makes the
> fetcher die.
> A workaround (for the symptom) is in NUTCH-258 (avoid dying on a SEVERE
> log entry). That works for me: crawling works fine and it does not hang or crash.
> However, this works around the problem rather than solving it, I know.
> But it helps for the moment ...
> Hope somebody can help; this looks quite important to me to track down.

_______________________________________________
Nutch-developers mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-developers
