During my crawl, I get entries like these in the crawl log as a result of parsing:

http://www.domain.comhttp/www.domain.com/news/articles/2012-03-11/201205101336665761902.html
http://www.domain.comhttp/www.domain.com/news/articles/2012-04-24/201205101336663435768.html

The fetches fail, obviously, with:

fetch of http://www.domain.comhttp/www.domain.com/news/articles/2012-04-24/201205101336663435768.html failed with: java.net.UnknownHostException: www.domain.comhttp

I'm not sure whether the prepended domain is related to http:// being parsed incorrectly, but the site's code seems to be sound. Has anyone else seen this behavior?

To help troubleshoot it, I'm trying to dump the inlinks to these pages, but I'm struggling to find the command to do that. Any help would be appreciated.

Thanks everyone!
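
For anyone wondering why the exception names www.domain.comhttp as the host: everything between the "//" and the next "/" is treated as the host, so the mangled URL really does point at a nonexistent machine. Below is a minimal Python sketch of that check. It is not part of Nutch, and the helper names host_of and looks_malformed are hypothetical, made up for illustration only. It could be used to filter a crawl log for URLs damaged in the same way.

```python
from urllib.parse import urlsplit


def host_of(url: str) -> str:
    """Return the lowercased hostname of a URL, or an empty string if none."""
    return urlsplit(url).hostname or ""


def looks_malformed(url: str) -> bool:
    """Heuristic (an assumption, not a Nutch rule): flag URLs whose host
    ends in 'http' or 'https', which suggests a scheme was glued onto it."""
    host = host_of(url)
    return host.endswith("http") or host.endswith("https")


bad = ("http://www.domain.comhttp/www.domain.com/news/articles/"
       "2012-04-24/201205101336663435768.html")
print(host_of(bad))           # the host the fetcher would try to resolve
print(looks_malformed(bad))   # whether the heuristic flags this URL
```

The heuristic is deliberately narrow: it only catches hosts that end in a scheme name, which matches the entries above but not every possible kind of bad outlink.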

