[ http://issues.apache.org/jira/browse/NUTCH-274?page=all ]
Andrzej Bialecki closed NUTCH-274.
-----------------------------------

    Fix Version/s: 0.8.2
                   0.9.0
       Resolution: Fixed
         Assignee: Andrzej Bialecki

This bug has been fixed in recent versions of Hadoop.

> Empty row in/at end of URL-list results in error
> ------------------------------------------------
>
>                 Key: NUTCH-274
>                 URL: http://issues.apache.org/jira/browse/NUTCH-274
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 0.8
>         Environment: nightly-2006-05-20
>            Reporter: Stefan Neufeind
>         Assigned To: Andrzej Bialecki
>            Priority: Minor
>             Fix For: 0.8.2, 0.9.0
>
>         Attachments: ignoreEmpthyLineDuringInjectV1.patch
>
>
> This is minor - but it's a little unclean :-)
> Reproduce: Have a URL-file with one URL followed by a newline, thus producing
> an empty line.
> Outcome: Fetcher-threads try to fetch two URLs at the same time. The first one
> is fine - but the second is empty and therefore fails proper protocol-detection.
> 060521 022639 Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
> 060521 022639 Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
> 060521 022639 found resource parse-plugins.xml at
> file:/home/mm/nutch-nightly/conf/parse-plugins.xml
> 060521 022639 Using URL normalizer: org.apache.nutch.net.BasicUrlNormalizer
> 060521 022639 fetching http://www.bild.de/
> 060521 022639 fetching
> 060521 022639 fetch of  failed with:
> org.apache.nutch.protocol.ProtocolNotFound: java.net.MalformedURLException:
> no protocol:
> 060521 022639 http.proxy.host = null
> 060521 022639 http.proxy.port = 8080
> 060521 022639 http.timeout = 10000
> 060521 022639 http.content.limit = 65536
> 060521 022639 http.agent = NutchCVS/0.8-dev (Nutch;
> http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)
> 060521 022639 fetcher.server.delay = 1000
> 060521 022639 http.max.delays = 1000
> 060521 022640 ParserFactory:Plugin: org.apache.nutch.parse.text.TextParser
> mapped to contentType text/xml via parse-plugins.xml, but
> its plugin.xml file does not claim to support contentType: text/xml
> 060521 022640 ParserFactory:Plugin: org.apache.nutch.parse.html.HtmlParser
> mapped to contentType text/xml via parse-plugins.xml, but
> its plugin.xml file does not claim to support contentType: text/xml
> 060521 022640 ParserFactory: Plugin: org.apache.nutch.parse.rss.RSSParser
> mapped to contentType text/xml via parse-plugins.xml, but
> not enabled via plugin.includes in nutch-default.xml
> 060521 022640 Using Signature impl: org.apache.nutch.crawl.MD5Signature
> 060521 022640  map 0%  reduce 0%
> 060521 022640 1 pages, 1 errors, 1.0 pages/s, 40 kb/s,
> 060521 022640 1 pages, 1 errors, 1.0 pages/s, 40 kb/s,

--
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
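
For readers who hit the reproduction described in the report (a URL file whose last line is empty), the sketch below shows one way to guard against it: drop blank lines from the seed list before the URLs are handed on. It is an illustration only. The class name SeedListFilter and the method nonEmptyUrls are hypothetical and are not part of Nutch or Hadoop, and this is not the fix referenced in the resolution above or in the attached patch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper, not part of Nutch: filters blank lines out of a seed URL list.
final class SeedListFilter {

    // Returns the trimmed, non-empty entries of the given lines, in their original order.
    static List<String> nonEmptyUrls(List<String> lines) {
        List<String> urls = new ArrayList<>();
        for (String line : lines) {
            String url = line.trim();
            // An empty entry would later fail protocol detection, so skip it here.
            if (!url.isEmpty()) {
                urls.add(url);
            }
        }
        return urls;
    }

    public static void main(String[] args) {
        // Mirrors the report: one URL followed by an empty line.
        List<String> lines = Arrays.asList("http://www.bild.de/", "");
        System.out.println(nonEmptyUrls(lines));
    }
}
```

With the two-line input from the report, only the single real URL remains, so no fetch attempt would be made for an empty string.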