[ http://issues.apache.org/jira/browse/NUTCH-274?page=all ]

Andrzej Bialecki  closed NUTCH-274.
-----------------------------------

    Fix Version/s: 0.8.2
                   0.9.0
       Resolution: Fixed
         Assignee: Andrzej Bialecki 

This bug has been fixed in recent versions of Hadoop.

> Empty row in/at end of URL-list results in error
> ------------------------------------------------
>
>                 Key: NUTCH-274
>                 URL: http://issues.apache.org/jira/browse/NUTCH-274
>             Project: Nutch
>          Issue Type: Bug
>    Affects Versions: 0.8
>         Environment: nightly-2006-05-20
>            Reporter: Stefan Neufeind
>         Assigned To: Andrzej Bialecki 
>            Priority: Minor
>             Fix For: 0.8.2, 0.9.0
>
>         Attachments: ignoreEmpthyLineDuringInjectV1.patch
>
>
> This is minor - but it's a little unclean :-)
> Reproduce: Have a URL-file with one URL followed by a newline, thus producing 
> an empty line.
> Outcome: Fetcher threads try to fetch two URLs at the same time. The first one is 
> fine, but the second is empty and therefore fails protocol detection.
> 060521 022639   Nutch Analysis (org.apache.nutch.analysis.NutchAnalyzer)
> 060521 022639   Nutch Query Filter (org.apache.nutch.searcher.QueryFilter)
> 060521 022639 found resource parse-plugins.xml at 
> file:/home/mm/nutch-nightly/conf/parse-plugins.xml
> 060521 022639 Using URL normalizer: org.apache.nutch.net.BasicUrlNormalizer
> 060521 022639 fetching http://www.bild.de/
> 060521 022639 fetching 
> 060521 022639 fetch of  failed with: 
> org.apache.nutch.protocol.ProtocolNotFound: java.net.MalformedURLException: 
> no protocol: 
> 060521 022639 http.proxy.host = null
> 060521 022639 http.proxy.port = 8080
> 060521 022639 http.timeout = 10000
> 060521 022639 http.content.limit = 65536
> 060521 022639 http.agent = NutchCVS/0.8-dev (Nutch; 
> http://lucene.apache.org/nutch/bot.html; nutch-agent@lucene.apache.org)
> 060521 022639 fetcher.server.delay = 1000
> 060521 022639 http.max.delays = 1000
> 060521 022640 ParserFactory:Plugin: org.apache.nutch.parse.text.TextParser 
> mapped to contentType text/xml via parse-plugins.xml, but
>  its plugin.xml file does not claim to support contentType: text/xml
> 060521 022640 ParserFactory:Plugin: org.apache.nutch.parse.html.HtmlParser 
> mapped to contentType text/xml via parse-plugins.xml, but
>  its plugin.xml file does not claim to support contentType: text/xml
> 060521 022640 ParserFactory: Plugin: org.apache.nutch.parse.rss.RSSParser 
> mapped to contentType text/xml via parse-plugins.xml, but 
> not enabled via plugin.includes in nutch-default.xml
> 060521 022640 Using Signature impl: org.apache.nutch.crawl.MD5Signature
> 060521 022640  map 0%  reduce 0%
> 060521 022640 1 pages, 1 errors, 1.0 pages/s, 40 kb/s, 
> 060521 022640 1 pages, 1 errors, 1.0 pages/s, 40 kb/s, 
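
The empty-line case described above can be guarded against by filtering the seed list before any URL reaches the fetcher. What follows is only a minimal sketch of that idea. It is neither the attached patch nor the Hadoop fix mentioned in the resolution. The class name SeedListCleaner and the method dropBlankLines are hypothetical and not part of Nutch.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** Hypothetical helper (not part of Nutch): drops blank lines from a seed URL list. */
public class SeedListCleaner {

    /** Returns the trimmed, non-empty lines of the input, in their original order. */
    public static List<String> dropBlankLines(List<String> lines) {
        List<String> urls = new ArrayList<>();
        for (String line : lines) {
            String url = line.trim();
            if (url.isEmpty()) {
                // A blank line would otherwise be handed on as an empty URL,
                // which has no protocol and cannot be fetched.
                continue;
            }
            urls.add(url);
        }
        return urls;
    }

    public static void main(String[] args) {
        // One URL followed by an empty line, as in the report.
        List<String> seeds = Arrays.asList("http://www.bild.de/", "");
        System.out.println(dropBlankLines(seeds)); // prints [http://www.bild.de/]
    }
}
```

Trimming before the emptiness check means that lines containing only whitespace are skipped as well, which is an assumption about the desired behaviour rather than something stated in the report.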

-- 
This message is automatically generated by JIRA.
-
If you think it was sent incorrectly contact one of the administrators: 
http://issues.apache.org/jira/secure/Administrators.jspa
-
For more information on JIRA, see: http://www.atlassian.com/software/jira

        

_______________________________________________
Nutch-developers mailing list
Nutch-developers@lists.sourceforge.net
https://lists.sourceforge.net/lists/listinfo/nutch-developers
