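Hi Ernesto,

You are getting that error because the feed is returned with content-type "text/plain", so Nutch hands it to the text parser and not to the RSS parser plugin. Judging by the stack trace, the text parser then scans the raw XML for outlinks and trips over the xmlns:media attribute, which looks like a URL with the unknown protocol "xmlns".

Just to check that the RSS parser works, you can put "<plugin id="parse-rss" />" as the first option under the mimeType entry for "text/plain" (in conf/parse-plugins.xml) and try crawling it again.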
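For reference, the entry would look roughly like the fragment below. This is only a sketch: I am assuming your conf/parse-plugins.xml still has the stock "text/plain" entry with parse-text as its only plugin, so compare it against your own file before editing.

```xml
<!-- conf/parse-plugins.xml (fragment; assumes the stock text/plain entry) -->
<mimeType name="text/plain">
  <!-- tried first: the RSS parser -->
  <plugin id="parse-rss" />
  <!-- assumed existing default: the plain text parser -->
  <plugin id="parse-text" />
</mimeType>
```

The plugins are tried in the order listed, so parse-rss gets the first attempt at anything served as text/plain. The parse-rss plugin also has to be enabled in your plugin.includes setting, otherwise the entry has no effect.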
-Meghna

On 9/18/06, Ernesto De Santis <[EMAIL PROTECTED]> wrote:
> Hi all
>
> I have problems parsing the YouTube RSS feed.
>
> This is the URL:
> http://youtube.com/rss/global/top_viewed_today.rss
>
> It seems to have problems with the line:
> <rss version="2.0" xmlns:media="http://search.yahoo.com/mrss">
>
> In the log file I see:
>
> 2006-09-18 09:00:04,163 INFO fetcher.Fetcher - fetching http://youtube.com/rss/global/top_viewed_today.rss
> 2006-09-18 09:00:17,265 ERROR parse.OutlinkExtractor - getOutlinks
> java.net.MalformedURLException: unknown protocol: xmlns
>     at java.net.URL.<init>(URL.java:574)
>     at java.net.URL.<init>(URL.java:464)
>     at java.net.URL.<init>(URL.java:413)
>     at org.apache.nutch.net.BasicUrlNormalizer.normalize(BasicUrlNormalizer.java:78)
>     at org.apache.nutch.parse.Outlink.<init>(Outlink.java:35)
>     at org.apache.nutch.parse.OutlinkExtractor.getOutlinks(OutlinkExtractor.java:111)
>     at org.apache.nutch.parse.OutlinkExtractor.getOutlinks(OutlinkExtractor.java:70)
>     at org.apache.nutch.parse.text.TextParser.getParse(TextParser.java:47)
>     at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:82)
>     at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:276)
>     at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:152)
>
> Does anybody know what is wrong, or whether it is a bug?
>
> Thanks,
> Ernesto

_______________________________________________
Nutch-general mailing list
https://lists.sourceforge.net/lists/listinfo/nutch-general
