Hi Ernesto,

You are getting that error because the content type returned is
"text/plain", which invokes the text parser rather than the RSS
parser plugin. To check that the RSS parser works on this feed, you
can put <plugin id="parse-rss" /> as the first option under the
mimeType entry for "text/plain" and try crawling it again.
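
As a rough sketch of that change: assuming the mapping lives in
conf/parse-plugins.xml (the file name is my assumption, it is not stated
above) and that the existing "text/plain" entry lists parse-text as its
only plugin, the edited entry might look something like this:

```xml
<!-- Assumed location: conf/parse-plugins.xml -->
<!-- Assumption: parse-text was the only plugin listed here before the edit -->
<mimeType name="text/plain">
  <!-- Tried first, so the feed is handed to the RSS parser -->
  <plugin id="parse-rss" />
  <!-- Existing text parser, kept as the fallback -->
  <plugin id="parse-text" />
</mimeType>
```

Since this sends every "text/plain" document to the RSS parser first, it
is probably best treated as a temporary check rather than a permanent
setting. It also assumes the parse-rss plugin is enabled in your
plugin.includes setting.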

-Meghna

On 9/18/06, Ernesto De Santis <[EMAIL PROTECTED]> wrote:
> Hi all
>
> I am having problems parsing the YouTube RSS feed.
>
> This is the url:
> http://youtube.com/rss/global/top_viewed_today.rss
>
> It seems to have problems with the line:
> <rss version="2.0" xmlns:media="http://search.yahoo.com/mrss">
>
> In the log file I see:
>
> 2006-09-18 09:00:04,163 INFO  fetcher.Fetcher - fetching http://youtube.com/rss/global/top_viewed_today.rss
> 2006-09-18 09:00:17,265 ERROR parse.OutlinkExtractor - getOutlinks
> java.net.MalformedURLException: unknown protocol: xmlns
>     at java.net.URL.<init>(URL.java:574)
>     at java.net.URL.<init>(URL.java:464)
>     at java.net.URL.<init>(URL.java:413)
>     at org.apache.nutch.net.BasicUrlNormalizer.normalize(BasicUrlNormalizer.java:78)
>     at org.apache.nutch.parse.Outlink.<init>(Outlink.java:35)
>     at org.apache.nutch.parse.OutlinkExtractor.getOutlinks(OutlinkExtractor.java:111)
>     at org.apache.nutch.parse.OutlinkExtractor.getOutlinks(OutlinkExtractor.java:70)
>     at org.apache.nutch.parse.text.TextParser.getParse(TextParser.java:47)
>     at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:82)
>     at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:276)
>     at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:152)
>
> Does anybody know what is wrong, or whether this is a bug?
>
> Thanks,
> Ernesto

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
