Hi all

I am having problems parsing the YouTube RSS feed.

This is the URL:
http://youtube.com/rss/global/top_viewed_today.rss

It seems to have a problem with this line:
<rss version="2.0" xmlns:media="http://search.yahoo.com/mrss">

In the log file I see:

2006-09-18 09:00:04,163 INFO  fetcher.Fetcher - fetching http://youtube.com/rss/global/top_viewed_today.rss
2006-09-18 09:00:17,265 ERROR parse.OutlinkExtractor - getOutlinks
java.net.MalformedURLException: unknown protocol: xmlns
    at java.net.URL.<init>(URL.java:574)
    at java.net.URL.<init>(URL.java:464)
    at java.net.URL.<init>(URL.java:413)
    at org.apache.nutch.net.BasicUrlNormalizer.normalize(BasicUrlNormalizer.java:78)
    at org.apache.nutch.parse.Outlink.<init>(Outlink.java:35)
    at org.apache.nutch.parse.OutlinkExtractor.getOutlinks(OutlinkExtractor.java:111)
    at org.apache.nutch.parse.OutlinkExtractor.getOutlinks(OutlinkExtractor.java:70)
    at org.apache.nutch.parse.text.TextParser.getParse(TextParser.java:47)
    at org.apache.nutch.parse.ParseUtil.parse(ParseUtil.java:82)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.output(Fetcher.java:276)
    at org.apache.nutch.fetcher.Fetcher$FetcherThread.run(Fetcher.java:152)
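
The exception itself seems easy to reproduce outside Nutch. The trace shows TextParser (not an RSS parser) handling the feed, so my guess is that the outlink extractor treated the attribute text starting with "xmlns:" as a URL and passed it to java.net.URL. Below is a minimal sketch under that assumption. The class name XmlnsUrlDemo, the helper tryParse, and the exact input string are mine for illustration and are not taken from the Nutch source.

```java
import java.net.MalformedURLException;
import java.net.URL;

public class XmlnsUrlDemo {

    // Returns the exception message if the candidate is rejected by
    // java.net.URL, or null if it is accepted as a URL.
    public static String tryParse(String candidate) {
        try {
            new URL(candidate);
            return null;
        } catch (MalformedURLException e) {
            return e.getMessage();
        }
    }

    public static void main(String[] args) {
        // Assumed input: text that merely looks like "scheme:rest".
        // "xmlns" has no registered protocol handler, so URL rejects it.
        System.out.println(tryParse("xmlns:media=\"http://search.yahoo.com/mrss\""));

        // The namespace URI on its own is a normal http URL and is accepted.
        System.out.println(tryParse("http://search.yahoo.com/mrss"));
    }
}
```

If that guess is right, the error comes from the feed being parsed as plain text rather than from the feed itself being invalid.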

Does anybody know what is wrong, or whether this is a bug?

Thanks,
Ernesto





        
        
                