Sourajit, BTW, have you tried using the Tika plugin for parsing RSS feeds? The main difference, IIRC, is that Tika treats the links to the news items as normal outlinks, which are then fetched, whereas the feed parser generates N sub-documents directly from the feed. This is actually why we don't have the feed parser in Nutch 2.x: parsing there does not allow N documents to be generated from a single entry.
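To make the difference concrete, here is a minimal sketch that uses only the JDK's DOM parser. It does not use Tika, ROME or any Nutch API. `FeedStrategies`, `SubDoc` and both methods are hypothetical names. The first method keeps the feed as one document and returns the item links as outlinks, which is how the Tika plugin is described above. The second turns N items into N documents straight from the feed, which is what the feed parser does.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.SAXException;

public class FeedStrategies {

    /** Hypothetical stand-in for one document produced from a feed item. */
    public static final class SubDoc {
        public final String url;
        public final String title;
        public final String text;

        SubDoc(String url, String title, String text) {
            this.url = url;
            this.title = title;
            this.text = text;
        }
    }

    // Parses RSS XML with the JDK's DOM parser (no ROME dependency needed).
    private static Document parse(String rssXml) {
        try {
            return DocumentBuilderFactory.newInstance()
                    .newDocumentBuilder()
                    .parse(new ByteArrayInputStream(rssXml.getBytes(StandardCharsets.UTF_8)));
        } catch (ParserConfigurationException | SAXException | IOException e) {
            throw new IllegalArgumentException("not well-formed RSS", e);
        }
    }

    // Text of the first descendant element with the given tag, or "" if absent.
    private static String childText(Element parent, String tag) {
        NodeList nodes = parent.getElementsByTagName(tag);
        return nodes.getLength() > 0 ? nodes.item(0).getTextContent().trim() : "";
    }

    // Tika-style: the feed stays ONE document; item links are returned as
    // outlinks, so the item pages are only indexed after a later fetch.
    public static List<String> itemLinksAsOutlinks(String rssXml) {
        NodeList items = parse(rssXml).getElementsByTagName("item");
        List<String> outlinks = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            outlinks.add(childText((Element) items.item(i), "link"));
        }
        return outlinks;
    }

    // Feed-parser-style: N items yield N documents built from the feed alone,
    // keyed by each item's link, with no further fetch.
    public static List<SubDoc> itemsAsSubDocuments(String rssXml) {
        NodeList items = parse(rssXml).getElementsByTagName("item");
        List<SubDoc> docs = new ArrayList<>();
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            docs.add(new SubDoc(childText(item, "link"),
                    childText(item, "title"),
                    childText(item, "description")));
        }
        return docs;
    }

    public static void main(String[] args) {
        String rss = "<rss><channel><title>News</title>"
                + "<item><title>A</title><link>http://example.com/a</link>"
                + "<description>first</description></item>"
                + "<item><title>B</title><link>http://example.com/b</link>"
                + "<description>second</description></item>"
                + "</channel></rss>";
        System.out.println("outlinks: " + itemLinksAsOutlinks(rss));
        System.out.println("sub-documents: " + itemsAsSubDocuments(rss).size());
    }
}
```

Running `main` prints `outlinks: [http://example.com/a, http://example.com/b]` followed by `sub-documents: 2`. The second shape, one input entry producing several output documents, is the one that parsing in Nutch 2.x does not allow.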
Julien

On 9 November 2012 11:34, Sourajit Basak <[email protected]> wrote:

> https://issues.apache.org/jira/browse/NUTCH-1494
>
> On Fri, Nov 9, 2012 at 4:56 PM, Lewis John Mcgibbney <[email protected]> wrote:
>
> > Hi,
> >
> > Can you please open an issue for this. I can confirm that without
> > adding some additional dependencies I get the following when
> > attempting to parse an rss feed [0] which I have saved locally.
> >
> > lewis@lewis-desktop:~/ASF/trunk/runtime/local$ ./bin/nutch plugin feed
> > org.apache.nutch.parse.feed.FeedParser latest.xml
> > Exception in thread "main" java.lang.reflect.InvocationTargetException
> >     at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >     at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >     at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >     at java.lang.reflect.Method.invoke(Method.java:597)
> >     at org.apache.nutch.plugin.PluginRepository.main(PluginRepository.java:421)
> > Caused by: java.lang.NoClassDefFoundError: com/sun/syndication/io/SyndFeedInput
> >     at org.apache.nutch.parse.feed.FeedParser.getParse(FeedParser.java:117)
> >     at org.apache.nutch.parse.feed.FeedParser.main(FeedParser.java:211)
> >     ... 5 more
> > Caused by: java.lang.ClassNotFoundException: com.sun.syndication.io.SyndFeedInput
> >     at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> >     at java.security.AccessController.doPrivileged(Native Method)
> >     at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> >     at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
> >     at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
> >     ... 7 more
> >
> > [0] http://www.scotland.gov.uk/rss/publications/latest.xml
> >
> > On Fri, Nov 9, 2012 at 10:55 AM, Sourajit Basak
> > <[email protected]> wrote:
> > >

--
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
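The trace shows that `com.sun.syndication.io.SyndFeedInput`, the feed reader class from the ROME library, is not visible to the class loader running the feed plugin at run time. Because of that, `FeedParser.getParse` fails with `NoClassDefFoundError`. A quick way to check whether a class is visible is to ask a class loader for it by name. The sketch below is a generic JDK-only probe, not a Nutch tool. `ClasspathProbe` and `isLoadable` are hypothetical names. Choosing which loader to pass is left to the caller, on the assumption that a plugin may be loaded through its own class loader rather than the application one.

```java
public class ClasspathProbe {

    // Returns true if the class can be located by the given loader.
    // initialize=false: only visibility is checked; static initializers are not run.
    public static boolean isLoadable(String className, ClassLoader loader) {
        try {
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            // ClassNotFoundException: not on the loader's classpath.
            // LinkageError (e.g. NoClassDefFoundError): the class was found but
            // something it needs while loading, such as its superclass, is missing.
            return false;
        }
    }

    public static void main(String[] args) {
        // The ROME class named in the stack trace.
        String romeClass = "com.sun.syndication.io.SyndFeedInput";
        ClassLoader loader = ClasspathProbe.class.getClassLoader();
        System.out.println(romeClass + " visible: " + isLoadable(romeClass, loader));
    }
}
```

If this prints `false` for the loader in question, the ROME jar (and anything it depends on) has to be added to that loader's classpath before the feed parser can work.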

