Sourajit,

BTW have you tried using the Tika plugin for parsing RSS feeds? The main
difference IIRC is that it treats the links to the news items as normal
outlinks and then fetches them, whereas the feed parser generates N
sub-documents directly from the feed. This is actually why we don't have the
feed parser in Nutch 2.x: the parsing there does not allow N documents to be
produced from a single entry.
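The difference described above can be sketched in plain Java. This is only an illustration under stated assumptions: it does not use the Nutch or Tika APIs, and the FeedItem class and the asOutlinks / asSubDocuments methods are hypothetical names. The first method keeps the feed as one document and hands back the item links as outlinks to be fetched later; the second turns one fetched feed into N documents built from the entries' own text.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

class FeedStrategies {

    // Hypothetical stand-in for one <item> of an RSS feed (not a Nutch or Tika type).
    static final class FeedItem {
        final String link;
        final String title;
        final String description;

        FeedItem(String link, String title, String description) {
            this.link = link;
            this.title = title;
            this.description = description;
        }
    }

    // Tika-style: the feed stays a single document and each item link is
    // returned as an ordinary outlink, to be fetched and parsed later.
    static List<String> asOutlinks(List<FeedItem> items) {
        List<String> outlinks = new ArrayList<>();
        for (FeedItem item : items) {
            outlinks.add(item.link);
        }
        return outlinks;
    }

    // Feed-parser-style: one fetched feed yields N sub-documents, keyed by
    // item link, whose text comes straight from the feed entries.
    static Map<String, String> asSubDocuments(List<FeedItem> items) {
        Map<String, String> docs = new LinkedHashMap<>();
        for (FeedItem item : items) {
            docs.put(item.link, item.title + "\n" + item.description);
        }
        return docs;
    }

    public static void main(String[] args) {
        List<FeedItem> items = List.of(
            new FeedItem("http://example.com/a", "A", "first item"),
            new FeedItem("http://example.com/b", "B", "second item"));
        // Both lines print the two item URLs; only the second strategy
        // already has document content for them.
        System.out.println(asOutlinks(items));
        System.out.println(asSubDocuments(items).keySet());
    }
}
```

The second method is the one-to-many mapping (one fetched URL, several documents) that the 2.x parsing does not accommodate.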

Julien

On 9 November 2012 11:34, Sourajit Basak <[email protected]> wrote:

> https://issues.apache.org/jira/browse/NUTCH-1494
>
> On Fri, Nov 9, 2012 at 4:56 PM, Lewis John Mcgibbney <
> [email protected]> wrote:
>
> > Hi,
> >
> > Can you please open an issue for this? I can confirm that without
> > adding some additional dependencies I get the following when
> > attempting to parse an RSS feed [0], which I have saved locally.
> >
> > lewis@lewis-desktop:~/ASF/trunk/runtime/local$ ./bin/nutch plugin feed org.apache.nutch.parse.feed.FeedParser latest.xml
> > Exception in thread "main" java.lang.reflect.InvocationTargetException
> >         at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
> >         at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
> >         at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
> >         at java.lang.reflect.Method.invoke(Method.java:597)
> >         at org.apache.nutch.plugin.PluginRepository.main(PluginRepository.java:421)
> > Caused by: java.lang.NoClassDefFoundError: com/sun/syndication/io/SyndFeedInput
> >         at org.apache.nutch.parse.feed.FeedParser.getParse(FeedParser.java:117)
> >         at org.apache.nutch.parse.feed.FeedParser.main(FeedParser.java:211)
> >         ... 5 more
> > Caused by: java.lang.ClassNotFoundException: com.sun.syndication.io.SyndFeedInput
> >         at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
> >         at java.security.AccessController.doPrivileged(Native Method)
> >         at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:307)
> >         at java.lang.ClassLoader.loadClass(ClassLoader.java:248)
> >         ... 7 more
> >
> >
> > [0] http://www.scotland.gov.uk/rss/publications/latest.xml
> >
> >
> >
> > On Fri, Nov 9, 2012 at 10:55 AM, Sourajit Basak
> > <[email protected]> wrote:
> >
>
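A note on reading the stack trace quoted above: PluginRepository.main calls the plugin's main through Method.invoke, so the NoClassDefFoundError arrives wrapped in an InvocationTargetException, and the real root cause is the missing com.sun.syndication.io.SyndFeedInput class from the ROME library. A minimal JDK-only sketch that checks whether a class is loadable and walks the cause chain is below; ClasspathCheck, isOnClasspath, rootCause and failingMain are hypothetical names, not part of Nutch.

```java
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Method;

class ClasspathCheck {

    // True if the named class can be loaded (without initialising it)
    // by the class loader that loaded this class.
    static boolean isOnClasspath(String className) {
        try {
            Class.forName(className, false, ClasspathCheck.class.getClassLoader());
            return true;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    // Follows getCause() down to the innermost throwable, e.g. from an
    // InvocationTargetException to the ClassNotFoundException underneath.
    static Throwable rootCause(Throwable t) {
        Throwable current = t;
        while (current.getCause() != null) {
            current = current.getCause();
        }
        return current;
    }

    // Hypothetical stand-in for a plugin main() whose dependency jar is missing.
    public static void failingMain(String[] args) {
        NoClassDefFoundError err =
            new NoClassDefFoundError("com/sun/syndication/io/SyndFeedInput");
        err.initCause(new ClassNotFoundException("com.sun.syndication.io.SyndFeedInput"));
        throw err;
    }

    public static void main(String[] args) throws Exception {
        // false unless a ROME jar is visible to this class loader
        System.out.println(isOnClasspath("com.sun.syndication.io.SyndFeedInput"));

        // Invoke failingMain reflectively; the Error it throws comes back
        // wrapped in an InvocationTargetException, as in the trace above.
        Method m = ClasspathCheck.class.getMethod("failingMain", String[].class);
        try {
            m.invoke(null, (Object) new String[0]);
        } catch (InvocationTargetException e) {
            System.out.println(rootCause(e));
        }
    }
}
```

Nutch loads plugins through their own class loaders, so a check like this only tells you what is visible to the loader it runs in, not necessarily what the feed plugin sees.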



-- 
Open Source Solutions for Text Engineering

http://digitalpebble.blogspot.com/
http://www.digitalpebble.com
http://twitter.com/digitalpebble
