Hi Jeremy,

On 8/28/06 10:18 AM, "HUYLEBROECK Jeremy RD-ILAB-SSF" <[EMAIL PROTECTED]> wrote:

> The Nutch Feed/RSS plugin (parse-rss) only allows you to search the
> entire channel/feed text, not items individually.

Actually, this isn't entirely the case. parse-rss indexes the item text
as well (see line 148 in RSSParser.java). Additionally, parse-rss adds the
individual item links to the Outlinks (see lines 161 and 163 in
RSSParser.java), and they get crawled as well, in addition to the channel
text (see line 123 in RSSParser.java) and channel outlink (see lines 130
and 132 in RSSParser.java).

> You'll have to develop your own if it's what you are trying to do.
> I also found that the feedparse library used by parse-rss doesn't read
> properly all formats and I myself moved to the ROME library for now.

I haven't noticed any formats that commons-feedparser fails to handle.
Which formats have you found that it doesn't handle?

Cheers,
Chris

> -----Original Message-----
> From: Dima Gritsenko [mailto:[EMAIL PROTECTED]
> Sent: Monday, August 28, 2006 10:44 AM
> To: [email protected]
> Subject: RSS search by nutch
>
> Hi,
>
> Does nutch have a class for searching incoming RSS feeds in real time?
> Thank you.
> Dima.

_______________________________________________
Nutch-general mailing list
[email protected]
https://lists.sourceforge.net/lists/listinfo/nutch-general
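
For anyone following the thread, here is a rough sketch of the behaviour described in the reply above: text is gathered from both the channel and each item, and both the channel link and the item links are recorded as outlinks. This is not the code in RSSParser.java and it does not use commons-feedparser. It assumes a plain RSS 2.0 document and uses only the JDK's DOM parser. The class name `RssTextAndOutlinksSketch`, the `Result` holder, and the choice of the `title`, `description`, and `link` fields are all hypothetical and chosen for illustration.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.Node;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical illustration only: NOT the parse-rss plugin code and NOT the
// commons-feedparser API. Assumes a simple RSS 2.0 document.
class RssTextAndOutlinksSketch {

    /** Holder for the two things the reply mentions: indexable text and outlinks. */
    static final class Result {
        final StringBuilder text = new StringBuilder();
        final List<String> outlinks = new ArrayList<>();
    }

    /** Trimmed text of the first DIRECT child element with the given name, or "". */
    static String childText(Element parent, String name) {
        NodeList children = parent.getChildNodes();
        for (int i = 0; i < children.getLength(); i++) {
            Node n = children.item(i);
            if (n.getNodeType() == Node.ELEMENT_NODE && name.equals(n.getNodeName())) {
                return n.getTextContent().trim();
            }
        }
        return "";
    }

    /** Appends title/description to the text and records the link as an outlink. */
    static void collect(Element e, Result r) {
        String title = childText(e, "title");
        String description = childText(e, "description");
        String link = childText(e, "link");
        if (!title.isEmpty()) r.text.append(title).append('\n');
        if (!description.isEmpty()) r.text.append(description).append('\n');
        if (!link.isEmpty()) r.outlinks.add(link);
    }

    /** Parses the feed and collects channel-level data first, then each item. */
    static Result extract(String rssXml) throws Exception {
        DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(rssXml)));
        Result r = new Result();
        NodeList channels = doc.getElementsByTagName("channel");
        for (int c = 0; c < channels.getLength(); c++) {
            Element channel = (Element) channels.item(c);
            collect(channel, r); // channel text and channel outlink
            NodeList items = channel.getElementsByTagName("item");
            for (int i = 0; i < items.getLength(); i++) {
                collect((Element) items.item(i), r); // item text and item outlink
            }
        }
        return r;
    }

    public static void main(String[] args) throws Exception {
        String rss = "<rss version=\"2.0\"><channel>"
            + "<title>Example feed</title>"
            + "<link>http://example.com/</link>"
            + "<description>Channel description</description>"
            + "<item><title>First item</title><link>http://example.com/1</link>"
            + "<description>First item text</description></item>"
            + "</channel></rss>";
        Result r = extract(rss);
        System.out.println(r.text);
        System.out.println(r.outlinks);
    }
}
```

Because the item links end up in the outlink list alongside the channel link, a crawler that follows outlinks would fetch the individual item pages too, which is the point made in the reply.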
