2009/2/3 Alexander Aristov <alexander.aris...@gmail.com>:
> So can I ask you a few questions?
>
> 1. Can I disable parse-rss and leave only feed active?

This is a good idea because parse-rss takes priority over feed. If both
are active, parse-rss will handle the feeds, not feed.

> 2. Do I need to have any other plugins enabled to have only RSS feeds parsed
> and indexed? (html, text)
>

You do not need to, but it is advisable to leave parse-html active. That way,
if a feed entry contains HTML, parse-html can parse that content.

> Can I use the crawl command (used for intranet crawling) to perform
> crawling?
>

I don't know :) I have never used it. But I think you can.

> thanks
>
> Alexander
>
>
> 2009/2/3 Doğacan Güney <doga...@gmail.com>
>
>> On Tue, Feb 3, 2009 at 10:30 AM, Alexander Aristov
>> <alexander.aris...@gmail.com> wrote:
>> > People
>> >
>> > Question about rss feed parsers.
>> >
>> > I am trying to configure Nutch to crawl RSS feeds. I have enabled the feed
>> > and parse-rss plugins. I found out that these are two separate plugins and
>> > that parse-rss is older. That's ok.
>> >
>> > I expected these parsers to produce a separate document for each
>> > item in a feed, but instead only the RSS header gets parsed and stored in
>> > the index. Items are not included in the Lucene indexes.
>> >
>> > Who can point me to the configuration params I should change to have
>> > RSS feeds indexed?
>> >
>>
>> Plugin feed should work like that on nutch trunk (separate document for
>> separate entry)
>>
>> > --
>> > Best Regards
>> > Alexander Aristov
>> >
>>
>>
>>
>> --
>> Doğacan Güney
>>
>
>
>
> --
> Best Regards
> Alexander Aristov
>

--
Doğacan Güney
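
For reference, the setup discussed in this thread (feed enabled, parse-rss disabled, parse-html left active) would normally be expressed through the plugin.includes property in conf/nutch-site.xml. The fragment below is only a sketch: everything in the value other than feed and parse-html is an assumed, typical plugin list and not taken from this thread, so it should be adjusted to match the installation's own nutch-default.xml.

```xml
<?xml version="1.0"?>
<!-- conf/nutch-site.xml: illustrative sketch; the surrounding plugin list is an assumption -->
<configuration>
  <property>
    <name>plugin.includes</name>
    <!-- "feed" is listed and "parse-rss" is left out, so parse-rss cannot take priority.
         "parse-html" stays enabled so HTML inside feed entries can still be parsed. -->
    <value>protocol-http|urlfilter-regex|parse-html|feed|index-basic|query-(basic|site|url)|summary-basic|scoring-opic|urlnormalizer-(pass|regex|basic)</value>
  </property>
</configuration>
```

Whether the per-entry documents then show up in the index still depends on running a Nutch version where feed behaves as described above (trunk at the time of this thread).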