Hi Chris

How do I change the plugin.xml? For example, if I want to crawl RSS files
that end with "xml", do I just add a new element?

      <implementation id="org.apache.nutch.parse.rss.RSSParser"
                      class="org.apache.nutch.parse.rss.RSSParser"
                      contentType="application/rss+xml"
                      pathSuffix="rss"/>
      <implementation id="org.apache.nutch.parse.rss.RSSParser"
                      class="org.apache.nutch.parse.rss.RSSParser"
                      contentType="application/rss+xml"
                      pathSuffix="xml"/>

Am I right?
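
On the web-crawl side, my understanding of your note below is that the
Content-Type header the server returns has to name the same media type as
the contentType attribute in plugin.xml. Here is a minimal sketch of that
comparison, purely as an illustration. This is not Nutch code, the function
name content_type_matches is made up, and ignoring header parameters such as
charset is my assumption rather than documented Nutch behavior.

```python
def content_type_matches(header_value, declared="application/rss+xml"):
    """Return True if a Content-Type header names the declared media type.

    Hypothetical helper for illustration only; not part of Nutch.
    """
    # Drop parameters such as "; charset=UTF-8", then compare the bare
    # media type case-insensitively (an assumption about the matching rule).
    media_type = header_value.split(";", 1)[0].strip().lower()
    return media_type == declared.strip().lower()


if __name__ == "__main__":
    # A feed served with the RSS media type would match the declaration.
    print(content_type_matches("application/rss+xml; charset=UTF-8"))
    # A feed served as generic XML would not, whatever its file suffix.
    print(content_type_matches("text/xml"))
```

If a server sends text/xml for its feeds, a check like this fails even when
the URL ends in "rss" or "xml", which I think is the inconsistency you
describe.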



On 06-2-3, Chris Mattmann <[EMAIL PROTECTED]> wrote:
>
> Hi there,
> Sure it will, you just have to configure it to do that. Pop over to
> $NUTCH_HOME/src/plugin/parse-rss/ and open up plugin.xml. In there is an
> attribute called "pathSuffix". Change that to handle whatever type of RSS
> file you want to crawl. That will work locally. For web-based crawls, you
> need to make sure that the content type being returned for your RSS content
> matches the content type that parse-rss claims to support in its plugin.xml
> file.
>
> Note that you might not have *a lot* of success controlling the content
> type for RSS files returned by web servers. I've seen a LOT of
> inconsistency in the way that administrators configure them. However, just
> to let you know, some people in the group are working on a solution to
> this.
>
> Hope that helps.
>
> Cheers,
> Chris
>
>
>
> On 2/3/06 7:16 AM, "盖世豪侠" <[EMAIL PROTECTED]> wrote:
>
> > Hi *Chris,*
> >
> > The files of RSS 1.0 have a postfix of rdf. So will the parser recognize
> > it automatically as an RSS file?
> >
> >
> > On 06-2-3, Chris Mattmann <[EMAIL PROTECTED]> wrote:
> >>
> >> Hi there,
> >>
> >> parse-rss is based on commons-feedparser
> >> (http://jakarta.apache.org/commons/sandbox/feedparser). From the
> >> feedparser website:
> >>
> >> "...commons-feedparser supports all versions of RSS (0.9, 0.91, 0.92,
> >> 1.0, and 2.0), Atom 0.5 (and future versions) as well as easy ad hoc
> >> extension and RSS 1.0 modules capability..."
> >>
> >> Hope that helps.
> >>
> >> Thanks,
> >> Chris
> >>
> >>
> >> On 2/3/06 6:46 AM, "盖世豪侠" <[EMAIL PROTECTED]> wrote:
> >>
> >>> I see the test file is of version 0.91.
> >>> Does the plugin support higher versions like 1.0 or 2.0?
> >>>
> >>
> >>
> >>
> >
> >
>
>
>


--
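《盖世豪侠》 won rave reviews and kept TVB's ratings high, yet TVB, pleased as it was, still did not give him leading roles. Stephen Chow was never one to stay in a small pond: once his comic talent had shown itself, he was unwilling to be sidelined, so he moved into film and showed his flair on the big screen. TVB had gained a prize steed and then lost it, and naturally came to regret it.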
