On 10/19/05, Chris Mattmann <[EMAIL PROTECTED]> wrote:
>  Actually it's not out of priority, unfortunately because of the generic
> nature of the mime type "text/xml". Turns out that a lot of RSS comes back
> as configured by the web server with the content type "text/xml", even
> though it's recommended that "application/rss+xml" be used as the mime type
> for RSS. Most web server admins don't really spend the time configuring this
> mime type correctly in their web server. Further, if you go look at the IANA
> list of mime types, there really isn't a mime type specified for RSS
> (although RDF has application/rdf+xml, which is sometimes used to identify
> RSS as well).

Hi,
I just realized: we don't have to look inside the XML file to tell that
it's a feed. We can pick that up from context.

1. We could look inside the HTML <head/> of the linking page for links like:

<link rel="alternate" type="application/rss+xml" title="RSS 2.0"
href="http://migs.paraz.com/w/feed/" />
<link rel="alternate" type="application/atom+xml" title="Atom 0.3"
href="http://migs.paraz.com/w/feed/atom/" />

Is it practical to add a parser type field to the Outlink class, so that
the HTML parser could set it from context?

2. We could add a new inject type: inject a list of feed URLs as the
starting point for the crawl. Technically, this isn't necessary, since
an external program could parse the feeds and then generate the URLs.
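For idea 1, here is a rough sketch of picking the feed type up from the <head/> links instead of sniffing the XML. It assumes a simple regex scan is good enough for illustration (a real HTML parser would be more robust), and `FeedLinkFinder` / `findFeedLinks` are hypothetical names, not existing Nutch classes. It returns each feed href together with its declared type, which is the kind of hint the HTML parser could record on an outlink.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

/** Hypothetical helper (not part of Nutch): finds feed <link> tags in HTML. */
public class FeedLinkFinder {
    // Matches a whole <link ...> tag, case-insensitively.
    private static final Pattern LINK_TAG =
        Pattern.compile("<link\\b([^>]*)>", Pattern.CASE_INSENSITIVE);
    // Matches name="value" or name='value' attribute pairs inside a tag.
    private static final Pattern ATTR =
        Pattern.compile("([a-zA-Z-]+)\\s*=\\s*(?:\"([^\"]*)\"|'([^']*)')");

    /** Returns href -> declared type for rel="alternate" RSS/Atom links, in document order. */
    public static Map<String, String> findFeedLinks(String html) {
        Map<String, String> feeds = new LinkedHashMap<>();
        Matcher tag = LINK_TAG.matcher(html);
        while (tag.find()) {
            // Collect the tag's attributes with lower-cased names.
            Map<String, String> attrs = new HashMap<>();
            Matcher a = ATTR.matcher(tag.group(1));
            while (a.find()) {
                String value = a.group(2) != null ? a.group(2) : a.group(3);
                attrs.put(a.group(1).toLowerCase(Locale.ROOT), value);
            }
            // rel may hold several space-separated tokens, e.g. "alternate home".
            String rel = attrs.getOrDefault("rel", "").toLowerCase(Locale.ROOT).trim();
            boolean isAlternate = Arrays.asList(rel.split("\\s+")).contains("alternate");
            String type = attrs.getOrDefault("type", "").toLowerCase(Locale.ROOT).trim();
            boolean isFeedType = type.equals("application/rss+xml")
                || type.equals("application/atom+xml");
            String href = attrs.get("href");
            if (isAlternate && isFeedType && href != null) {
                feeds.put(href, type);
            }
        }
        return feeds;
    }

    public static void main(String[] args) {
        String head = "<head>"
            + "<link rel=\"alternate\" type=\"application/rss+xml\" title=\"RSS 2.0\""
            + " href=\"http://migs.paraz.com/w/feed/\" />"
            + "<link rel=\"alternate\" type=\"application/atom+xml\" title=\"Atom 0.3\""
            + " href=\"http://migs.paraz.com/w/feed/atom/\" />"
            + "<link rel=\"stylesheet\" href=\"style.css\" />"
            + "</head>";
        // Prints both feed URLs with their types; the stylesheet link is skipped.
        System.out.println(findFeedLinks(head));
    }
}
```

The point of keeping the declared type (rather than just the URL) is that it sidesteps the "text/xml" ambiguity: the referring page already says what the target is.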
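For idea 2, a minimal sketch of the "external program" route, assuming the goal is a plain-text seed list with one URL per line for the injector. `FeedToSeeds` and `extractItemUrls` are hypothetical names; it uses only the JDK's DOM parser and handles RSS 2.0 `<item><link>` and Atom `<entry><link href>` (where a missing rel means "alternate"). It does no hardening against hostile XML, so treat it as illustration only.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

/** Hypothetical tool (not part of Nutch): turns a feed into crawl seed URLs. */
public class FeedToSeeds {

    /** Returns the item/entry URLs from an RSS 2.0 or Atom document. */
    public static List<String> extractItemUrls(String xml) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance()
            .newDocumentBuilder()
            .parse(new InputSource(new StringReader(xml)));
        List<String> urls = new ArrayList<>();

        // RSS 2.0: <item><link>URL</link></item>
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            NodeList links = ((Element) items.item(i)).getElementsByTagName("link");
            if (links.getLength() > 0) {
                String url = links.item(0).getTextContent().trim();
                if (!url.isEmpty()) {
                    urls.add(url);
                }
            }
        }

        // Atom: <entry><link rel="alternate" href="URL"/></entry>; take the first alternate link.
        NodeList entries = doc.getElementsByTagName("entry");
        for (int i = 0; i < entries.getLength(); i++) {
            NodeList links = ((Element) entries.item(i)).getElementsByTagName("link");
            for (int j = 0; j < links.getLength(); j++) {
                Element link = (Element) links.item(j);
                String rel = link.getAttribute("rel"); // "" when the attribute is absent
                String href = link.getAttribute("href");
                if ((rel.isEmpty() || rel.equals("alternate")) && !href.isEmpty()) {
                    urls.add(href);
                    break;
                }
            }
        }
        return urls;
    }

    public static void main(String[] args) throws Exception {
        String rss = "<rss version=\"2.0\"><channel><title>t</title>"
            + "<item><link>http://example.com/a</link></item>"
            + "<item><link>http://example.com/b</link></item>"
            + "</channel></rss>";
        // Print one URL per line, suitable for a seed file.
        for (String url : extractItemUrls(rss)) {
            System.out.println(url);
        }
    }
}
```

Running this against each feed and concatenating the output gives a seed list without any change to the injector, which is why a new inject type isn't strictly needed.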
