On 10/19/05, Chris Mattmann <[EMAIL PROTECTED]> wrote:
> Actually it's not out of priority, unfortunately, because of the generic
> nature of the mime type "text/xml". It turns out that a lot of RSS is served
> with the content type "text/xml", as configured by the web server, even
> though "application/rss+xml" is the recommended mime type for RSS. Most web
> server admins don't spend the time configuring this mime type correctly in
> their web server. Further, if you look at the IANA list of mime types, there
> isn't really a mime type registered for RSS (although RDF has
> application/rdf+xml, which is sometimes used to identify RSS as well).
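To make the ambiguity concrete, here is a rough sketch of how a crawler might bucket a Content-Type header. "application/rss+xml" and the Atom type are taken as positive feed signals. "text/xml" and "application/rdf+xml" are only "maybe a feed", because per the above both are routinely used for RSS and for plenty of other XML. The class, enum, and method names are hypothetical, not anything in Nutch.

```java
import java.util.Locale;

/** Hypothetical helper: how much does a Content-Type header tell us about feeds? */
public class FeedContentType {

    enum FeedHint { FEED, AMBIGUOUS, NOT_FEED }

    static FeedHint classify(String contentType) {
        if (contentType == null) {
            return FeedHint.NOT_FEED;
        }
        // Drop parameters such as "; charset=UTF-8" and normalise case.
        String mime = contentType.split(";", 2)[0].trim().toLowerCase(Locale.ROOT);
        switch (mime) {
            case "application/rss+xml":
            case "application/atom+xml":
                return FeedHint.FEED;        // explicit feed types
            case "text/xml":
            case "application/rdf+xml":
                return FeedHint.AMBIGUOUS;   // often RSS, but not necessarily
            default:
                return FeedHint.NOT_FEED;
        }
    }

    public static void main(String[] args) {
        System.out.println(classify("application/rss+xml"));     // FEED
        System.out.println(classify("text/xml; charset=UTF-8")); // AMBIGUOUS
        System.out.println(classify("text/html"));               // NOT_FEED
    }
}
```

Only the AMBIGUOUS bucket would need some other signal. That signal could come from looking inside the document, or from the context in which the URL was found.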
Hi,

I just realized: we don't have to look inside the XML file. We can pick up the feed type from context.

1. We could look inside the <head/> for links like:

   <link rel="alternate" type="application/rss+xml" title="RSS 2.0" href="http://migs.paraz.com/w/feed/" />
   <link rel="alternate" type="application/atom+xml" title="Atom 0.3" href="http://migs.paraz.com/w/feed/atom/" />

   Is it practical to add a parser type to the Outlink type, so that the HTML parser could set it from context?

2. We could add a new inject type: inject a list of feed URLs as the starting point for the crawl. Technically, this isn't necessary, since an external program could parse the feeds and then generate the URLs.
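For the first idea, a minimal sketch of pulling the advertised feed URLs, together with their declared types, out of the `<link rel="alternate">` tags. The `FeedLink` class is a hypothetical stand-in for "an outlink that carries a parser type". It is not Nutch's actual `Outlink` API. A real HTML parser plugin would read these attributes from the DOM it already builds, rather than using the rough regex matching shown here, which would break on, for example, a `>` inside an attribute value.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class FeedLinkFinder {

    /** Hypothetical: a feed URL plus the mime type the page advertised for it. */
    static final class FeedLink {
        final String href;
        final String type;
        FeedLink(String href, String type) { this.href = href; this.type = type; }
        @Override public String toString() { return type + " " + href; }
    }

    // A whole <link ...> tag; group 1 is everything between "<link" and ">".
    private static final Pattern LINK_TAG =
        Pattern.compile("<link\\b([^>]*)>", Pattern.CASE_INSENSITIVE);
    // name="value" or name='value' attribute pairs.
    private static final Pattern ATTR =
        Pattern.compile("([a-zA-Z-]+)\\s*=\\s*(\"([^\"]*)\"|'([^']*)')");

    static List<FeedLink> findFeedLinks(String html) {
        List<FeedLink> feeds = new ArrayList<>();
        Matcher tag = LINK_TAG.matcher(html);
        while (tag.find()) {
            // Collect the tag's attributes, with lower-cased names.
            Map<String, String> attrs = new HashMap<>();
            Matcher a = ATTR.matcher(tag.group(1));
            while (a.find()) {
                String value = a.group(3) != null ? a.group(3) : a.group(4);
                attrs.put(a.group(1).toLowerCase(Locale.ROOT), value);
            }
            // rel may hold several space-separated tokens.
            String rel = attrs.getOrDefault("rel", "").toLowerCase(Locale.ROOT);
            String type = attrs.getOrDefault("type", "").trim().toLowerCase(Locale.ROOT);
            String href = attrs.get("href");
            boolean alternate = Arrays.asList(rel.trim().split("\\s+")).contains("alternate");
            boolean feedType = type.equals("application/rss+xml")
                || type.equals("application/atom+xml");
            if (alternate && feedType && href != null) {
                feeds.add(new FeedLink(href, type));
            }
        }
        return feeds;
    }

    public static void main(String[] args) {
        String head = "<head>"
            + "<link rel=\"alternate\" type=\"application/rss+xml\" title=\"RSS 2.0\""
            + " href=\"http://migs.paraz.com/w/feed/\" />"
            + "<link rel=\"alternate\" type=\"application/atom+xml\" title=\"Atom 0.3\""
            + " href=\"http://migs.paraz.com/w/feed/atom/\" />"
            + "<link rel=\"stylesheet\" type=\"text/css\" href=\"style.css\" />"
            + "</head>";
        for (FeedLink link : findFeedLinks(head)) {
            System.out.println(link);
        }
        // application/rss+xml http://migs.paraz.com/w/feed/
        // application/atom+xml http://migs.paraz.com/w/feed/atom/
    }
}
```

The stylesheet link is skipped because it is neither `rel="alternate"` nor a feed type. The `type` carried on each result is the piece of context that could later pick the RSS or Atom parser without relying on the server's Content-Type.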

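For the second idea, the external program could be quite small. Here is a sketch that reads an RSS 2.0 document with the JDK's DOM parser and prints the `<link>` of each `<item>`, one URL per line. That is the plain-text seed-list shape Nutch's injector reads. The class and method names are hypothetical. This handles only RSS 2.0: Atom puts URLs in `<entry><link href="..."/>` and would need its own branch.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class FeedToSeedList {

    /** Returns the <link> text of every <item> in an RSS 2.0 document. */
    static List<String> itemLinks(String rssXml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Feeds are untrusted input: refuse DOCTYPEs to avoid external-entity tricks.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(
            new ByteArrayInputStream(rssXml.getBytes(StandardCharsets.UTF_8)));

        List<String> urls = new ArrayList<>();
        NodeList items = doc.getElementsByTagName("item");
        for (int i = 0; i < items.getLength(); i++) {
            Element item = (Element) items.item(i);
            // Only look inside <item>, so the channel's own <link> is skipped.
            NodeList links = item.getElementsByTagName("link");
            if (links.getLength() > 0) {
                String url = links.item(0).getTextContent().trim();
                if (!url.isEmpty()) {
                    urls.add(url);
                }
            }
        }
        return urls;
    }

    public static void main(String[] args) throws Exception {
        String rss = "<?xml version=\"1.0\"?>"
            + "<rss version=\"2.0\"><channel>"
            + "<title>Example</title><link>http://example.com/</link>"
            + "<item><title>A</title><link>http://example.com/a</link></item>"
            + "<item><title>B</title><link>http://example.com/b</link></item>"
            + "</channel></rss>";
        // One URL per line, ready to be redirected into a seed file.
        for (String url : itemLinks(rss)) {
            System.out.println(url);
        }
        // http://example.com/a
        // http://example.com/b
    }
}
```

Doing this outside the crawler keeps the injector unchanged. The trade-off is that the feed type learned here is lost once only the bare URLs reach the seed list.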