It's a good question with a very simple answer: many, many feeds out
there are completely broken. Sometimes they merely don't conform to the
standards, which is the good scenario, but often they have unmatched
tags or unclosed attributes.

At first I tried using the xml function, but I quickly discovered that
it breaks down on roughly 20% of the feeds out there. A deplorable
situation, but that's the way it is.
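To make the failure mode concrete, here is a small sketch in Python (not PicoLisp), with the standard library's ElementTree standing in for a strict parser like (xml). The feed text and the parse_strict helper are invented for illustration. A single unclosed tag is enough for a strict parser to reject the whole document:

```python
# Sketch only: Python's ElementTree stands in for a strict XML parser such
# as PicoLisp's 'xml'. BROKEN_FEED and parse_strict are made up for this example.
import xml.etree.ElementTree as ET

# One unclosed <b> inside <description> makes the whole feed ill-formed.
BROKEN_FEED = """<rss><channel>
<item><title>First post</title>
<description>Hello <b>world</description></item>
</channel></rss>"""

def parse_strict(text):
    """Return the root element, or None if the text is not well-formed XML."""
    try:
        return ET.fromstring(text)
    except ET.ParseError:
        # A strict parser gives up at the first error, so every item in
        # the feed is lost, not just the broken one.
        return None

print(parse_strict(BROKEN_FEED))                           # None
print(parse_strict("<rss><channel/></rss>") is not None)   # True
```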

About the file I sent you lacking items: sorry, it must be an Atom
feed, not RSS. In that case, look for <entry>...</entry> instead, but be
careful, because that format allows attributes in the tag, e.g.
<entry attr="attr">...</entry>.
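One lenient way to do that, sketched in Python rather than PicoLisp and not taken from rss.l: a regular expression that accepts <entry> or <item>, with or without attributes, and captures everything up to the matching close tag without caring whether the markup inside is well-formed. ENTRY_RE, extract_entries and the sample feed are hypothetical names chosen for this example.

```python
# Sketch only: a regex scan, not the approach used in rss.l.
import re

# <(entry|item)  the tag name, remembered as group 1
# (?=[\s>])      next char must be whitespace or '>', so <items> won't match
# [^>]*>         skip any attributes, e.g. <entry xml:lang="en">
# (.*?)          lazily capture the content (nested markup included)
# </\1\s*>       stop at the first close tag with the same name
ENTRY_RE = re.compile(r"<(entry|item)(?=[\s>])[^>]*>(.*?)</\1\s*>", re.DOTALL)

def extract_entries(text):
    """Return the raw inner text of every <entry> or <item> element."""
    return [m.group(2) for m in ENTRY_RE.finditer(text)]

SAMPLE = ('<feed xmlns="http://www.w3.org/2005/Atom">\n'
          '<entry xml:lang="en"><title>One</title></entry>\n'
          '<entry><title>Two <b>bold</title></entry>\n'
          '</feed>')

print(extract_entries(SAMPLE))
# ['<title>One</title>', '<title>Two <b>bold</title>']
```

The lazy `.*?` is what addresses the nested-markup problem: it does not stop at the first '<', only at the first matching close tag. It can still be fooled, for instance by a literal </entry> inside a CDATA section, so treat it as a starting point.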

I have attached my current rss.l, which is able to parse all of the
800+ feeds I subscribe to. Note that I do use (xml) for the OPML format:
these are files containing my subscriptions, which a feed reader should
be able to import and export (mine can currently import them). The
reason I'm able to use (xml) there is that the two readers mine can
currently import from, Google Reader and the desktop app simply called
FeedReader, at least manage to export valid XML files.
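Since those OPML exports are well-formed, a strict parser is the right tool there. A minimal Python sketch, with ElementTree standing in for (xml); the sample document and the feed_urls helper are made up. It collects the xmlUrl attribute, which OPML subscription lists use for the feed address, from every <outline> element, including nested ones (folders):

```python
# Sketch only: ElementTree stands in for PicoLisp's (xml); SAMPLE_OPML and
# feed_urls are hypothetical.
import xml.etree.ElementTree as ET

SAMPLE_OPML = """<opml version="1.0">
  <head><title>Subscriptions</title></head>
  <body>
    <outline text="Tech" title="Tech">
      <outline text="Example feed" type="rss" xmlUrl="http://example.com/rss.xml"/>
    </outline>
    <outline text="Another" type="rss" xmlUrl="http://example.org/atom.xml"/>
  </body>
</opml>"""

def feed_urls(opml_text):
    """Return the xmlUrl of every <outline>, skipping folder outlines."""
    root = ET.fromstring(opml_text)
    # iter() walks the whole tree in document order, so nested outlines count.
    return [o.get("xmlUrl") for o in root.iter("outline") if o.get("xmlUrl")]

print(feed_urls(SAMPLE_OPML))
# ['http://example.com/rss.xml', 'http://example.org/atom.xml']
```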

/Henrik

On Sun, Nov 1, 2009 at 1:25 PM, Alexander Burger <a...@software-lab.de> wrote:
> Hi Henrik,
>
>> The problem is using 'from' in combination with 'till' repeatedly to
>> parse input, for instance to get at the contents of the <item></item>
>> elements. There is a twist, though: the contents can contain more
>> markup, so a check is needed every time 'till' encounters, for
>> instance, <, if that is to be used as a stop char.
>
> This is indeed a bit tedious, because we would need to manually collect
> strings and match them until the proper patterns are found.
>
>
> But before we start doing that: I'm wondering why this should be
> necessary. Can't we just use the 'xml' function? It was written for
> that purpose after all (though it is also based on 'from' and 'till'):
>
>   (load "lib/xml.l")
>   (setq Lst (in "rss.xml" (and (xml?) (xml))))
>
> Now 'Lst' contains the whole XML tree, which can be handled easily with
> Lisp functions.
>
>
> For example, to collect all <item> expressions nested somewhere in that
> list, you could use 'fish'
>
>   (fish '((L) (== 'item (car L))) Lst)
>
> Actually, the sample "rss.xml" you've attached does not seem to contain
> any 'item' tags. But if I try 'author'
>
>   (fish '((L) (== 'author (car L))) Lst)
>
> I get a long result list.
>
> To inspect it conveniently, I usually do
>
>   (more (fish '((L) (== 'author (car L))) Lst) pretty)
>
> Cheers,
> - Alex

Attachment: rss.l