Re: 64bit segmentation fault when matching on long lists
I started with this approach yesterday, first in order to capture the
feed type, which I am now able to do.

I noticed that some RSS feeds have attributes in their item tags, so
the above won't work 100% of the time.

   (in rss.xml
      (while (from item)
         (println
            (make
               (loop
                  (NIL (chain (till )))
                  (char)
                  (T (tail '`(chop item) @)) ) ) ) ) )

I think this will accurately capture the item tag every time, but then
we need some way of discarding the attributes and the closing >. I
tried an immediate (till ) after the (from), but it didn't have the
intended result. Any suggestions here?

/Henrik

On Sun, Nov 1, 2009 at 6:26 PM, Alexander Burger a...@software-lab.de wrote:
> On Sun, Nov 01, 2009 at 01:49:59PM +0100, Henrik Sarvell wrote:
> > It's a good question with a very simple answer, many many feeds out
> > there are completely broken, sometimes they don't conform to
> > standards, that's a good scenario but often they have unmatched tags
> > or unclosed attributes.
>
> Ouch. I see.
>
> So what do you think about the following:
>
>    (while (from item)
>       (println                         # Instead of printing
>          (make                         # do further matching
>             (loop
>                (NIL (chain (till )))   # Collect until next tag
>                (char)                  # Skip ''
>                (T (tail '`(chop item) @)) ) ) ) )  # See if we got item
>
> The 'make' will give you smaller chunks of data, which are easier to
> 'match'.
>
> Cheers,
> - Alex

--
UNSUBSCRIBE: mailto:picol...@software-lab.de?subject=unsubscribe
Re: 64bit segmentation fault when matching on long lists
It's a good question with a very simple answer: many, many feeds out
there are completely broken. Sometimes they merely don't conform to
standards, and that's the good scenario; often they have unmatched tags
or unclosed attributes.

At first I tried using the xml function, but I quickly discovered that
it breaks down when trying to read roughly 20% of the feeds out there.
A deplorable situation, but that's the way it is.

Sorry about the file I sent you lacking items; it must then be an Atom
feed, not RSS. In that case you look for entry.../entry instead, but be
careful, because that format allows attributes in the tag, i.e.
entry attr=attr.../entry.

I have attached my current rss.l, which is able to parse all of the
800+ feeds I subscribe to. Note that I use (xml) for the OPML format;
these are files containing my subscriptions, which a feed reader should
be able to import and export. My reader can currently import them. The
reason I'm able to use (xml) on that one is that the two readers my
reader can currently import from are Google Reader and the desktop app
called simply FeedReader, and at least these two manage to export valid
XML files.

/Henrik

On Sun, Nov 1, 2009 at 1:25 PM, Alexander Burger a...@software-lab.de wrote:
> Hi Henrik,
>
> > The problem is using from in combination with till repeatedly to
> > parse input in order to, for instance, get at the contents of the
> > item/item elements. There is a twist though: the contents can
> > contain more markup, so a check is needed every time till encounters,
> > for instance, <, if that one is to be used as a stop char.
>
> This is indeed a bit tedious, because we would need to manually
> collect strings and match them until the proper patterns are found.
>
> But before we start doing that: I'm wondering why this should be
> necessary. Can't we just use the 'xml' function? It was written for
> that purpose after all (though it is also based on 'from' and 'till'):
>
>    (load "lib/xml.l")
>
>    (setq Lst
>       (in "rss.xml"
>          (and (xml?) (xml)) ) )
>
> Now 'Lst' contains the whole XML tree, which can be handled easily
> with Lisp functions.
>
> For example, to collect all item expressions nested somewhere in that
> list, you could use 'fish'
>
>    (fish '((L) (== 'item (car L))) Lst)
>
> Actually, the sample rss.xml you've attached does not seem to contain
> any 'item' tags. But if I try 'author'
>
>    (fish '((L) (== 'author (car L))) Lst)
>
> I get a long result list. To inspect it conveniently, I usually do
>
>    (more (fish '((L) (== 'author (car L))) Lst) pretty)
>
> Cheers,
> - Alex

rss.l
Description: Binary data
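The 'fish' approach quoted above can be taken one step further to pull
a field out of each collected item. This is only a sketch: it assumes
that every element in the list returned by 'xml' has the shape
(tag attributes . body), and both the sample data and the name
'itemTitles' are made up for illustration.

```picolisp
# Hypothetical sample tree in the assumed (tag attributes . body) shape
(setq Lst
   '(rss NIL
      (channel NIL
         (item NIL (title NIL "First") (link NIL "http://example.com/1"))
         (item NIL (title NIL "Second") (link NIL "http://example.com/2")) ) ) )

# 'itemTitles' is a hypothetical helper, not part of lib/xml.l.
# It collects all 'item' elements with 'fish', then looks up the
# 'title' child in each item's body and takes its first body element.
(de itemTitles (Tree)
   (mapcar
      '((Item) (caddr (assoc 'title (cddr Item))))
      (fish '((L) (== 'item (car L))) Tree) ) )
```

Calling (itemTitles Lst) on the sample tree should give the two title
strings in document order. As noted in the message above, this only
helps for feeds that 'xml' can actually read.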
Re: 64bit segmentation fault when matching on long lists
What I'm subsequently doing further down in the code is a recursive
matching in order to get at the pieces I want, on the same list, I'll
give you an example:

   (dm twitter (L)
      (use (@A @X @E @Z)
         (make
            (while (match '(@A s t a t u s @E / s t a t u s @Z) L)
               (let R (twitterEntry This @E)
                  (when R (link R)))
               (setq L @Z)

L is here the very same list called Lst in the earlier example that
segfaults. So even if I should use an argument in (till) in the first
type test the above algorithm will segfault anyway as it is matching on
the same long list. A trick you yourself showed me once.

But yeah if there are quicker ways of getting at the data...

/Henrik

On Fri, Oct 30, 2009 at 4:51 PM, Alexander Burger a...@software-lab.de wrote:
> Hi Henrik,
>
> > segfault. Hope this helps.
>
> Thanks. Unfortunately, I cannot reproduce it.
>
> I even checked with my special GC check setup, where a garbage
> collection is performed before each 'cons'. This usually shows errors
> in the data handling (though such a test runs for hours).
>
> Could it be a stack overflow, because of that long argument? What says
>
>    $ ulimit -s
>
> in your case? Does it work if you set it to 'unlimited'?
>
> If so, I would reconsider the design of that function. 'match'ing such
> large lists is not very efficient. Can't you step through the data
> with 'from' and 'till', extracting individual tokens, instead of
> reading them into a huge list?
>
> Cheers,
> - Alex
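The suggestion quoted above, stepping through the input with 'from' and
'till' instead of matching one huge list, could look roughly like the
sketch below. The function name 'statusTexts', the file name and the
<text> element are assumptions chosen for illustration; they do not
come from the code discussed in this thread.

```picolisp
# 'statusTexts' is a hypothetical helper. It reads File as a stream:
# 'from' skips ahead to each "<text>" marker, and 'till' with a
# second argument of T returns the characters up to (but excluding)
# the next "<" as a single string. No long character list is built.
(de statusTexts (File)
   (in File
      (make
         (while (from "<text>")
            (link (till "<" T)) ) ) ) )
```

A call such as (statusTexts "statuses.xml") would then return one
string per <text> element. Because each chunk is extracted directly
from the input channel, 'match' only ever has to deal with short
pieces, if it is needed at all.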