Re: 64bit segmentation fault when matching on long lists

2009-11-03 Thread Henrik Sarvell
I started with this approach yesterday, at first in order to capture
the feed type, which I am now able to do.

I noticed that some RSS feeds have attributes in their item tags, so
that approach won't work 100% of the time.

(in "rss.xml"
   (while (from "<item")
      (println
         (make
            (loop
               (NIL (chain (till ">")))
               (char)
               (T (tail '`(chop "</item") @)) ) ) ) ) )

This will, I think, accurately capture the item tag every time, but
then we need some way of discarding the attributes and the closing ">".
I tried an immediate (till ">") right after the (from), but it didn't
have the intended result. Any suggestions here?
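
One possible way to drop the attributes, as a minimal sketch rather than
a tested fix: after (from "<item"), read up to the closing ">" with
(till ">") and then consume that character with (char) before collecting
the body. The function name itemChunks and its File argument are
hypothetical. Note that "<item" would also match tags whose names merely
start with "item".

```picolisp
# Hypothetical helper: collect the body of every <item ...> element
# in File as a list of character lists, ignoring tag attributes.
(de itemChunks (File)
   (in File
      (make
         (while (from "<item")
            (till ">")                    # Discard attributes, if any
            (char)                        # Skip the closing ">"
            (link
               (make
                  (loop
                     (NIL (chain (till ">")))   # Collect up to next ">"
                     (char)                     # Skip ">"
                     (T (tail '`(chop "</item") @)) ) ) ) ) ) ) )
```

Each collected chunk still ends with the characters of "</item", so a
caller would have to cut those off before further matching.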

/Henrik


On Sun, Nov 1, 2009 at 6:26 PM, Alexander Burger a...@software-lab.de wrote:
 On Sun, Nov 01, 2009 at 01:49:59PM +0100, Henrik Sarvell wrote:
 It's a good question with a very simple answer: many, many feeds out
 there are completely broken. Sometimes they merely don't conform to
 standards, which is the good scenario, but often they have unmatched
 tags or unclosed attributes.

 Ouch. I see.

 So what do you think about the following:

 (while (from "<item>")
    (println                                 # Instead of printing
       (make                                 # do further matching
          (loop
             (NIL (chain (till ">")))        # Collect until next tag
             (char)                          # Skip '>'
             (T (tail '`(chop "</item") @)) ) ) ) )  # See if we got "</item"

 The 'make' will give you smaller chunks of data, which are easier to
 'match'.
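
 To illustrate matching on one such small chunk, here is a minimal
 sketch. The helper name titleOf, the sample string, and the choice of
 the title tag are assumptions for illustration only. The "~" read
 macro splices the chopped tag into the pattern at read time.

```picolisp
# Hypothetical helper: return the text between <title> and </title>
# in a chunk given as a list of characters, or NIL if there is none.
(de titleOf (Chunk)
   (use (@A @X @Z)
      (and
         (match
            '(@A ~(chop "<title>") @X ~(chop "</title>") @Z)
            Chunk )
         (pack @X) ) ) )

(println (titleOf (chop "<title>Hello</title><link>x</link>")))
```

 Because the chunk covers only one item, the pattern variables never
 have to span the whole feed.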

 Cheers,
 - Alex
 --
 UNSUBSCRIBE: mailto:picol...@software-lab.de?subject=unsubscribe



Re: 64bit segmentation fault when matching on long lists

2009-11-01 Thread Henrik Sarvell
It's a good question with a very simple answer: many, many feeds out
there are completely broken. Sometimes they merely don't conform to
standards, which is the good scenario, but often they have unmatched
tags or unclosed attributes.

At first I tried using the xml function but I quickly discovered that
it breaks down when trying to read roughly 20% of the feeds out there,
a deplorable situation but it's the way it is.

Sorry about the file I sent you lacking items. It must be an Atom
feed, not RSS, in which case you look for <entry>...</entry> instead.
Be careful though, because that format allows attributes in the tag,
i.e. <entry attr="attr">...</entry>.

I have attached my current rss.l, which is able to parse all of the
800+ feeds I subscribe to. Note that I use (xml) for the OPML format.
These are files containing my subscriptions, which a feed reader should
be able to import and export; my reader can currently import them. The
reason I'm able to use (xml) on that one is that the two readers mine
can currently import from are Google Reader and the desktop app called
simply FeedReader, and at least these two manage to export valid XML
files.

/Henrik

On Sun, Nov 1, 2009 at 1:25 PM, Alexander Burger a...@software-lab.de wrote:
 Hi Henrik,

 The problem is using from in combination with till repeatedly to parse
 input in order to, for instance, get at the contents of the
 <item>...</item> elements. There is a twist though: the contents can
 contain more markup, so a check is needed every time till encounters,
 for instance, "<", if that one is to be used as a stop char.

 This is indeed a bit tedious, because we would need to manually collect
 strings and match them until the proper patterns are found.


 But before we start doing that: I'm wondering why this should be
 necessary. Can't we just use the 'xml' function? It was written for
 that purpose after all (though it is also based on 'from' and 'till'):

   (load "lib/xml.l")
   (setq Lst (in "rss.xml" (and (xml?) (xml))))

 Now 'Lst' contains the whole XML tree, which can be handled easily with
 Lisp functions.


 For example, to collect all item expressions nested somewhere in that
 list, you could use 'fish'

   (fish '((L) (== 'item (car L))) Lst)

 Actually, the sample rss.xml you've attached does not seem to contain
 any 'item' tags. But if I try 'author'

   (fish '((L) (== 'author (car L))) Lst)

 I get a long result list.
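
 As a small self-contained illustration of the same 'fish' call, here
 is a sketch on a hand-written list. The tree below is made up, and its
 (tag attributes . body) layout is an assumption about what 'xml'
 returns, so treat the shape as illustrative only.

```picolisp
# Made-up tree in an assumed (tag attributes . body) layout
(setq Tree
   '(rss NIL
      (channel NIL
         (item NIL (title NIL "A"))
         (item NIL (title NIL "B")) ) ) )

# Collect all sublists whose first element is the symbol 'item',
# then print the title text of each one
(for Item (fish '((L) (== 'item (car L))) Tree)
   (prinl (caddr (assoc 'title (cddr Item)))) )
```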

 To inspect it conveniently, I usually do

   (more (fish '((L) (== 'author (car L))) Lst) pretty)

 Cheers,
 - Alex



rss.l
Description: Binary data


Re: 64bit segmentation fault when matching on long lists

2009-10-30 Thread Henrik Sarvell
What I'm subsequently doing further down in the code is recursive
matching on the same list in order to get at the pieces I want. Here
is an example:

(dm twitter> (L)
   (use (@A @X @E @Z)
      (make
         (while
            (match
               '(@A "<" "s" "t" "a" "t" "u" "s" ">" @E
                  "<" "/" "s" "t" "a" "t" "u" "s" ">" @Z )
               L )
            (let R (twitterEntry> This @E)
               (when R (link R)) )
            (setq L @Z) ) ) ) )

L here is the very same list that is called Lst in the earlier example
that segfaults. So even if I used an argument to (till) in the first
type test, the above algorithm would segfault anyway, as it matches on
the same long list.

A trick you yourself showed me once. But yeah if there are quicker
ways of getting at the data...

/Henrik


On Fri, Oct 30, 2009 at 4:51 PM, Alexander Burger a...@software-lab.de wrote:
 Hi Henrik,

 segfault. Hope this helps.

 Thanks. Unfortunately, I cannot reproduce it. I even checked with my
 special GC check setup, where a garbage collection is performed before
 each 'cons'. This usually shows errors in the data handling (though such
 a test runs for hours).


 Could it be a stack overflow, because of that long argument? What says

    $ ulimit -s

 in your case? Does it work if you set it to 'unlimited'?

 If so, I would reconsider the design of that function. 'match'ing such
 large lists is not very efficient. Can't you step through the data with
 'from' and 'till', extracting individual tokens, instead of reading them
 into a huge list?
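
 A minimal sketch of that streaming style, under assumptions: the
 function name titles, its File argument, and the choice of the title
 tag are hypothetical. Each token is read straight from the input
 channel, so no long character list is ever built or matched.

```picolisp
# Hypothetical helper: return the text of every <title> element in
# File as a list of strings, reading token by token.
(de titles (File)
   (in File
      (make
         (while (from "<title>")       # Skip ahead to the next tag
            (link (till "<" T)) ) ) ) )   # Read its text as one string
```

 Only one short string per token is held at a time, which avoids
 running 'match' over the whole feed.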

 Cheers,
 - Alex