Hi Mark,

> Thanks for all the hard work.  It is much appreciated.  

Thanks! :)


> I have a simple question about web scraping.  I hope I am posting it to
> the right place.

Yes, it is.


> (client "dancecentral.co.uk" 80 "index.html"
>    (when (from "<title>")
>      (pack (trim (till "</title>"))) ))
> 
> it returns
> 
> >Th

The problem is that the functions 'from' and 'till' are not symmetric,
although one might easily expect them to be:

- 'from' accepts one or more _string_ arguments, and advances the input
  stream to just past the first occurrence of any of them.

- 'till', on the other hand, stops at the first input _character_ that
  occurs anywhere in its argument. This avoids ambiguities, as well as
  the need to rewind the input stream after reading too far.

Thus, the right way is to pass 'till' all characters that might
delimit the desired result, and then, if necessary, process the
returned value with other Lisp functions (e.g. 'match').


> These are the first two characters of the title so it's been partially
> successful, and I can reproduce similar results for other URLs.

: (client "dancecentral.co.uk" 80 "index.html"
   (when (from "<title>")
      (pack (trim (till "<"))) ) )
-> "The hub of dance events"

This happens because the 'e' after "Th" is one of the characters in the
"</title>" argument above, so 'till' stops reading there.

Cheers,
- Alex