Re: @lib/scrape.l questions

2015-06-26 Thread Alexander Burger
Hi Luis,

> In the above example, I used `in` and the call to `parseLink` works fine.

OK

> But
> (client "picolisp.com" 80 "wiki/?home"
>    (while (from "...")
>       (parseLink (till "<"
> 
> does not.  AFAICT, each while cycle returns a list which in my view
> would be used as an argument to `parseLink`, but it doesn't work as I
> expect.  `in` opens an input channel, `client` seems to me to return a
> list.

Hmm, as I said, it depends on what 'parseLink' does. However the lists
of characters (i.e. the chopped URLs) should be passed.

I tried this:

   (client "picolisp.com" 80 "wiki/?home"
      (make
         (while (from "...

> I tried to save the whole html file with a `out`, before the `client`
> ...
> (out "hh.html"
>    (client "picolisp.com" 80 "wiki/?home"))

This is not a good idea, because 'client' opens its own 'out' back to
the server. You need it inside of 'client', e.g.

   (client "picolisp.com" 80 "wiki/?home"
      (out NIL (echo)) )

BTW, this is the most basic call to see what 'client' does.


> Also tried to append to a file with each iteration:
> (client "picolisp.com" 80 "wiki/?home"
>    (while (from "...")
>       (out "+hh.html"
>          (msg (till "<")

Hmm, this is also not a good idea. 'msg' prints to stderr, so nothing
gets printed into "hh.html".

Besides this, the (out "+xxx") is OK, but it has quite some overhead
because the file is re-opened and closed on every iteration. Better would be

   (client ..
      (out "hh.html"
         (while (from "...
            (println (till "<")) ..

This will print the lists to the file.
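
A minimal, self-contained sketch of the same idea, run against a local file instead of 'client' so that it needs no network. The file names "sample.html" and "links.txt", the sample markup, the pattern "<a href=\"" and the delimiter "\"" are all assumptions made for illustration, not what the thread used:

```picolisp
# Hypothetical sample input, written once for the demonstration
(out "sample.html"
   (prinl "<a href=\"http://a.example/\">A</a> <a href=\"http://b.example/\">B</a>") )

# Open the output file once, then loop over the input inside it
(out "links.txt"
   (in "sample.html"
      (while (from "<a href=\"")      # Skip to the next assumed pattern
         (println (till "\"")) ) ) )  # Print the list of characters
```

Because 'out' wraps the whole loop, the file is opened and closed only once, and 'println' writes to it rather than to stderr.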


> Frustration is high right now... :-(

Sorry to hear that! But I think we are getting closer ;)
♪♫ Alex
-- 
UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe


Re: @lib/scrape.l questions

2015-06-25 Thread Luis P. Mendes
Hi Alex,


2015-06-24 12:49 GMT+01:00 Alexander Burger:
> Hi Luis,
>
>> These are beginner questions, but why doesn't this work?
>>
>> (in "i.html"
>>    (while (from "...")
>>       (parseLink (till "<" T))
>>    ))
>
> This depends on what 'parseLink' does.
>
> (till "<" T) returns a transient symbol, you can try it as
>
>    (in "i.html" (from "...") (till "<" T))
>    -> "http://..."
>
> So perhaps 'parseLink' wants a list of characters, to be able to parse
> it? Then you might better do:
>
>    (in "i.html" (from "...") (till "<"))
>    -> ("h" "t" "t" "p" ":" "/" ...)
>
> i.e. without the 'T' argument to 'till'.
>
>
>> It's just a substitution of `msg` by a call to `parseLink` that
>> doesn't seem to get called.
>
> 'parseLink' will get called, if at least one pattern "..." is found in
> the file.

In the above example, I used `in` and the call to `parseLink` works fine.
But
(client "picolisp.com" 80 "wiki/?home"
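   (while (from "...")
      (parseLink (till "<"

does not.  AFAICT, each while cycle returns a list which in my view
would be used as an argument to `parseLink`, but it doesn't work as I
expect.  `in` opens an input channel, `client` seems to me to return a
list.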


Re: @lib/scrape.l questions

2015-06-24 Thread Alexander Burger
Hi Luis,

> These are beginner questions, but why doesn't this work?
> 
> (in "i.html"
>    (while (from "...")
>       (parseLink (till "<" T))
>    ))

This depends on what 'parseLink' does.

(till "<" T) returns a transient symbol, you can try it as

   (in "i.html" (from "...") (till "<" T))
   -> "http://..."

So perhaps 'parseLink' wants a list of characters, to be able to parse
it? Then you might better do:

   (in "i.html" (from "...") (till "<"))
   -> ("h" "t" "t" "p" ":" "/" ...)

i.e. without the 'T' argument to 'till'.
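
The difference can be tried offline with a small sketch like the following. The file name "sample.html", its content, the pattern "<a href=\"" and the delimiter "\"" are assumptions chosen for the demonstration, not taken from the thread:

```picolisp
# Hypothetical one-line sample file
(out "sample.html"
   (prinl "<a href=\"http://example.com/\">Example</a>") )

# With the 'T' flag, 'till' returns a string (transient symbol)
(in "sample.html"
   (from "<a href=\"")
   (till "\"" T) )
# -> "http://example.com/"

# Without the flag, it returns a list of single-character strings
(in "sample.html"
   (from "<a href=\"")
   (till "\"") )
# -> ("h" "t" "t" "p" ":" "/" "/" "e" ...)
```

Which form is the right one depends on whether the consuming function, 'parseLink' here, expects a string or a list of characters.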


> It's just a substitution of `msg` by a call to `parseLink` that
> doesn't seem to get called.

'parseLink' will get called, if at least one pattern "..." is found in
the file.

> Just for curiosity, in my previous message, in
> (for L *Links (cond ((=T (pre? "Quarto" (car L))) (scrape this link...
> 
> (car L) return the same as (caar L), (caaar L), etc?

Yes. The global '*Links' holds an association list: a list of cons
pairs, each with a string (transient symbol) in the CAR and a URL in
the CDR.

So the CAR of each element is a transient symbol. And the VAL or CAR of
a transient symbol is by default the symbol again. So you always keep
getting the same symbol.
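
A small sketch of this behavior, using a hand-written stand-in for '*Links' (the titles and URLs below are hypothetical; the real list would come from the scraping code):

```picolisp
# Hypothetical association list: title in the CAR, URL in the CDR
(setq *Links
   '(("Quarto T1" . "http://a.example/")
     ("Sala" . "http://b.example/") ) )

# (car L), (caar L) and (caaar L) all yield the same transient symbol
(for L *Links
   (println (car L) (caar L) (caaar L)) )

# Print the URL of every entry whose title starts with "Quarto"
(for L *Links
   (when (pre? "Quarto" (car L))
      (println (cdr L)) ) )
```

As far as the reference documents it, 'pre?' returns its second argument on a match rather than the symbol T, so the sketch uses it directly as the condition instead of wrapping it in '=T'.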

♪♫ Alex


Re: @lib/scrape.l questions

2015-06-24 Thread Luis P. Mendes
2015-06-23 17:44 GMT+01:00 Alexander Burger:
> Hi Luis,
>
>> I want to go through every link to perform further parsing in some
>> links, based on the first word of the title of the link, but ran into
>> some difficulties.
>>
>> Is this the right way to access each of the links?
>> : (for L *Links (cond ((=T (pre? "Quarto" (car L))) (scrape this link...
>> ...
>> I know that scrape.l is available, but it is still far too much for
>> me to understand.
>
> I would think that "@lib/scrape.l" is not so much for parsing general
> websites. It is tailored for communication with - and controlling of -
> interactive PicoLisp GUI applications, and thus rather overkill.
>
> You can directly access the contents of a site with 'client', 'from' and
> 'till'.
>
> For example, this prints every link ('href' anchor):
>
>(client "picolisp.com" 80 "wiki/?home"
>       (while (from "...")
>          (msg (till "<" T)) ) )
>
> Instead of the final 'msg', you could do further processing of the data,
> and/or omit the 'T' in the 'till' call to get lists of characters
> instead of strings.

These are beginner questions, but why doesn't this work?

(in "i.html"
   (while (from "...")
      (parseLink (till "<" T))
   ))


Re: @lib/scrape.l questions

2015-06-23 Thread Alexander Burger
Hi Luis,

> I want to go through every link to perform further parsing in some
> links, based on the first word of the title of the link, but ran into
> some difficulties.
> 
> Is this the right way to access each of the links?
> : (for L *Links (cond ((=T (pre? "Quarto" (car L))) (scrape this link...
> ...
> I know that scrape.l is available, but it is still far too much for
> me to understand.

I would think that "@lib/scrape.l" is not so much for parsing general
websites. It is tailored for communication with - and controlling of -
interactive PicoLisp GUI applications, and thus rather overkill.

You can directly access the contents of a site with 'client', 'from' and
'till'.

For example, this prints every link ('href' anchor):

   (client "picolisp.com" 80 "wiki/?home"
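      (while (from "...")
         (msg (till "<" T)) ) )

Instead of the final 'msg', you could do further processing of the data,
and/or omit the 'T' in the 'till' call to get lists of characters
instead of strings.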