Re: @lib/scrape.l questions
Hi Luis,

> In the above example, I used `in` and the call to `parseLink` works fine.

OK

> But
> (client "picolisp.com" 80 "wiki/?home"
>    (while (from "
>       (parseLink (till "<"
>
> does not. AFAICT, each while cycle returns a list which in my view
> would be used as an argument to `parseLink`, but it doesn't work as I
> expect. `in` opens an input channel, `client` seems to me to return a
> list.

Hmm, as I said, it depends on what 'parseLink' does. However, the lists
of characters (i.e. the chopped URLs) should be passed.

I tried this:

   (client "picolisp.com" 80 "wiki/?home"
      (make (while (from "

> I tried to save the whole html file with an `out`, before the `client`
> ...
> (out "hh.html"
>    (client "picolisp.com" 80 "wiki/?home"))

This is not a good idea, because 'client' opens its own 'out' back to
the server. You need it inside of 'client', e.g.

   (client "picolisp.com" 80 "wiki/?home"
      (out NIL (echo)) )

BTW, this is the most basic call to see what 'client' does.

> Also tried to append to a file with each iteration:
> (client "picolisp.com" 80 "wiki/?home"
>    (while (from "
>       (out "+hh.html"
>          (msg (till "<")

Hmm, this is also not a good idea. 'msg' prints to stderr, so nothing
gets printed into "hh.html".

Besides this, the (out "+xxx") is OK, but it carries quite some overhead
because the file is re-opened and closed all the time. Better would be

   (client ..
      (out "hh.html"
         (while (from "...
            (println (till "<")) ..

This will print the lists to the file.

> Frustration is high right now... :-(

Sorry to hear that! But I think we are getting closer ;)

♪♫ Alex
-- 
UNSUBSCRIBE: mailto:picolisp@software-lab.de?subject=Unsubscribe

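The advice about opening the file once and using 'println' rather than 'msg' can be tried without a network connection. The following is only a sketch under assumptions: the global '*Found' and its contents are made up to stand in for the chopped link texts, and the file name "links.txt" is hypothetical. It uses 'tmp' so the file lands in the process's temporary directory.

```picolisp
# Hypothetical data standing in for the lists that 'till' would return
(setq *Found '(("a" "b") ("c")))

# Open the output file once, then print every list into it.
# 'println' writes to the current output channel (the file here),
# whereas 'msg' would write to stderr and leave the file empty.
(out (tmp "links.txt")
   (for Lst *Found
      (println Lst) ) )

# Read the lists back, one expression at a time, to see what was written
(in (tmp "links.txt")
   (while (read)
      (println @) ) )
```

Inside 'client' the same 'out' body would wrap the 'while' loop, as in the message above.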
Re: @lib/scrape.l questions
Hi Alex,

2015-06-24 12:49 GMT+01:00 Alexander Burger:
> Hi Luis,
>
>> These are beginner questions, but why doesn't this work?
>>
>> (in "i.html"
>>    (while (from "
>>       (parseLink (till "<" T))
>> ))
>
> This depends on what 'parseLink' does.
>
> (till "<" T) returns a transient symbol, you can try it as
>
>    (in "i.html" (from "
>    -> "http://..."
>
> So perhaps 'parseLink' wants a list of characters, to be able to parse
> it? Then you might better do:
>
>    (in "i.html" (from "
>    -> ("h" "t" "t" "p" ":" "/" ...)
>
> i.e. without the 'T' argument to 'till'.
>
>
>> It's just a substitution of `msg` by a call to `parseLink` that
>> doesn't seem to get called.
>
> 'parseLink' will get called, if at least one pattern " the file.

In the above example, I used `in` and the call to `parseLink` works fine.

But

   (client "picolisp.com" 80 "wiki/?home"
      (while (from "

Re: @lib/scrape.l questions
Hi Luis,

> These are beginner questions, but why doesn't this work?
>
> (in "i.html"
>    (while (from "
>       (parseLink (till "<" T))
> ))

This depends on what 'parseLink' does.

(till "<" T) returns a transient symbol, you can try it as

   (in "i.html" (from "
   -> "http://..."

So perhaps 'parseLink' wants a list of characters, to be able to parse
it? Then you might better do:

   (in "i.html" (from "
   -> ("h" "t" "t" "p" ":" "/" ...)

i.e. without the 'T' argument to 'till'.

> It's just a substitution of `msg` by a call to `parseLink` that
> doesn't seem to get called.

'parseLink' will get called, if at least one pattern "

> Just for curiosity, in my previous message, in
> (for L *Links (cond ((=T (pre? "Quarto" (car L))) (scrape this link...
>
> (car L) return the same as (caar L), (caaar L), etc?

Yes. The global '*Links' holds an association list: A list of cons
pairs, each with a string (transient symbol) in the CAR, and a URL in
the CDR.

So the CAR of each element is a transient symbol. And the VAL or CAR of
a transient symbol is by default the symbol again. So you always keep
getting the same symbol.

♪♫ Alex

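Both points in this message can be checked at the REPL without a file or a network connection. This is a minimal sketch under assumptions: 'pipe' is used here only to feed a made-up sample string into the current input channel, and the URL, the title "Quarto 1" and the variable 'L' are hypothetical illustrations, not data from the wiki.

```picolisp
# With the 'T' flag, 'till' returns a string (transient symbol)
(pipe (prin "http://x.org<rest")
   (println (till "<" T)) )

# Without the flag, it returns a list of single characters
(pipe (prin "http://x.org<rest")
   (println (till "<")) )

# 'pack' joins such a list back into a string when one is needed
(println (pack (chop "http://x.org")))

# A cons pair shaped like one element of '*Links' (hypothetical data)
(setq L '("Quarto 1" . "http://x.org/q1"))

# The value of a transient symbol is by default the symbol itself,
# so taking the CAR again keeps returning the same string
(println (car L))
(println (caar L))
```

If 'parseLink' expects a list of characters, the second form can be passed as it is; if it expects a string, either use the 'T' flag or 'pack' the list first.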
Re: @lib/scrape.l questions
2015-06-23 17:44 GMT+01:00 Alexander Burger:
> Hi Luis,
>
>> I want to go through every link to perform further parsing in some
>> links, based on the first word of the title of the link, but ran into
>> some difficulties.
>>
>> Is this the right way to access each of the links?
>> : (for L *Links (cond ((=T (pre? "Quarto" (car L))) (scrape this link...
>> ...
>> I know that scrape.l is available, but still a ton too much for me to
>> understand it.
>
> I would think that "@lib/scrape.l" is not so much for parsing general
> websites. It is tailored for communication with - and controlling of -
> interactive PicoLisp GUI applications, and thus rather overkill.
>
> You can directly access the contents of a site with 'client', 'from' and
> 'till'.
>
> For example, this prints every link ('href' anchor):
>
>    (client "picolisp.com" 80 "wiki/?home"
>       (while (from "
>          (msg (till "<" T)) ) )
>
> Instead of the final 'msg', you could do further processing of the data,
> and/or omit the 'T' in the 'till' call to get lists of characters
> instead of strings.

These are beginner questions, but why doesn't this work?

   (in "i.html"
      (while (from "

Re: @lib/scrape.l questions
Hi Luis,

> I want to go through every link to perform further parsing in some
> links, based on the first word of the title of the link, but ran into
> some difficulties.
>
> Is this the right way to access each of the links?
> : (for L *Links (cond ((=T (pre? "Quarto" (car L))) (scrape this link...
> ...
> I know that scrape.l is available, but still a ton too much for me to
> understand it.

I would think that "@lib/scrape.l" is not so much for parsing general
websites. It is tailored for communication with - and controlling of -
interactive PicoLisp GUI applications, and thus rather overkill.

You can directly access the contents of a site with 'client', 'from' and
'till'.

For example, this prints every link ('href' anchor):

   (client "picolisp.com" 80 "wiki/?home"
      (while (from "