Re: [racket-users] Html to text, how to obtain a rough preview

2017-06-01 Thread Neil Van Dyke
Aside on a security issue with some example code... I know in this case it was being done as prototype/experiment, which is perfectly fine, but since people often learn from code they see on the email list, we should probably mention... (system/exit-code (string-append
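
The snippet above is cut off, so the command from the thread is not shown here. As a general illustration of this kind of concern (not the thread's code): a command line built with string-append is handed to the shell, so any untrusted text in it can be interpreted as extra commands. A minimal sketch of the usual alternative, assuming Racket's system*/exit-code from racket/system, which runs a program directly and passes each argument as-is. The variable untrusted-arg and the use of echo are assumptions made only for this example.

```racket
#lang racket/base
(require racket/system)

;; Hypothetical untrusted input, e.g. a URL scraped from a web page.
(define untrusted-arg "http://example.com/; echo INJECTED")

;; Risky pattern (left commented out): the whole string goes through the
;; shell, so the "; echo INJECTED" part would run as a second command.
;; (system/exit-code (string-append "echo " untrusted-arg))

;; Safer pattern: no shell is involved; the string is one literal argument.
(define echo-path (find-executable-path "echo"))
(define exit-code
  (if echo-path
      (system*/exit-code echo-path untrusted-arg)
      #f)) ; #f here means no "echo" executable was found on this system
```

With the second form the semicolon is just a character in the argument, so the program receives the text unchanged.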

Re: [racket-users] Html to text, how to obtain a rough preview

2017-06-01 Thread Erich Rast
Thank you all so much for your help! Philip McGrath's method seems to work best overall, although many sites create a parsing error. However, I found a potentially better solution on Linux that I might use if I can get it to work cross-platform. There is an ingenious tool called gnome-web-photo

Re: [racket-users] Html to text, how to obtain a rough preview

2017-05-30 Thread Neil Van Dyke
Erich Rast wrote on 05/30/2017 04:37 PM: I've found out that it's far less trivial than expected, but not because of sxml or the tree walking itself. Often call/input-url just returns '() or '(*TOP*), To troubleshoot this... First, I'd take a quick look at the application code. Then I'd

Re: [racket-users] Html to text, how to obtain a rough preview

2017-05-30 Thread Philip McGrath
I would handle this by adding some special cases to ignore the content of script tags, extract alt text for images when it's provided, etc. This gets meaningful content from both nytimes.com and cnn.com (though CNN seems to only have their navigation links accessible without JavaScript): #lang
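
Philip McGrath's code is truncated in this listing. The following is an independent sketch of the idea he describes, not his implementation: walk the SXML tree as plain lists, skip the content of script-like elements, and use the alt text of images. The names skipped-tags, attr-ref, sxml->fragments and sxml->preview are hypothetical, and the sketch assumes attributes appear in the usual (@ (name "value") ...) form. It collapses all whitespace, so newlines are not preserved.

```racket
#lang racket/base
(require racket/string)

;; Elements whose content is never shown to the reader (an assumed list).
(define skipped-tags '(script style noscript))

;; Return the value of attribute `name` on an SXML element, or #f.
(define (attr-ref elem name)
  (define attrs
    (for/first ([child (in-list (cdr elem))]
                #:when (and (pair? child) (eq? (car child) '@)))
      (cdr child)))
  (define entry (and attrs (assq name attrs)))
  (and entry (pair? (cdr entry)) (cadr entry)))

;; Collect text fragments from an SXML node, in document order.
(define (sxml->fragments node)
  (cond
    [(string? node) (list node)]
    [(not (pair? node)) '()]                      ; symbols, numbers, '()
    [(not (symbol? (car node)))                   ; a bare list of nodes
     (apply append (map sxml->fragments node))]
    [(memq (car node) '(@ *COMMENT* *PI* *DECL*)) '()]
    [(memq (car node) skipped-tags) '()]
    [(eq? (car node) 'img)                        ; use alt text if present
     (let ([alt (attr-ref node 'alt)])
       (if (string? alt) (list alt) '()))]
    [else (apply append (map sxml->fragments (cdr node)))]))

;; Join the fragments and collapse runs of whitespace into single spaces.
(define (sxml->preview doc)
  (string-normalize-spaces (string-join (sxml->fragments doc) " ")))

;; Small hand-written document for trying it out.
(define example
  '(*TOP* (html (head (title "Demo") (script "var x = 1;"))
                (body (p "Hello " (b "world") "!")
                      (img (@ (src "a.png") (alt "A photo")))))))

(sxml->preview example)
```

Joining fragments with a space is a deliberate simplification: it avoids gluing adjacent elements together, at the cost of an occasional stray space before punctuation.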

Re: [racket-users] Html to text, how to obtain a rough preview

2017-05-30 Thread Jon Zeppieri
((sxpath '(// *text*)) doc) should return all (and only) the text nodes in doc. I'm not so familiar with the sxml-xexp compatibility stuff, so I don't know if you can use an xexp here or if you really need an sxml document. On Tue, May 30, 2017 at 7:08 AM, Erich Rast wrote: > Hi
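
A small sketch of the sxpath query on a hand-written SXML document, assuming the sxml package is installed. The names doc and texts are made up for this example, and it does not settle the open question about xexp input from html-parsing.

```racket
#lang racket/base
(require sxml)

;; A tiny SXML document written by hand for illustration.
(define doc
  '(*TOP* (html (body (p "Hello " (b "world"))
                      (p "Bye")))))

;; Select every text node anywhere below the root.
(define texts ((sxpath '(// *text*)) doc))

texts
```

Note that this query has no notion of which elements are visible, so the content of script or style elements in a real page would be returned as well.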

Re: [racket-users] Html to text, how to obtain a rough preview

2017-05-30 Thread Erich Rast
On Tue, 30 May 2017 11:30:00 -0400 Neil Van Dyke wrote: > Writing a procedure that does what you want should be pretty easy, > and then you can fine-tune it to your particular application. Recall > that SXML is mostly old-school Lisp lists. The procedure is mostly > just a

Re: [racket-users] Html to text, how to obtain a rough preview

2017-05-30 Thread Neil Van Dyke
Erich Rast wrote on 05/30/2017 07:08 AM: I need a function to provide a rough textual preview (without formatting except newlines) of the content of a web page. Writing a procedure that does what you want should be pretty easy, and then you can fine-tune it to your particular application.

[racket-users] Html to text, how to obtain a rough preview

2017-05-30 Thread Erich Rast
Hi all, I need a function to provide a rough textual preview (without formatting except newlines) of the content of a web page. So far I'm using this: (require net/url html-parsing sxml) (provide fetch fetch-string-content) (define (fetch url) (call/input-url url