Re: [racket-users] html parsing library does not handle 'article' tags -- any solutions?
Jay McCarthy wrote on 01/07/2016 06:53 PM:
> If you use the `html-parsing` package on the package server, you will
> not have these problems.

The only problem I saw was an error accessing "docindex.sqlite" when running "raco pkg install sxml". If there's some way that my official `html-parsing` package is involved with this, please let me know.

> Also, I will merge the upstream patches shortly.

Note that the PLaneT major version of the upstream `html-parsing` package has changed, due to a backwards-incompatible interface change. To merge upstream, I guess new package system policy regarding backward-compatibility will have you making a new package, `html-parsing-2`, unless you diverge from upstream. (This will also be an issue when I officially move my upstream package to the new package system, unless the new package system changes policy, or I pick a new brand name for the upstream package.)

Neil V.

--
You received this message because you are subscribed to the Google Groups "Racket Users" group. To unsubscribe from this group and stop receiving emails from it, send an email to racket-users+unsubscr...@googlegroups.com. For more options, visit https://groups.google.com/d/optout.
Re: [racket-users] html parsing library does not handle 'article' tags -- any solutions?
So, I'm now doing this:

(require (planet neil/html-parsing:3:0) (planet neil/xexp:2:0))

Those loaded just fine right off, although I needed to figure out that raco pkg install sxml was necessary, but I got that.

During the install, I got the following:

[dstorrs@MacBook-Pro:~/Dropbox/dstorrs/personal/study/scheme/HTML-TreeBuilder:]$ raco pkg install sxml
<...ginormous amounts of stuff...>
raco setup: 4 skipping: /xrepl-doc/xrepl/xrepl.scrbl
raco setup: docs failure: query-exec: unable to open the database file
  error code: 14
  SQL: "ATTACH $1 AS other"
  database: #
  mode: 'read-only
  file permissions: (write read)
raco setup: --- installing collections ---
raco setup: --- post-installing collections ---
raco pkg install: packages installed, although setup reported errors
[dstorrs@MacBook-Pro:~/Dropbox/dstorrs/personal/study/scheme/HTML-TreeBuilder: ]$

I do have sqlite3 installed on my machine, so that's not the problem. Why is it failing and what do I need to do to fix it?

Dave

On Thu, Jan 7, 2016 at 1:22 PM, Neil Van Dyke wrote:
> BTW, people should *not* get `html-parsing` from the new package system.
> It has an old version that someone else put there unofficially, and it's
> missing a significant change. I'm still maintaining the official version
> of `html-parsing` in PLaneT (until I get time to change my doc tools):
>
> (require (planet neil/html-parsing:3:0))
>
> http://www.neilvandyke.org/racket-html-parsing/
>
> Neil V.
Re: [racket-users] html parsing library does not handle 'article' tags -- any solutions?
David Storrs wrote on 01/07/2016 05:57 PM:
> (require (planet neil/html-parsing:3:0) (planet neil/xexp:2:0))
>
> Those loaded just fine right off, although I needed to figure out raco
> pkg install sxml was necessary, but I got that.

You shouldn't need to require `(planet neil/xexp:2:0)` explicitly -- just consider it to be starting-point documentation on SXML, for now.

For other SXML tools, I get them (Oleg's SSAX and SXPath, and Jim Bender's `sxml-match`) all from PLaneT. (I recall John C. put some or all of these into a mega `sxml` package in the new package system, but I haven't tried it.)

> [dstorrs@MacBook-Pro:~/Dropbox/dstorrs/personal/study/scheme/HTML-TreeBuilder:]$ raco pkg install sxml
> <...ginormous amounts of stuff...>
> raco setup: 4 skipping: /xrepl-doc/xrepl/xrepl.scrbl
> raco setup: docs failure: query-exec: unable to open the database file
>   error code: 14
>   SQL: "ATTACH $1 AS other"
>   database: #
>   mode: 'read-only
>   file permissions: (write read)

I don't know whether this error is related to the packages themselves, or just a coincidence.

Neil V.
Re: [racket-users] html parsing library does not handle 'article' tags -- any solutions?
On Thu, Jan 7, 2016 at 3:25 PM, John Clements wrote:
> As quasi-maintainer of the sxml package… I don’t think this has anything
> to do with the sxml package :).
>
> Does the referenced file (docindex.sqlite) exist?

Yep. And it's readable / writable by me, too:

-rw-r--r-- 1 dstorrs staff 518144 Jan 7 14:48 docindex.sqlite

> More generally, I would expect that this doc build failure would not
> affect the operability of the sxml library.

Doesn't seem to have, no. I was just wondering what would cause it.

On Thu, Jan 7, 2016 at 3:35 PM, Neil Van Dyke wrote:
> You shouldn't need to require `(planet neil/xexp:2:0)` explicitly -- just
> consider it to be starting-point documentation on SXML, for now.

Ah, thank you. Removed.

> For other SXML tools, I get them (Oleg's SSAX and SXPath, and Jim Bender's
> `sxml-match`) all from PLaneT. (I recall John C. put some or all of these
> into a mega `sxml` package in the new package system, but I haven't tried
> it.)

I've installed the sxml package and will let you know how it goes.

> I don't know whether this error is related to the packages themselves, or
> just a coincidence.

It doesn't seem to be causing issues right now, so I'm just going to roll by it.
Re: [racket-users] html parsing library does not handle 'article' tags -- any solutions?
> On Jan 7, 2016, at 2:57 PM, David Storrs wrote:
>
> raco setup: docs failure: query-exec: unable to open the database file
>
> I do have sqlite3 installed on my machine, so that's not the problem. Why is
> it failing and what do I need to do to fix it?

Wow!

As quasi-maintainer of the sxml package… I don’t think this has anything to do with the sxml package :).

Does the referenced file (docindex.sqlite) exist?

More generally, I would expect that this doc build failure would not affect the operability of the sxml library.

John
Re: [racket-users] html parsing library does not handle 'article' tags -- any solutions?
If you use the `html-parsing` package on the package server, you will not have these problems. Also, I will merge the upstream patches shortly.

Jay

--
Jay McCarthy
Associate Professor
PLT @ CS @ UMass Lowell
http://jeapostrophe.github.io

"Wherefore, be not weary in well-doing, for ye are laying the foundation of a great work. And out of small things proceedeth that which is great." - D 64:33
Re: [racket-users] html parsing library does not handle 'article' tags -- any solutions?
On Thu, Jan 7, 2016 at 3:53 PM, Jay McCarthy wrote:
> If you use the `html-parsing` package on the package server, you will
> not have these problems. Also, I will merge the upstream patches
> shortly.

When I tried to do: "raco pkg install html-parsing" racket crashed. Text dump attached for reference.

Dave
Re: [racket-users] html parsing library does not handle 'article' tags -- any solutions?
I think you're hitting the macro-expander bug that is triggered by "boris", again:

https://groups.google.com/d/msg/racket-users/t-qpq3AoEME/Z-rdyTBMAQAJ

The "boris" package imports from the "html" collection, and the "html-parsing" package provides a new module in "html", so `raco pkg install html-parsing` checks the compilation of "boris" (just in case). Unfortunately, attempting to compile "boris" still goes wrong.

I'm not sure about the earlier "docindex.sqlite" problem. It's possible that the "boris"-triggered crashes have left "/Users/dstorrs/Library/Racket/6.3/doc/docindex.sqlite" in a bad state.

At Thu, 7 Jan 2016 16:03:38 -0800, David Storrs wrote:
> When I tried to do: "raco pkg install html-parsing" racket crashed. Text
> dump attached for reference.
Re: [racket-users] html parsing library does not handle 'article' tags -- any solutions?
On Thu, Jan 7, 2016 at 12:28 PM, Neil Van Dyke wrote:
> I just checked that there wasn't a new bug in my old `html-parsing`
> package, in case that's which package you meant.

Sorry, I should have been clearer. I'm talking about these: (require html xml), not yours.

> `html-parsing` correctly handles your example for me under Racket 6.3, so
> that package might be a backup option for you (but beware that it uses SXML
> representation, rather than the Racket `xml` representation).

Yeah, I've been looking at it since sending the message. It parses it correctly, which is great. I was just hoping not to have to rewrite everything for a new representation. Ah, well. It's a learning project, so this is just more learning.

Dave
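On the worry about rewriting everything for a new representation: for plain elements, SXML and the X-expressions used by the Racket `xml` library differ mainly in where attributes live, so a small adapter may be enough for some code. The following is only a minimal sketch under stated assumptions. `sxml->xexpr` is a hypothetical helper written for this note, not part of `html-parsing`, `xexp`, or `xml`; it handles only strings, elements, and `@` attribute lists; it assumes every attribute has a string value; and it ignores comments, processing instructions, and the `*TOP*` wrapper (strip that before calling).

```racket
#lang racket/base
(require racket/match)

;; sxml->xexpr : hypothetical helper, not provided by any library.
;; SXML element:   (tag (@ (name "value") ...) child ...)
;; xexpr element:  (tag ((name "value") ...) child ...)
(define (sxml->xexpr node)
  (match node
    ;; Text nodes are strings in both representations.
    [(? string? s) s]
    ;; Element with an SXML attribute list as its first child.
    [(list (? symbol? tag) (list '@ attrs ...) kids ...)
     (list* tag attrs (map sxml->xexpr kids))]
    ;; Element without attributes: emit an explicit empty attribute list.
    [(list (? symbol? tag) kids ...)
     (list* tag '() (map sxml->xexpr kids))]))

;; A literal fragment shaped like the parse result shown in this thread.
(define fragment
  '(div (@ (class "messageContent")) "\n" (article "\n" (br))))

(sxml->xexpr fragment)
;; => '(div ((class "messageContent")) "\n" (article () "\n" (br ())))
```

Going the other way, or handling the remaining SXML node kinds, would need more cases; whether that is less work than porting the code to SXML directly depends on how much of it touches attributes.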
Re: [racket-users] html parsing library does not handle 'article' tags -- any solutions?
On Thursday, January 7, 2016 at 9:42:30 PM UTC+1, Neil Van Dyke wrote:
> (A long, long time ago, Racket had its own Web browser, for viewing its
> documentation, and it had at least one funny HTML extension.)

That's interesting - what happened to the Web browser - was it implemented in Racket? Could it work today - as a learning exercise?

Thanks
Greg
Re: [racket-users] html parsing library does not handle 'article' tags -- any solutions?
It's probably on your computer:

http://docs.racket-lang.org/browser/index.html?q=browser#%28mod-path._browser%29

On Thu, Jan 7, 2016 at 3:46 PM, Greg Trzeciak wrote:
> That's interesting - what happened to the Web browser - was it implemented in
> Racket? Could it work today - as a learning exercise?
Re: [racket-users] html parsing library does not handle 'article' tags -- any solutions?
I just checked that there wasn't a new bug in my old `html-parsing` package, in case that's which package you meant.

`html-parsing` correctly handles your example for me under Racket 6.3, so that package might be a backup option for you (but beware that it uses SXML representation, rather than the Racket `xml` representation).

#lang racket/base

(require (planet neil/html-parsing:3:0))

(html->xexp
 (string-append
  "<div class=\"messageInfo primaryContent\">\n"
  "<div class=\"messageContent\">\n"
  "<article>\n"
  "<blockquote class=\"messageText SelectQuoteContainer ugc baseHtml\">\n"
  "Message text here <br />\n"
  "</blockquote>\n"
  "</article>\n"
  "</div>\n"))

;; ==>
;; (*TOP*
;;  (div (@ (class "messageInfo primaryContent"))
;;       "\n"
;;       (div (@ (class "messageContent"))
;;            "\n"
;;            (article "\n"
;;                     (blockquote (@ (class "messageText SelectQuoteContainer ugc baseHtml"))
;;                                 "\n"
;;                                 "Message text here "
;;                                 (br)
;;                                 "\n")
;;                     "\n")
;;            "\n")
;;       "\n"))

http://www.neilvandyke.org/racket-html-parsing/

Neil V.
Re: [racket-users] html parsing library does not handle 'article' tags -- any solutions?
This is a wild guess, without looking at the code, but I wouldn't be surprised if this Racket `html` package had some obsolete support for a special `article` HTML element, and that code could be removed today. (A long, long time ago, Racket had its own Web browser, for viewing its documentation, and it had at least one funny HTML extension.)

Neil V.
Re: [racket-users] html parsing library does not handle 'article' tags -- any solutions?
Can you send the code you used?

I wouldn't expect the xml library to work since your example is not XML (it is missing a closing tag). I also don't have high hopes for the html library.

If you are parsing html, I recommend using the `html-parsing` package:

http://pkg-build.racket-lang.org/doc/html-parsing/index.html

Jay

On Thu, Jan 7, 2016 at 3:13 PM, David Storrs wrote:
> Hi folks,
>
> I'm using the html and xml libraries to parse a page that includes the
> following HTML:
>
> <div class="messageInfo primaryContent">
> <div class="messageContent">
> <article>
> <blockquote class="messageText SelectQuoteContainer ugc baseHtml">
> Message text here <br />
> </blockquote>
> </article>
> </div>
>
> When I parse this, the 'article' tag simply isn't parsed -- it lists the
> contents of the messageContent div as just a series of PCDATA statements
> containing "\n"
>
> Is there a way to extend the library, or do I need to switch to a different
> parser?
>
> Dave
Re: [racket-users] html parsing library does not handle 'article' tags -- any solutions?
When we speak of "parsing HTML" we should distinguish between strict parsing (= explicit adherence to a given HTML spec) and permissive parsing (= converting an HTML-ish string into Racket data). Both have their place.

`article` became a valid HTML element in HTML5. IIUC the html library is a strict parser, and it doesn't implement HTML5, thus it doesn't support `article`. Whereas `html-parsing` is designed to be permissive.

On Thu, Jan 7, 2016 at 12:42 PM, Neil Van Dyke wrote:
> This is a wild guess, without looking at the code, but I wouldn't be
> surprised if this Racket `html` package had some obsolete support for a
> special `article` HTML element, and that code could be removed today. (A
> long, long time ago, Racket had its own Web browser, for viewing its
> documentation, and it had at least one funny HTML extension.)
>
> Neil V.
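One way to check how the `html` library actually treats an element it does not know is to feed it a small fragment and print what comes back. The sketch below is an experiment, not a statement of what the library does: the `sample` string and the `parse-with-html-lib` helper are made up for this note, and no output is shown because it may differ between Racket versions. It uses `read-html-as-xml` and the `use-html-spec` parameter from `html`, and `xml->xexpr` from `xml`, so the result is easier to read at the REPL.

```racket
#lang racket/base
(require (only-in html read-html-as-xml use-html-spec)
         (only-in xml xml->xexpr))

;; Made-up fragment containing an HTML5 `article` element.
(define sample
  "<div class=\"messageContent\"><article><p>Message text here</p></article></div>")

;; parse-with-html-lib : hypothetical helper for this experiment.
;; Reads `str` with the `html` library and converts each piece of
;; content to an X-expression. `strict?` sets `use-html-spec`, which
;; controls whether the HTML spec's nesting constraints are enforced.
(define (parse-with-html-lib str #:strict? [strict? #t])
  (parameterize ([use-html-spec strict?])
    (for/list ([c (in-list (read-html-as-xml (open-input-string str)))])
      (xml->xexpr c))))

;; Compare the two settings and look for `article` in each result.
(parse-with-html-lib sample)
(parse-with-html-lib sample #:strict? #f)
```

If `article` is missing or flattened in either result, that supports the strict-parser reading above; if it survives here but not through `read-html`, the loss is more likely in the conversion to the library's HTML structures.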
Re: [racket-users] html parsing library does not handle 'article' tags -- any solutions?
Matthew Butterick wrote on 01/07/2016 04:18 PM:
> When we speak of "parsing HTML" we should distinguish between strict
> parsing (= explicit adherence to a given HTML spec) and permissive parsing
> (= converting an HTML-ish string into Racket data). Both have their place.

Alas, I think the W3C had to give up on trying to make people do strict parsing. Not enough people ran the W3C Validator in the earlier days of the Web, and the (since-abandoned) XML-based XHTML standard was started after the strict ship had long since sailed. The W3C has moved behind HTML5 for now.

The `html-parsing` parser was written 15 years ago for doing AI-ish software agent scraping of info from real-world Web pages, so it was necessarily permissive. In some ways, HTML was even worse back then, because Mosaic/Navigator/MSIE tended to accept invalid HTML -- like if the Racket compiler never raised an error or gave a warning message for an error, and simply generated whatever code it wanted to, and programmers worked by mindlessly poking at their source code until the generated code seemed to be doing what they wanted. :)

Syntactically, real-world HTML is somewhat better now, because the development tools and the browsers are better. But a permissive parser still makes sense for most purposes, including the massive HTML5 of 15 years later.

Neil V.
Re: [racket-users] html parsing library does not handle 'article' tags -- any solutions?
Great, thanks - didn't expect it to be still there after reading Neil's post!

On Thursday, January 7, 2016 at 9:52:01 PM UTC+1, Jay McCarthy wrote:
> It's probably on your computer:
>
> http://docs.racket-lang.org/browser/index.html?q=browser#%28mod-path._browser%29
Re: [racket-users] html parsing library does not handle 'article' tags -- any solutions?
BTW, people should *not* get `html-parsing` from the new package system. It has an old version that someone else put there unofficially, and it's missing a significant change. I'm still maintaining the official version of `html-parsing` in PLaneT (until I get time to change my doc tools):

(require (planet neil/html-parsing:3:0))

http://www.neilvandyke.org/racket-html-parsing/

Neil V.