Re: [racket-users] html parsing library does not handle 'article' tags -- any solutions?

2016-01-07 Thread Neil Van Dyke


Jay McCarthy wrote on 01/07/2016 06:53 PM:

> If you use the `html-parsing` package on the package server, you will
> not have these problems.


The only problem I saw was an error accessing "docindex.sqlite" while 
running "raco pkg install sxml".


If there's some way that my official `html-parsing` package is involved 
with this, please let me know.



> Also, I will merge the upstream patches shortly.


Note that the PLaneT major version of the upstream `html-parsing` 
package has changed, due to a backward-incompatible interface change.  
To merge upstream, I guess the new package system's policy on backward 
compatibility will have you making a new package, `html-parsing-2`, 
unless you diverge from upstream.  (This will also be an issue when I 
officially move my upstream package to the new package system, unless 
the new package system changes its policy, or I pick a new brand name 
for the upstream package.)


Neil V.

--
You received this message because you are subscribed to the Google Groups "Racket 
Users" group.
To unsubscribe from this group and stop receiving emails from it, send an email 
to racket-users+unsubscr...@googlegroups.com.
For more options, visit https://groups.google.com/d/optout.


Re: [racket-users] html parsing library does not handle 'article' tags -- any solutions?

2016-01-07 Thread David Storrs
So, I'm now doing this:

(require (planet neil/html-parsing:3:0) (planet neil/xexp:2:0))

Those loaded just fine right off, although I had to figure out that
`raco pkg install sxml` was necessary; I got that sorted.

During the install, I got the following:

[dstorrs@MacBook-Pro:~/Dropbox/dstorrs/personal/study/scheme/HTML-TreeBuilder:]$
raco pkg install sxml
<...ginormous amounts of stuff...>
raco setup: 4 skipping: /xrepl-doc/xrepl/xrepl.scrbl
raco setup: docs failure: query-exec: unable to open the database file
  error code: 14
  SQL: "ATTACH $1 AS other"
  database: #
  mode: 'read-only
  file permissions: (write read)
raco setup: --- installing collections ---
raco setup: --- post-installing collections ---
raco pkg install: packages installed, although setup reported errors
[dstorrs@MacBook-Pro:~/Dropbox/dstorrs/personal/study/scheme/HTML-TreeBuilder:]$


I do have sqlite3 installed on my machine, so that's not the problem.  Why
is it failing and what do I need to do to fix it?

Dave


On Thu, Jan 7, 2016 at 1:22 PM, Neil Van Dyke  wrote:

>
> BTW, people should *not* get `html-parsing` from the new package system.
> It has an old version that someone else put there unofficially, and it's
> missing a significant change.  I'm still maintaining the official version
> of `html-parsing` in PLaneT (until I get time to change my doc
> tools):
>
> (require (planet neil/html-parsing:3:0))
>
> http://www.neilvandyke.org/racket-html-parsing/
>
> Neil V.
>
>



Re: [racket-users] html parsing library does not handle 'article' tags -- any solutions?

2016-01-07 Thread Neil Van Dyke

David Storrs wrote on 01/07/2016 05:57 PM:


> (require (planet neil/html-parsing:3:0) (planet neil/xexp:2:0))
>
> Those loaded just fine right off, although I needed to figure out raco
> pkg install sxml was necessary, but I got that.


You shouldn't need to require `(planet neil/xexp:2:0)` explicitly -- 
just consider it to be starting-point documentation on SXML, for now.


For other SXML tools, I get them (Oleg's SSAX and SXPath, and Jim 
Bender's `sxml-match`) all from PLaneT.  (I recall John C. put some or 
all of these into a mega `sxml` package in the new package system, but I 
haven't tried it.)




> [dstorrs@MacBook-Pro:~/Dropbox/dstorrs/personal/study/scheme/HTML-TreeBuilder:]$
> raco pkg install sxml
> <...ginormous amounts of stuff...>
> raco setup: 4 skipping: /xrepl-doc/xrepl/xrepl.scrbl
> raco setup: docs failure: query-exec: unable to open the database file
>   error code: 14
>   SQL: "ATTACH $1 AS other"
>   database: #
>   mode: 'read-only
>   file permissions: (write read)


I don't know whether this error is related to the packages themselves, 
or just a coincidence.


Neil V.



Re: [racket-users] html parsing library does not handle 'article' tags -- any solutions?

2016-01-07 Thread David Storrs
On Thu, Jan 7, 2016 at 3:25 PM, John Clements 
wrote:

>
> > On Jan 7, 2016, at 2:57 PM, David Storrs  wrote:
> >
> > So, I'm now doing this:
> >
> > (require (planet neil/html-parsing:3:0) (planet neil/xexp:2:0))
> >
> > Those loaded just fine right off, although I needed to figure out raco
> > pkg install sxml was necessary, but I got that.
> >
> > During the install, I got the following:
> >
> > [dstorrs@MacBook-Pro:~/Dropbox/dstorrs/personal/study/scheme/HTML-TreeBuilder:]$
> > raco pkg install sxml
> > <...ginormous amounts of stuff...>
> > raco setup: 4 skipping: /xrepl-doc/xrepl/xrepl.scrbl
> > raco setup: docs failure: query-exec: unable to open the database file
> >   error code: 14
> >   SQL: "ATTACH $1 AS other"
> >   database: #
> >   mode: 'read-only
> >   file permissions: (write read)
> > raco setup: --- installing collections ---
> > raco setup: --- post-installing collections ---
> > raco pkg install: packages installed, although setup reported errors
> > [dstorrs@MacBook-Pro:~/Dropbox/dstorrs/personal/study/scheme/HTML-TreeBuilder:]$
> >
> > I do have sqlite3 installed on my machine, so that's not the problem.
> > Why is it failing and what do I need to do to fix it?
>
> Wow!
>
> As quasi-maintainer of the sxml package… i don’t think this has anything
> to do with the sxml package :).
>
> Does the referenced file (docindex.sqlite) exist?
>

Yep.  And it's readable / writable by me, too:

  -rw-r--r--   1 dstorrs  staff  518144 Jan  7 14:48 docindex.sqlite


>
> More generally, I would expect that this doc build failure would not
> affect the operability of the sxml library.
>
>
Doesn't seem to have, no.  I was just wondering what would cause it.


> John
>
>
>
>

On Thu, Jan 7, 2016 at 3:35 PM, Neil Van Dyke  wrote:

> David Storrs wrote on 01/07/2016 05:57 PM:
>
>>
>> (require (planet neil/html-parsing:3:0) (planet neil/xexp:2:0))
>>
>> Those loaded just fine right off, although I needed to figure out raco
>> pkg install sxml was necessary, but I got that.
>>
>
> You shouldn't need to require `(planet neil/xexp:2:0)` explicitly -- just
> consider it to be starting-point documentation on SXML, for now.
>

Ah, thank you.  Removed.


>
> For other SXML tools, I get them (Oleg's SSAX and SXPath, and Jim Bender's
> `sxml-match`) all from PLaneT.  (I recall John C. put some or all of these
> into a mega `sxml` package in the new package system, but I haven't tried
> it.)
>

I've installed the sxml package and will let you know how it goes.


>
>
>> [dstorrs@MacBook-Pro:~/Dropbox/dstorrs/personal/study/scheme/HTML-TreeBuilder:]$
>> raco pkg install sxml
>> <...ginormous amounts of stuff...>
>> raco setup: 4 skipping: /xrepl-doc/xrepl/xrepl.scrbl
>> raco setup: docs failure: query-exec: unable to open the database file
>>   error code: 14
>>   SQL: "ATTACH $1 AS other"
>>   database: #
>>   mode: 'read-only
>>   file permissions: (write read)
>>
>
> I don't know whether this error is related to the packages themselves, or
> just a coincidence.
>

It doesn't seem to be causing issues right now, so I'm just going to roll
by it.



>
> Neil V.
>
>



Re: [racket-users] html parsing library does not handle 'article' tags -- any solutions?

2016-01-07 Thread 'John Clements' via Racket Users

> On Jan 7, 2016, at 2:57 PM, David Storrs  wrote:
> 
> So, I'm now doing this:
> 
> (require (planet neil/html-parsing:3:0) (planet neil/xexp:2:0))
> 
> Those loaded just fine right off, although I needed to figure out raco pkg 
> install sxml was necessary, but I got that.  
> 
> During the install, I got the following:
> 
> [dstorrs@MacBook-Pro:~/Dropbox/dstorrs/personal/study/scheme/HTML-TreeBuilder:]$
>  raco pkg install sxml
> <...ginormous amounts of stuff...>
> raco setup: 4 skipping: /xrepl-doc/xrepl/xrepl.scrbl
> raco setup: docs failure: query-exec: unable to open the database file
>   error code: 14
>   SQL: "ATTACH $1 AS other"
>   database: #
>   mode: 'read-only
>   file permissions: (write read)
> raco setup: --- installing collections ---
> raco setup: --- post-installing collections ---
> raco pkg install: packages installed, although setup reported errors
> [dstorrs@MacBook-Pro:~/Dropbox/dstorrs/personal/study/scheme/HTML-TreeBuilder:]$
>  
> 
> I do have sqlite3 installed on my machine, so that's not the problem.  Why is 
> it failing and what do I need to do to fix it?

Wow! 

As quasi-maintainer of the sxml package… i don’t think this has anything to do 
with the sxml package :). 

Does the referenced file (docindex.sqlite) exist?

More generally, I would expect that this doc build failure would not affect the 
operability of the sxml library.

John





Re: [racket-users] html parsing library does not handle 'article' tags -- any solutions?

2016-01-07 Thread Jay McCarthy
If you use the `html-parsing` package on the package server, you will
not have these problems. Also, I will merge the upstream patches
shortly.

Jay



-- 
Jay McCarthy
Associate Professor
PLT @ CS @ UMass Lowell
http://jeapostrophe.github.io

   "Wherefore, be not weary in well-doing,
  for ye are laying the foundation of a great work.
And out of small things proceedeth that which is great."
  - D 64:33



Re: [racket-users] html parsing library does not handle 'article' tags -- any solutions?

2016-01-07 Thread David Storrs
On Thu, Jan 7, 2016 at 3:53 PM, Jay McCarthy  wrote:

> If you use the `html-parsing` package on the package server, you will
> not have these problems. Also, I will merge the upstream patches
> shortly.
>

When I tried to do:  "raco pkg install html-parsing" racket crashed.  Text
dump attached for reference.

Dave



Re: [racket-users] html parsing library does not handle 'article' tags -- any solutions?

2016-01-07 Thread Matthew Flatt
I think you're hitting the macro-expander bug that is triggered by
"boris", again:

 https://groups.google.com/d/msg/racket-users/t-qpq3AoEME/Z-rdyTBMAQAJ

The "boris" package imports from the "html" collection, and the
"html-parsing" package provides a new module in "html", so `raco pkg
install html-parsing` checks the compilation of "boris" (just in case).
Unfortunately, attempting to compile "boris" still goes wrong.

I'm not sure about the earlier "docindex.sqlite" problem. It's possible
that the "boris"-triggered crashes have left
"/Users/dstorrs/Library/Racket/6.3/doc/docindex.sqlite" in a bad state.

At Thu, 7 Jan 2016 16:03:38 -0800, David Storrs wrote:
> On Thu, Jan 7, 2016 at 3:53 PM, Jay McCarthy  wrote:
> 
> > If you use the `html-parsing` package on the package server, you will
> > not have these problems. Also, I will merge the upstream patches
> > shortly.
> >
> 
> When I tried to do:  "raco pkg install html-parsing" racket crashed.  Text
> dump attached for reference.
> 
> Dave

Re: [racket-users] html parsing library does not handle 'article' tags -- any solutions?

2016-01-07 Thread David Storrs
On Thu, Jan 7, 2016 at 12:28 PM, Neil Van Dyke  wrote:

> I just checked that there wasn't a new bug in my old `html-parsing`
> package, in case that's which package you meant.
>

Sorry, I should have been clearer.  I'm talking about these:  (require html
xml), not yours.


> `html-parsing` correctly handles your example for me under Racket 6.3, so
> that package might be a backup option for you (but beware that it uses SXML
> representation, rather than the Racket `xml` representation).
>

 Yeah, I've been looking at it since sending the message.  It parses it
correctly, which is great.  I was just hoping not to have to rewrite
everything for a new representation.

Ah, well.  It's a learning project, so this is just more learning.

Dave





Re: [racket-users] html parsing library does not handle 'article' tags -- any solutions?

2016-01-07 Thread Greg Trzeciak
On Thursday, January 7, 2016 at 9:42:30 PM UTC+1, Neil Van Dyke wrote:
> (A long, long time ago, Racket had its own Web browser, for viewing its 
> documentation, and it had at least one funny HTML extension.)

That's interesting - what happened to the Web browser - was it implemented in 
Racket? Could it work today - as a learning exercise?

Thanks

Greg



Re: [racket-users] html parsing library does not handle 'article' tags -- any solutions?

2016-01-07 Thread Jay McCarthy
It's probably on your computer:

http://docs.racket-lang.org/browser/index.html?q=browser#%28mod-path._browser%29

On Thu, Jan 7, 2016 at 3:46 PM, Greg Trzeciak  wrote:
> On Thursday, January 7, 2016 at 9:42:30 PM UTC+1, Neil Van Dyke wrote:
>> (A long, long time ago, Racket had its own Web browser, for viewing its
>> documentation, and it had at least one funny HTML extension.)
>
> That's interesting - what happened to the Web browser - was it implemented in 
> Racket? Could it work today - as a learning exercise?
>
> Thanks
>
> Greg
>





Re: [racket-users] html parsing library does not handle 'article' tags -- any solutions?

2016-01-07 Thread Neil Van Dyke
I just checked that there wasn't a new bug in my old `html-parsing` 
package, in case that's which package you meant.


`html-parsing` correctly handles your example for me under Racket 6.3, 
so that package might be a backup option for you (but beware that it 
uses SXML representation, rather than the Racket `xml` representation).


#lang racket/base

(require (planet neil/html-parsing:3:0))

(html->xexp
 (string-append
  "<div class=\"messageInfo primaryContent\">\n"
  "<div class=\"messageContent\">\n"
  "<article>\n"
  "<blockquote class=\"messageText SelectQuoteContainer ugc baseHtml\">\n"
  "Message text here <br />\n"
  "</blockquote>\n"
  "</article>\n"
  "</div>\n"))

;; ==>
;; (*TOP*
;;  (div (@ (class "messageInfo primaryContent"))
;;       "\n"
;;       (div (@ (class "messageContent"))
;;            "\n"
;;            (article "\n"
;;                     (blockquote (@ (class "messageText SelectQuoteContainer ugc baseHtml"))
;;                                 "\n"
;;                                 "Message text here "
;;                                 (br)
;;                                 "\n")
;;                     "\n")
;;            "\n")
;;       "\n"))
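The SXML-style result above differs from the Racket `xml` x-expression form mainly in how attributes are written, so existing code may need less rewriting than it first appears. Below is a minimal sketch of a converter, assuming that only strings, elements, and `(@ ...)` attribute lists need handling. `sxml->xexpr` and `attr->xexpr` are hypothetical helpers, not part of `html-parsing` or `xml`, and comments, entities, and processing instructions are deliberately ignored.

```racket
#lang racket/base
;; Sketch only: sxml->xexpr and attr->xexpr are hypothetical helpers,
;; not provided by html-parsing or by the xml library.
(require racket/match)

;; (name "value") is kept as is; a bare (name) is assumed to be a
;; valueless attribute and is given its own name as the value.
(define (attr->xexpr attr)
  (match attr
    [(list name (? string? value)) (list name value)]
    [(list name) (list name (symbol->string name))]))

;; Convert one SXML node (a string or an element) to an x-expression.
(define (sxml->xexpr node)
  (match node
    [(? string? s) s]
    ;; Element with attributes: (tag (@ attr ...) child ...)
    [(list (? symbol? tag) (list '@ attrs ...) kids ...)
     `(,tag ,(map attr->xexpr attrs) ,@(map sxml->xexpr kids))]
    ;; Element without attributes: (tag child ...)
    [(list (? symbol? tag) kids ...)
     `(,tag () ,@(map sxml->xexpr kids))]))

(sxml->xexpr '(div (@ (class "messageContent")) (article "hi" (br))))
;; => '(div ((class "messageContent")) (article () "hi" (br ())))
```

The `*TOP*` wrapper is not an element, so this sketch expects to be given the element beneath it (for instance the second item of the list that `html->xexp` returns) instead of the whole result.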

http://www.neilvandyke.org/racket-html-parsing/

Neil V.



Re: [racket-users] html parsing library does not handle 'article' tags -- any solutions?

2016-01-07 Thread Neil Van Dyke
This is a wild guess, without looking at the code, but I wouldn't be 
surprised if this Racket `html` package had some obsolete support for a 
special `article` HTML element, and that code could be removed today.  
(A long, long time ago, Racket had its own Web browser, for viewing its 
documentation, and it had at least one funny HTML extension.)


Neil V.



Re: [racket-users] html parsing library does not handle 'article' tags -- any solutions?

2016-01-07 Thread Jay McCarthy
Can you send the code you used? I wouldn't expect the xml library to
work since your example is not XML (missing ). I also don't have
high hopes for the html library. If you are parsing html, I recommend
using the `html-parsing` package:
http://pkg-build.racket-lang.org/doc/html-parsing/index.html

Jay

On Thu, Jan 7, 2016 at 3:13 PM, David Storrs  wrote:
> Hi folks,
>
> I'm using the html and xml libraries to parse a page that includes the
> following HTML:
>
> 
> 
> 
> 
> Message text here 
> 
> 
> 
>
> When I parse this, the 'article' tag simply isn't parsed -- it lists the
> contents of the messageContent div as just a series of PCDATA statements
> containing "\n"
>
> Is there a way to extend the library, or do I need to switch to a different
> parser?
>
> Dave
>





Re: [racket-users] html parsing library does not handle 'article' tags -- any solutions?

2016-01-07 Thread Matthew Butterick
When we speak of "parsing HTML" we should distinguish between strict
parsing (= explicit adherence to a given HTML spec) and permissive parsing
(= converting an HTML-ish string into Racket data.) Both have their place.

`article` became a valid HTML element in HTML5. IIUC the html library is a
strict parser, and it doesn't implement HTML5, thus it doesn't support
`article`.

Whereas `html-parsing` is designed to be permissive.
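
As a rough illustration of the permissive approach, here is a minimal sketch using `html->xexp` from `html-parsing`. It assumes the package is installed (for example with `raco pkg install html-parsing`; the PLaneT `require` form works as well). The sample markup and the `has-element?` helper are hypothetical, made up for this example, and are not the original poster's page or part of the library.

```racket
#lang racket/base
;; Assumption: the `html-parsing` package is installed and provides html->xexp.
(require html-parsing)

;; Hypothetical sample markup, loosely modeled on the question in this thread.
(define sample
  "<div class=\"messageContent\"><article>Message text here</article></div>")

;; html->xexp accepts a string (or an input port) and returns an SXML-style
;; list whose first element is the symbol *TOP*.
(define parsed (html->xexp sample))

;; Hypothetical helper (not part of html-parsing): walk the tree and report
;; whether any list in it starts with the given element name.
(define (has-element? tree name)
  (and (pair? tree)
       (or (eq? (car tree) name)
           (ormap (lambda (child) (has-element? child name))
                  (cdr tree)))))

(has-element? parsed 'article)
```

If the parser keeps element names it does not specifically know about, as a permissive parser is expected to, the last expression should report that `article` survived into the result.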




On Thu, Jan 7, 2016 at 12:42 PM, Neil Van Dyke wrote:

> This is a wild guess, without looking at the code, but I wouldn't be
> surprised if this Racket `html` package had some obsolete support for a
> special `article` HTML element, and that code could be removed today.  (A
> long, long time ago, Racket had its own Web browser, for viewing its
> documentation, and it had at least one funny HTML extension.)
>
>
> Neil V.
>



Re: [racket-users] html parsing library does not handle 'article' tags -- any solutions?

2016-01-07 Thread Neil Van Dyke



Matthew Butterick wrote on 01/07/2016 04:18 PM:
> When we speak of "parsing HTML" we should distinguish between strict
> parsing (= explicit adherence to a given HTML spec) and permissive
> parsing (= converting an HTML-ish string into Racket data.) Both have
> their place.


Alas, I think the W3C had to give up on trying to make people do strict 
parsing.  Not enough people ran the W3C Validator in the earlier days of 
the Web, and the (since-abandoned) XML-based XHTML standard was started 
after the strict ship had long since sailed.  The W3C has thrown its
support behind HTML5 for now.


The `html-parsing` parser was written 15 years ago for doing AI-ish 
software agent scraping of info from real-world Web pages, so it was 
necessarily permissive.  In some ways, HTML was even worse back then,
because Mosaic/Navigator/MSIE tended to accept invalid HTML.  It was as
if the Racket compiler never raised an error or gave a warning, and
simply generated whatever code it wanted to, while programmers
mindlessly poked at their source code until the generated code seemed
to be doing what they wanted. :) Syntactically,
real-world HTML is somewhat better now, because the development tools 
and the browsers are better.  But a permissive parser still makes sense 
for most purposes, including the massive HTML5 of 15 years later.


Neil V.



Re: [racket-users] html parsing library does not handle 'article' tags -- any solutions?

2016-01-07 Thread Greg Trzeciak
Great, thanks - didn't expect it to be still there after reading Neil's post!

On Thursday, January 7, 2016 at 9:52:01 PM UTC+1, Jay McCarthy wrote:
> It's probably on your computer:
> 
> http://docs.racket-lang.org/browser/index.html?q=browser#%28mod-path._browser%29



Re: [racket-users] html parsing library does not handle 'article' tags -- any solutions?

2016-01-07 Thread Neil Van Dyke


BTW, people should *not* get `html-parsing` from the new package 
system.  It has an old version that someone else put there unofficially, 
and it's missing a significant change.  I'm still maintaining the 
official version of `html-parsing` in PLaneT (until I get time to change 
my doc tools):


(require (planet neil/html-parsing:3:0))

http://www.neilvandyke.org/racket-html-parsing/

Neil V.
