I thought the bug was in 'parse, not 'remove, because I
tested this without the remove, just checking how
'parse iterates over the text string.

After looking at your example I'm quite confused. I
think more people need to look at this before we call
it a bug. We must be missing something; otherwise this
would be a significant bug!
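
In the meantime, one possible workaround is to stop changing the
string while 'parse is walking over it and to build a cleaned copy
instead. This is only a sketch: strip-scripts, out and chunk are
names made up here, and it has not been tried against the page
that triggers the problem.

```rebol
REBOL []

; Hypothetical helper: returns a copy of html with every
; <script ...> ... </script> block left out. The input is not modified.
strip-scripts: func [html [string!] /local out chunk] [
    out: copy ""
    parse/all html [
        any [
            ; copy the text before the next opening tag, then skip
            ; through the closing tag without touching the input
            copy chunk to "<script" thru "/script>"
            (if chunk [append out chunk])  ; chunk can be none if nothing was copied
        ]
        ; keep whatever follows the last complete script block
        copy chunk to end (if chunk [append out chunk])
    ]
    out
]

print strip-scripts {<p>a</p><script>x</script><p>b</p>}
```

Because nothing is removed from the input, there are no marks to
reset with :mark1, so the position 'parse keeps internally never
has to be reconciled with a string that just got shorter.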
--- Anton <[EMAIL PROTECTED]> wrote:
> Jose,
>
> Well done, you have discovered a bug in 'parse,
> I think. (It could also be in 'remove?)
> The following script shows the problem.
> Note that html and html2 differ by one character,
> the 'x' (although it doesn't seem to matter which
> character it is, just the length of the string).
>
>
> html: {<script ------------------></script><script>I should be removed</script>}
> html2: {<script -----------x-------></script><script>I should be removed</script>}
>
> rule: [
>     any [
>         (print "~~~ any block ~~~")
>         to "<script" mark1: (?? mark1)
>         thru "/script>" mark2: (
>             ?? mark2
>             remove/part mark1 mark2
>             ?? mark1
>         )
>         :mark1
>         (?? mark1)
>     ] to end
> ]
>
> parse/all html rule
> prin "^/"
> parse/all html2 rule
> prin "^/"
>
> ?? html
> ?? html2
>
> halt
>
>
> I would like to analyse this further before making a
> bug report to feedback. Better to have more
> information.
> Anybody have any comments about this?
>
> Anton.
>
>
> > I've checked the HTML manually and the sequence of
> > tags is a proper set of
> >
> > 1. <script ... </script>
> >
> > and then an orphan (unnoticed by browsers)
> >
> > 2. </script>
> >
> > and finally
> >
> > 3. <script ... </script>
> >
> > The parsing stops just before the orphan </script>,
> > which I don't understand. The rule should go
> > beyond 2!
> >
> > You can check the real html at http://www.abc.es
> >
> > Thanks
> >
> > --- Anton <[EMAIL PROTECTED]> wrote:
> > > Jose,
> > >
> > > Your parse rule looks fine to me.
> > > I tested out your parse rule with long
> > > strings of matching <script></script> pairs,
> > > but I didn't see any problems.
> > >
> > > I would ask you to look at your input
> > > more carefully. Maybe there is something in
> > > there that tricks this rule.
> > >
> > > Do this:
> > > - Save a copy of your input.
> > > - Cut selected pieces out of your input so that it
> > >   still breaks your rule. Save each time.
> > > - When you can't cut any more out, look at what you
> > >   have left, and if you can't figure it out, post the
> > >   input here and we can have a look.
> > >
> > > Anton.
> > >
> > > > I use the following parse code to remove scripting
> > > > from the html before I do other parsing. This seems to
> > > > work fine for all pages, but I just found a page with
> > > > lots of script tags and it only removes the first 86
> > > > and leaves the last one.
> > > >
> > > > What am I doing wrong ?
> > > >
> > > > Thanks
> > > > Jose
> > > >
> > > > -----------------------------------------------
> > > > parse/all html [ any [
> > > >         to "<script" mark1:
> > > >         thru "/script>" mark2:
> > > >         (remove/part mark1 mark2)
> > > >         :mark1
> > > >     ] to end
> > > > ]
>
> --
> To unsubscribe from this list, please send an email
> to
> [EMAIL PROTECTED] with "unsubscribe" in the
> subject, without the quotes.
>