Hi,
You could also consider island parsing, a very cool addition to
PetitParser developed by Jan:
beginScript := '<script>' asParser.
endScript := '</script>' asParser.
script := beginScript , endScript negate star flatten , endScript ==> #second.
islandScripts := (script island ==> #second) star.
If you apply it to:
code := 'uninteresting part
<script>
some code
</script>
another
uninteresting part
<script>
some other
code
</script>
yet another
uninteresting part
'.
You get:
islandScripts parse: code
==> "#('some code' 'some other
code')"
Quite cool, no? :)
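One caveat: if some pages put attributes on the tag, as in <script type='text/javascript'>, the '<script>' literal above will not match them. A variant that should cope with that is sketched below. It is untested, and 'page.html' is only a placeholder file name:

```smalltalk
"Opening tag: '<script', any attributes up to the first '>', then '>'.
 It is wrapped in parentheses and flattened so it stays ONE element of the
 sequence built below; sending #, to a sequence parser extends that sequence,
 so without the flatten #second would answer the attributes, not the code."
beginScript := ('<script' asParser , $> asParser negate star , $> asParser) flatten.
endScript := '</script>' asParser.
"Same as before: keep only the text between the opening and closing tags."
script := beginScript , endScript negate star flatten , endScript ==> #second.
islandScripts := (script island ==> #second) star.

"Usage on a local file; 'page.html' is a hypothetical file name."
islandScripts parse: 'page.html' asFileReference contents.
```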
Doru
On Fri, Aug 14, 2015 at 1:31 AM, Alexandre Bergel wrote:
> Hi!
>
> Together with Nicolas, we are trying to extract all the <script …> … </script>
> elements from HTML files.
> We tried XMLDOMParser, but many web pages are not well-formed,
> so the parser complains.
>
> Has anyone tried to extract particular tags from HTML files? This looks
> like a classic thing to do. Maybe some of you have done it.
> Is there a way to configure the parser to accept broken XML/HTML content?
>
> Cheers,
> Alexandre
> --
> _,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
> Alexandre Bergel http://www.bergel.eu
> ^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.
--
www.tudorgirba.com
"Every thing has its own flow"