Yep, indeed very cool.

Will try it.

Nicolas

On 14/08/2015 11:40, Tudor Girba wrote:
Hi,

You can also consider using island parsing, a very cool addition to PetitParser developed by Jan:

beginScript := '<script>' asParser.
endScript := '</script>' asParser.
script := beginScript , endScript negate star flatten , endScript ==> #second.
islandScripts := (script island ==> #second) star.

If you apply it to:

code := 'uninteresting part
<script>
some code
</script>
another
uninteresting part
<script>
some other
code
</script>
yet another
uninteresting part
'.

You get:
islandScripts parse: code
==>  "#('some code' 'some other
code')"
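
Note that the parser above matches only a bare '<script>' opening tag. A minimal, untested sketch of a variant that also accepts attributes, as in the '<script …>' of the original question, could look like this (openTag and scriptWithAttributes are names made up for illustration):

```smalltalk
"Assumption: an opening tag is '<script', then anything up to the first '>'.
 flatten wraps the tag sequence into a single parser, so that ==> #second
 below still selects the script body rather than a piece of the tag."
openTag := ('<script' asParser , $> asParser negate star , $> asParser) flatten.
endScript := '</script>' asParser.
scriptWithAttributes := openTag , endScript negate star flatten , endScript ==> #second.
islandScripts := (scriptWithAttributes island ==> #second) star.
```

This sketch ignores upper-case tags and a '>' inside a quoted attribute value, so treat it as a starting point only.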

Quite cool, no? :)

Doru


On Fri, Aug 14, 2015 at 1:31 AM, Alexandre Bergel wrote:

    Hi!

    Together with Nicolas, we are trying to extract all the <script …> …
    </script> elements from HTML files.
    We have tried XMLDOMParser, but many web pages are actually
    not well formed, so the parser complains.

    Has anyone tried to extract particular tags from HTML files? This
    looks like a classic thing to do, so maybe some of you have done it.
    Is there a way to configure the parser to accept broken XML/HTML
    content?

    Cheers,
    Alexandre
    --
    _,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:
    Alexandre Bergel http://www.bergel.eu
    ^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;._,.;:~^~:;.







--
www.tudorgirba.com <http://www.tudorgirba.com>

"Every thing has its own flow"
