Re: [Jprogramming] Regex crashes in J6.01c

Dan Bron Tue, 26 Dec 2006 09:06:06 -0800

I should also point out:

           stringhashtml 'embedded <html>...</html> html'
        1


with either definition of  stringhashtml  (mine was written to emulate yours).

I cite all these examples only so you're aware of what you're matching against. If the string must start with <html> and end with </html>, you'd have to write:


           (?i)^<html>[^\0]*</html>$

instead (and it could still have [incorrectly] nested <html> tags).
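To make the difference concrete, here is a small sketch in Python, whose re module accepts the same syntax for these particular patterns. The unanchored pattern is only my assumption about what  stringhashtml  checks, and  has_html  is a hypothetical helper, not anything from J:

```python
import re

# Assumed stand-in for the unanchored check discussed in this thread.
UNANCHORED = re.compile(r"(?i)<html>[^\0]*</html>")
# The anchored form given above: must start with <html> and end with </html>.
ANCHORED = re.compile(r"(?i)^<html>[^\0]*</html>$")

def has_html(pattern, text):
    """Hypothetical helper: 1 if the pattern matches anywhere in text, else 0."""
    return 1 if pattern.search(text) else 0

embedded = "embedded <html>...</html> html"
whole = "<html>...</html>"
nested = "<html><html>x</html></html>"

print(has_html(UNANCHORED, embedded))  # embedded tags are accepted
print(has_html(ANCHORED, embedded))    # rejected: text surrounds the tags
print(has_html(ANCHORED, whole))       # accepted
print(has_html(ANCHORED, nested))      # still accepted, despite the nesting
```

The last case shows why anchoring alone says nothing about well-formedness:  [^\0]*  happily swallows any inner tags.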

Also, I don't see why you bother matching against the closing tag, because you don't use capturing parens (back references). Do you want to ensure "well formed" HTML? If so, do you parse the string later?

Or is it perhaps that you want to ensure that the HTML tags enclose SOMETHING, even if it's only a single character? If so, you'd have to replace the '*' with a '+', i.e.:


           (?i)<html>[^\0]+</html>

(I noted the '*' in your original expression, but I also saw an empty character class '[]', which I didn't understand, but thought might be an attempt at "match something".)
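A quick way to see what the  '*'  versus  '+'  choice changes, again sketched in Python rather than J (the names  STAR ,  PLUS  and  matches  are mine, purely for illustration):

```python
import re

# '*' allows nothing at all between the tags; '+' demands at least one character.
STAR = re.compile(r"(?i)<html>[^\0]*</html>")
PLUS = re.compile(r"(?i)<html>[^\0]+</html>")

def matches(pattern, text):
    """Hypothetical helper: 1 if the pattern matches anywhere in text, else 0."""
    return 1 if pattern.search(text) else 0

empty_doc = "<html></html>"
one_char = "<html>x</html>"

print(matches(STAR, empty_doc))  # empty body is accepted by '*'
print(matches(PLUS, empty_doc))  # empty body is rejected by '+'
print(matches(STAR, one_char))   # a single enclosed character satisfies both
print(matches(PLUS, one_char))
```

The two patterns disagree only on the empty body.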

If you don't care about the closing tag, and want only to ensure the opening tag is not followed by nulls, you can make the expression even simpler (faster):


           (?i)<html>[^\0]*

(if it has to match at the beginning of the string, add the  ^  as above).

One further, but important, note: apparently the regex library treats input strings as null-terminated. This is either a bug in PCRE or in J's interface to it. So, expressions that guard against nulls are doomed. To wit:


           stringhashtml   'A'  ,  '<html>foo</html>'
        1
           stringhashtml   'A'  ,~ '<html>foo</html>'
        1
           stringhashtml ({.a.) ,  '<html>foo</html>'
        0
           stringhashtml ({.a.) ,~ '<html>foo</html>'
        1

I only thought to check this because I ran into this bug years ago. Kirk Iverson fixed it in my local installation, but I have no idea what it was, and I no longer have access to it (besides, it was before the switch to the PCRE library).


-Dan

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
