I should also point out:

           stringhashtml 'embedded <html>...</html> html'
        1

with either definition of  stringhashtml  (mine was written to emulate yours).

I cite all these examples only so you're aware of what you're matching against. If the string must start with <html> and end with </html>, you'd have to write:

           (?i)^<html>[^\0]*</html>$

instead (and it could still have [incorrectly] nested <html> tags).
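
To make the difference concrete, here is a minimal sketch in Python rather than J. Python's re module is not PCRE, but as far as I know it treats (?i), ^, $ and [^\0] the same way for these cases. The unanchored pattern is my reading of the expression under discussion, and the helper name has_html is made up for illustration.

```python
import re

# Unanchored: succeeds if the pattern occurs anywhere in the string.
SEARCH = re.compile(r"(?i)<html>[^\0]*</html>")
# Anchored: the string must start with <html> and end with </html>.
ANCHORED = re.compile(r"(?i)^<html>[^\0]*</html>$")


def has_html(pattern, text):
    """Return 1 if the pattern is found in text, else 0 (J-style boolean)."""
    return 1 if pattern.search(text) else 0


embedded = "embedded <html>...</html> html"
whole = "<HTML>body</HTML>"
nested = "<html><html>x</html></html>"

print(has_html(SEARCH, embedded), has_html(ANCHORED, embedded))  # 1 0
print(has_html(SEARCH, whole), has_html(ANCHORED, whole))        # 1 1
# The anchored form still accepts (incorrectly) nested tags.
print(has_html(ANCHORED, nested))                                # 1
```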

Also, I don't see why you bother matching against the closing tag, since you don't use capturing parens (or back references) to pull anything out of the match. Do you want to ensure "well formed" HTML? If so, do you parse the string later?

Or is it perhaps that you want to ensure that the HTML tags enclose SOMETHING, even if it's only a single character? If so, you'd have to replace the '*' with a '+', i.e.:

           (?i)<html>[^\0]+</html>

(I noted the '*' in your original expression, but I also saw an empty character class '[]'. I didn't understand it, but thought it might be an attempt at "match something".)
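
A small sketch of the '*' versus '+' difference, again in Python's re module rather than J/PCRE (an assumption that the two agree here; the variable names are mine). It also shows a caveat: a stray closing tag counts as "something".

```python
import re

STAR = re.compile(r"(?i)<html>[^\0]*</html>")  # body may be empty
PLUS = re.compile(r"(?i)<html>[^\0]+</html>")  # body needs at least one character

empty = "<html></html>"
one_char = "<html>x</html>"
stray_close = "<html></html></html>"

print(bool(STAR.search(empty)), bool(PLUS.search(empty)))        # True False
print(bool(STAR.search(one_char)), bool(PLUS.search(one_char)))  # True True
# The first </html> is consumed as the "body", so '+' is satisfied.
print(bool(PLUS.search(stray_close)))                            # True
```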

If you don't care about the closing tag, and want only to ensure the opening tag is not followed by nulls, you can make the expression even simpler (faster):

           (?i)<html>[^\0]*

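(if it has to match at the beginning of the string, add the  ^  as above; note that without a trailing  $  the  [^\0]*  may match zero characters, so the expression succeeds on any string containing <html>).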
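
Here is a sketch of how the opening-tag-only expression behaves in a null-safe engine, using Python's re module as a stand-in for PCRE (an assumption; the names are mine). Unanchored, the match simply stops in front of the first null, so a trailing $ is needed if the rest of the string must be null-free.

```python
import re

OPEN_ONLY = re.compile(r"(?i)<html>[^\0]*")      # as written above
OPEN_TO_END = re.compile(r"(?i)<html>[^\0]*$")   # variant with a trailing $

clean = "<html>ok"
with_null = "<html>\0junk"

# Without $, the pattern still matches; it just ends before the null.
print(repr(OPEN_ONLY.search(with_null).group()))  # '<html>'
# With $, a null anywhere after the tag makes the match fail.
print(bool(OPEN_TO_END.search(with_null)))        # False
print(bool(OPEN_TO_END.search(clean)))            # True
```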

One further but important note: apparently the regex library treats input strings as null-terminated. This is a bug either in PCRE or in J's interface to it. So expressions that guard against nulls are doomed, because nothing after the first null is ever examined. To wit:

           stringhashtml   'A'  ,  '<html>foo</html>'
        1
           stringhashtml   'A'  ,~ '<html>foo</html>'
        1
           stringhashtml ({.a.) ,  '<html>foo</html>'
        0
           stringhashtml ({.a.) ,~ '<html>foo</html>'
        1
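
I don't know what the interface actually does internally, so the following is only an emulation in Python (not J, not PCRE). It assumes the subject string is cut at the first null before matching, and that the pattern is the unanchored one discussed earlier. Under those assumptions it reproduces the four results above. The function names are hypothetical.

```python
import re

PATTERN = re.compile(r"(?i)<html>[^\0]*</html>")  # assumed pattern


def as_c_string(text):
    """Keep only what a null-terminated reader would see."""
    return text.split("\0", 1)[0]


def stringhashtml_emulated(text):
    """Emulate matching against a subject truncated at the first null."""
    return 1 if PATTERN.search(as_c_string(text)) else 0


doc = "<html>foo</html>"
nul = "\0"  # J's {.a. is the first character of the alphabet, i.e. null

# Same order as the session: 'A',doc   doc,'A'   nul,doc   doc,nul
cases = ["A" + doc, doc + "A", nul + doc, doc + nul]

truncated = [stringhashtml_emulated(c) for c in cases]
null_safe = [1 if PATTERN.search(c) else 0 for c in cases]

print(truncated)  # [1, 1, 0, 1]
print(null_safe)  # [1, 1, 1, 1]
```

The leading-null case is the only one where the two disagree: truncation leaves an empty subject, while a null-safe search simply finds the tags after the null.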

I only thought to check this because I ran into this bug years ago. Kirk Iverson fixed it in my local installation, but I have no idea what the fix was, and I no longer have access to that installation (besides, that was before the switch to the PCRE library).

-Dan

----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm
