I should also point out:
   stringhashtml 'embedded <html>...</html> html'
1
with either definition of stringhashtml (mine was written to emulate yours).
I cite all these examples only so you're aware of what you're matching
against. If the string must start with <html> and end with </html>,
you'd have to write:
(?i)^<html>[^\0]*</html>$
instead (and it could still have [incorrectly] nested <html> tags).
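
The difference between the unanchored and anchored forms can be sketched as follows. This uses Python's `re` module rather than J's PCRE interface, purely for illustration. The unanchored pattern is only a guess at what stringhashtml uses, and `has_html` is a hypothetical helper that returns 1 or 0 the way the J verb does.

```python
import re

# Assumed patterns: UNANCHORED is a guess at stringhashtml's expression,
# ANCHORED is the expression given in the text above.
UNANCHORED = re.compile(r"(?i)<html>[^\0]*</html>")
ANCHORED = re.compile(r"(?i)^<html>[^\0]*</html>$")

def has_html(pattern, text):
    """Return 1 if pattern matches anywhere in text, else 0."""
    return int(pattern.search(text) is not None)

embedded = "embedded <html>...</html> html"
whole = "<HTML>body</HTML>"
nested = "<html><html>x</html></html>"

print(has_html(UNANCHORED, embedded))  # tags found in the middle of the string
print(has_html(ANCHORED, embedded))    # rejected: text surrounds the tags
print(has_html(ANCHORED, whole))       # accepted, case-insensitively
print(has_html(ANCHORED, nested))      # accepted even though the nesting is wrong
```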
Also, I don't see why you bother matching against the closing tag,
because you don't use capturing parens (or back references). Do you want
to ensure "well formed" HTML? If so, do you parse the string later?
Or is it perhaps that you want to ensure that the HTML tags enclose
SOMETHING, even if it's only a single character? If so, you'd have to
replace the '*' with a '+', i.e.:
(?i)<html>[^\0]+</html>
(I noted the '*' in your original expression, but I also saw an empty
character class '[]'. I didn't understand it, but thought it might be
an attempt at "match something".)
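
A minimal sketch of the '*' versus '+' distinction, again in Python's `re` as a stand-in for J's regex library. The names `STAR`, `PLUS` and `matches` are made up for this illustration.

```python
import re

STAR = re.compile(r"(?i)<html>[^\0]*</html>")  # tags may enclose nothing
PLUS = re.compile(r"(?i)<html>[^\0]+</html>")  # tags must enclose at least one character

def matches(pattern, text):
    """Return 1 if pattern matches anywhere in text, else 0."""
    return int(pattern.search(text) is not None)

empty_doc = "<html></html>"
one_char = "<html>x</html>"

print(matches(STAR, empty_doc), matches(PLUS, empty_doc))
print(matches(STAR, one_char), matches(PLUS, one_char))
```

Only the empty document distinguishes the two: '*' accepts it and '+' rejects it.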
If you don't care about the closing tag, and want only to ensure the
opening tag is not followed by nulls, you can make the expression even
simpler (faster):
(?i)<html>[^\0]*
(if it has to match at the beginning of the string, add the ^ as above).
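
One caveat is worth checking before relying on the shorter form. Because '*' may match zero characters, an expression that ends in `[^\0]*` succeeds as soon as `<html>` is found, whatever follows it. The sketch below (Python's `re`, not J) compares it with a variant anchored at the end of the string. The `\Z` anchor and the names `OPEN_ONLY`, `OPEN_TO_END` and `matches` are my additions for illustration, not part of the original expression.

```python
import re

OPEN_ONLY = re.compile(r"(?i)<html>[^\0]*")      # expression from the text above
OPEN_TO_END = re.compile(r"(?i)<html>[^\0]*\Z")  # assumed variant: must reach end of string

def matches(pattern, text):
    """Return 1 if pattern matches anywhere in text, else 0."""
    return int(pattern.search(text) is not None)

clean = "<html>foo"
with_nul = "<html>\0foo"

print(matches(OPEN_ONLY, clean), matches(OPEN_ONLY, with_nul))
print(matches(OPEN_TO_END, clean), matches(OPEN_TO_END, with_nul))
print(repr(OPEN_ONLY.search(with_nul).group(0)))  # the match stops just before the null
```

So if the point is to reject strings with a null after the opening tag, the end of the expression probably needs an anchor as well.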
One further but important note: the regex library apparently treats
input strings as null-terminated. This is a bug either in PCRE or in J's
interface to it, so expressions that guard against nulls are doomed:
the matcher never sees anything at or after the first null.
To wit:
   stringhashtml 'A' , '<html>foo</html>'
1
   stringhashtml 'A' ,~ '<html>foo</html>'
1
   stringhashtml ({.a.) , '<html>foo</html>'
0
   stringhashtml ({.a.) ,~ '<html>foo</html>'
1
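
The four results above are what you would get if the subject string were cut at its first null before matching. Here is a hedged emulation in Python: `has_html_truncated` is a hypothetical helper that mimics C-string handling, and the pattern is again only a guess at stringhashtml's expression. It says nothing about where the truncation actually happens in J or PCRE.

```python
import re

PATTERN = re.compile(r"(?i)<html>[^\0]*</html>")  # assumed stringhashtml expression
NUL = "\0"
doc = "<html>foo</html>"

def has_html(text):
    """Match against the whole string, nulls included."""
    return int(PATTERN.search(text) is not None)

def has_html_truncated(text):
    """Emulate an interface that stops reading at the first null, as a C string would."""
    return has_html(text.split(NUL, 1)[0])

# Same four arguments as the J session: 'A' prefix, 'A' suffix, null prefix, null suffix.
cases = ["A" + doc, doc + "A", NUL + doc, doc + NUL]
full = [has_html(s) for s in cases]
truncated = [has_html_truncated(s) for s in cases]

print(full)       # a null-aware matcher finds the tags in every case
print(truncated)  # cutting at the first null reproduces the 1 1 0 1 seen above
```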
I only thought to check this because I ran into this bug years ago.
Kirk Iverson fixed it in my local installation, but I have no idea
what it was, and I no longer have access to that installation (besides,
that was before the switch to the PCRE library).
-Dan
----------------------------------------------------------------------
For information about J forums see http://www.jsoftware.com/forums.htm