Re: [PHP] Regular Expression - highlighting

2004-10-07 Thread Aidan Lister
Hi Michael,

Thanks very much for the assistance, I'll have to investigate further!

Kind Regards,
Aidan Lister



-- 
PHP General Mailing List (http://www.php.net/)
To unsubscribe, visit: http://www.php.net/unsub.php



RE: [PHP] Regular Expression - highlighting

2004-10-03 Thread Michael Sims
Aidan Lister wrote:
> Hello list,
>
> I'm pretty terrible with regular expressions, I was wondering if
> someone would be able to help me with this
> http://paste.phpfi.com/31964
>
> The problem is detailed in the above link. Basically I need to match
> the contents of any HTML tag, except a link. I'm pretty sure a
> lookbehind set is needed in the center (%s) bit.
>
> Any suggestions would be appreciated, but it's not quite as simple as
> it sounds - if possible please make sure you run the above script and
> see if it PASSED.

So basically, you want to put a link around foo, only if it doesn't
already have one, right?

The problem with look-behind assertions is that they have to be fixed-width.
If you're certain of what kind of data you're going to be dealing with then
a fixed-width look-behind may be sufficient.  For example, I came up with a
regex that will PASS your script, but I seriously doubt it'll be very useful
to you, as it would be easy to break by coming up with other test cases.
For your single test case, however, this works:

/(?<!<a href="foo">)(?<!<a href=")(foo)/
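
To make that concrete, here is a minimal sketch in PHP.  It assumes the
pattern is two fixed-width negative look-behinds (the archive stripped the
angle brackets and quotes, so the exact pattern is a reconstruction), and the
three input strings are made up for illustration rather than taken from the
paste:

```php
<?php
// Assumed reconstruction: two fixed-width negative look-behinds.
$pattern = '/(?<!<a href="foo">)(?<!<a href=")(foo)/';

// Hypothetical inputs, not the ones from the original test script.
$linked  = '<a href="foo">foo</a>';   // already linked
$plain   = 'some foo here';           // not linked
$variant = '<a  href="foo">foo</a>';  // linked, but two spaces after "a"

// preg_match_all() returns the number of full matches found.
$hitsLinked  = preg_match_all($pattern, $linked);
$hitsPlain   = preg_match_all($pattern, $plain);
$hitsVariant = preg_match_all($pattern, $variant);

printf("linked: %d, plain: %d, variant: %d\n",
       $hitsLinked, $hitsPlain, $hitsVariant);
```

The already-linked string yields no matches and the plain one yields one,
which is the behaviour you want.  The variant with one extra space slips past
both look-behinds, so both occurrences of foo inside the existing link are
matched.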

The problem is that HTML tags can be split across lines, they can have any
amount of whitespace within the tag, and they can have other attributes
(class, id, onClick), etc.  Since look-behind assertions have to be
fixed-width, it'd be impossible (IMHO) to come up with a single regex that
would match all cases, unless the input data was uniform.  For example,
stuff like

<a   href = "foo" ID="id1" class="redlink"
onClick="javascript:someFunction();">foo</a>

and its infinite variants could not be caught by a single regex, since you
cannot write an infinite number of fixed-width look-behind assertions.
If quantifiers such as '*', '+', and '?' were allowed in look-behind
assertions it would be possible, but they aren't (see man perlre).
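
A short PHP sketch of that restriction, with hypothetical patterns of my own:
the first puts an unbounded quantifier inside the look-behind, the second
lists two fixed-width alternatives instead.  As far as I know, newer PCRE
releases accept some bounded variable-length look-behinds, but an unbounded
'+' should still be rejected:

```php
<?php
// Hypothetical pattern: \s+ makes the look-behind unbounded in length.
$variable = '/(?<!<a\s+href=")foo/';

// preg_match() returns false when the pattern fails to compile; the
// accompanying warning is silenced here with @.
$result = @preg_match($variable, '<a   href="foo">foo</a>');
var_dump($result);

// Each alternative inside a look-behind may have its own fixed width,
// so a finite list of known variants can be spelled out by hand.
$alternatives = '/(?<!<a href="|<a  href=")foo/';
$count = preg_match_all($alternatives, '<a  href="foo">foo</a>');
var_dump($count);
```

Listing alternatives only covers the spacings you think to write down, which
is the "infinite variants" problem again.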

If your data is coming from unknown sources you'll probably have to use a
full-fledged HTML parser to pull out text that isn't already part of an <a>
tag.  I know there are several of these available for Perl and I'm sure
there are for PHP too, but I'm unaware of them.
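
For what it's worth, one way this could look in PHP is with the bundled DOM
extension (DOMDocument and DOMXPath).  This is only a rough sketch under
assumptions: the function name link_unlinked(), the wrapper id "wrap", and
the sample input are all made up here, the DOM extension must be installed,
and non-ASCII input would need extra encoding care:

```php
<?php
// Hypothetical helper: wrap $word in a link wherever it appears in text
// that is not already inside an <a> element.
function link_unlinked(string $html, string $word, string $href): string
{
    $doc = new DOMDocument();
    // Wrap the fragment so there is one container to serialize afterwards;
    // @ silences the warnings libxml raises for sloppy markup.
    @$doc->loadHTML('<div id="wrap">' . $html . '</div>');

    $xpath = new DOMXPath($doc);
    // Select text nodes that have no <a> ancestor.
    $nodes = $xpath->query('//div[@id="wrap"]//text()[not(ancestor::a)]');

    foreach (iterator_to_array($nodes) as $text) {
        $parts = explode($word, $text->nodeValue);
        if (count($parts) < 2) {
            continue; // the word does not occur in this text node
        }
        // Rebuild the text node as: part, <a>word</a>, part, ...
        $frag = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($i > 0) {
                $a = $doc->createElement('a');
                $a->setAttribute('href', $href);
                $a->appendChild($doc->createTextNode($word));
                $frag->appendChild($a);
            }
            if ($part !== '') {
                $frag->appendChild($doc->createTextNode($part));
            }
        }
        $text->parentNode->replaceChild($frag, $text);
    }

    // Serialize only the children of the wrapper element.
    $wrap = $xpath->query('//div[@id="wrap"]')->item(0);
    $out = '';
    foreach ($wrap->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

// Made-up input: one bare "foo" and one already-linked "foo".
$out = link_unlinked('foo and <a  href="foo" class="x">foo</a>', 'foo', 'foo');
echo $out, "\n";
```

Because the parser normalizes whitespace and attribute order inside tags
before the XPath query runs, the extra space and the class attribute in the
existing link no longer matter.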

Sorry if this isn't terribly helpful.  Maybe I'm overlooking something and
someone else will point out a simple way to accomplish what you're trying to
do...




[PHP] Regular Expression - highlighting

2004-10-02 Thread Aidan Lister
Hello list,

I'm pretty terrible with regular expressions, I was wondering if someone 
would be able to help me with this
http://paste.phpfi.com/31964

The problem is detailed in the above link. Basically I need to match the 
contents of any HTML tag, except a link. I'm pretty sure a lookbehind set is 
needed in the center (%s) bit.

Any suggestions would be appreciated, but it's not quite as simple as it 
sounds - if possible please make sure you run the above script and see if it 
PASSED.

Here's a little gui to make it easier to test:
http://aidan.dotgeek.org/test_hl.php

Thanks in advance,
Aidan 
