On 20/11/06, Paul Novitski <[EMAIL PROTECTED]> wrote:

Børge,

Here's how I would think this one through:

First, I'm having to make several guesses at the nature of your text content:

- You use the single word "topic" but I'll assume
this can be multiple words and spaces.

- Your source string includes a space after "rest
of the text " while your marked-up result
doesn't.  However I will assume that you really
do mean the rest of the text until end-of-string.

- Your source string also includes a space before
the initial <c> but your regexp pattern
doesn't.  I'll assume that both beginning and ending spaces are unintentional.


Your source string:

         "<c> FFFFFF topic <c> 999999 rest of the text"

consists of these parts:

1) [start-of-string]
2) "<c> "
3) "FFFFFF"     (color code 1)
4) " "
5) "topic"      (text 1)
6) " <c> "
7) "999999"     (color code 2)
8) " "
9) "rest of the text"   (text 2)
10) [end-of-string]

i.e.:

1) [start-of-string]
2) <c> + whitespace
3) color code 1
4) whitespace
5) one or more characters
6) whitespace + <c> + whitespace
7) color code 2
8) whitespace
9) one or more characters
10) [end-of-string]

This suggests the regexp pattern:

1) ^
2) <c>\s
3) ([0-9A-F]{6})
4) \s
5) (.+)
6) \s<c>\s
7) ([0-9A-F]{6})
8) \s
9) (.+)
10) $

/^<c>\s([0-9A-F]{6})\s(.+)\s<c>\s([0-9A-F]{6})\s(.+)$/i

Everything in the source string that you need to
retain needs to be in parentheses so regexp can grab it.

In 5) I can let the pattern be greedy, safe in
the knowledge that there WILL be a \s<c> to terminate the character-grab.

I end with the pattern modifier /i so it will
work with lowercase letters in the RGB color codes.

PHP:

$sText = '<c> FFFFFF topic <c> 999999 rest of the text';
$sPattern = '/^<c>\s([0-9A-F]{6})\s(.+)\s<c>\s([0-9A-F]{6})\s(.+)$/i';
preg_match($sPattern, $sText, $aMatches);
print_r($aMatches);

result:

Array
(
     [0] => <c> FFFFFF topic <c> 999999 rest of the text
     [1] => FFFFFF
     [2] => topic
     [3] => 999999
     [4] => rest of the text
)

This isolates the four substrings you want in regexp references $1 through $4.
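
As an aside, the numbered breakdown above can be kept inside the pattern itself. This is only a sketch, assuming PCRE's /x modifier (which ignores whitespace and allows # comments in the pattern); the variable name $sPatternX is mine, not part of the original question:

```php
<?php
$sText = '<c> FFFFFF topic <c> 999999 rest of the text';

// Same pattern as above, one numbered part per line.
// Under /x, whitespace is ignored and # starts a comment.
$sPatternX = '/
    ^                 # 1) start of string
    <c>\s             # 2) <c> + whitespace
    ([0-9A-F]{6})     # 3) color code 1
    \s                # 4) whitespace
    (.+)              # 5) text 1
    \s<c>\s           # 6) whitespace + <c> + whitespace
    ([0-9A-F]{6})     # 7) color code 2
    \s                # 8) whitespace
    (.+)              # 9) text 2
    $                 # 10) end of string
/xi';

$iFound = preg_match($sPatternX, $sText, $aMatchesX);
print_r($aMatchesX);
```

It should capture the same four substrings as the one-line version. One thing to watch: the comments must not contain the pattern delimiter (a forward slash here).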

Replacement:

[Tangentially, I'd like to comment that font tags
are passé.  I urge you to use spans with styling
instead.  I normally dislike using inline styles
(style details mixed with the HTML), but in this
case (as far as I know) you don't have any
choice.  If you can, I suggest you replace the
literal color codes with style names and define
the precise colors in your stylesheet, not your database.

[What this further suggests is that you ought to
have two discrete database fields, `topic` and
`description`, if you can, rather than combining
them into one field that needs to be
parsed.  Then you can output something like:

         <span class="topic">TOPIC</span> <span class="desc">DESCRIPTION</span>

and leave the RGB color codes out of this layer
of your application altogether.]
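
If you are stuck with the combined field but still want class names in the output, one possibility is to map each color code to a style name during replacement. This is a rough sketch only: the map $aClassByColor, its class names and the 'unknown' fallback are all hypothetical, and it assumes a PHP version that supports closures:

```php
<?php
// Hypothetical mapping from stored color codes to stylesheet class names.
$aClassByColor = array('FFFFFF' => 'topic', '999999' => 'desc');

$sText = '<c> FFFFFF topic <c> 999999 rest of the text';
$sPattern = '/^<c>\s([0-9A-F]{6})\s(.+)\s<c>\s([0-9A-F]{6})\s(.+)$/i';

$sHtml = preg_replace_callback($sPattern, function ($m) use ($aClassByColor) {
    // Look up each color code (uppercased); fall back to a placeholder class.
    $sKey1 = strtoupper($m[1]);
    $sKey2 = strtoupper($m[3]);
    $sClass1 = isset($aClassByColor[$sKey1]) ? $aClassByColor[$sKey1] : 'unknown';
    $sClass2 = isset($aClassByColor[$sKey2]) ? $aClassByColor[$sKey2] : 'unknown';

    return '<span class="' . $sClass1 . '">' . htmlspecialchars($m[2]) . '</span> '
         . '<span class="' . $sClass2 . '">' . htmlspecialchars($m[4]) . '</span>';
}, $sText);

echo $sHtml;
```

The actual colors would then live in the stylesheet under those class names.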


However, working with the data you've been dealt:

$sTagBegin = '<span style="color:#';
$sTagEnd = ';">';
$sCloseTag = '</span>';

$sReplacement = $sTagBegin . '$1' . $sTagEnd . '$2' . $sCloseTag .
                 $sTagBegin . '$3' . $sTagEnd . '$4' . $sCloseTag;

echo preg_replace($sPattern, $sReplacement, $sText);

result:

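<span style="color:#FFFFFF;">topic</span><span style="color:#999999;">rest of the text</span>

(There is no space between the two spans, because the pattern consumes
the whitespace around the second <c>.)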

____________________________

It's tempting to write the pattern more
succinctly to take advantage of the repeating pattern of the source text:

         <c> COLORCODE text

The regexp pattern might be:

1) \s*
2) <c>\s
3) ([0-9A-F]{6})
4) \s
5) ([^<]+)

1) optional whitespace
2) <c> + whitespace
3) color code
4) whitespace
5) one or more characters until the next <

$sText = '<c> FFFFFF topic <c> 999999 rest of the text';

$sPattern = '/\s*<c>\s([0-9A-F]{6})\s([^<]+)/i';

preg_match_all($sPattern, $sText, $aMatches);

result:

Array
(
     [0] => Array
         (
             [0] => <c> FFFFFF topic
             [1] => <c> 999999 rest of the text
         )

     [1] => Array
         (
             [0] => FFFFFF
             [1] => 999999
         )

     [2] => Array
         (
             [0] => topic
             [1] => rest of the text
         )

)
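
If you would rather walk through the matches one <c> segment at a time, preg_match_all() also accepts the PREG_SET_ORDER flag, which groups the results per match instead of per parenthesis. A small sketch (the names $aSets and $aPairs are just for illustration):

```php
<?php
$sText = '<c> FFFFFF topic <c> 999999 rest of the text';
$sPattern = '/\s*<c>\s([0-9A-F]{6})\s([^<]+)/i';

// With PREG_SET_ORDER, each element of $aSets holds one complete match.
preg_match_all($sPattern, $sText, $aSets, PREG_SET_ORDER);

$aPairs = array();
foreach ($aSets as $aSet) {
    // $aSet[1] is the color code, $aSet[2] the text. The text is trimmed
    // because [^<]+ also picks up the space before the next <c>.
    $aPairs[] = array($aSet[1], trim($aSet[2]));
}

print_r($aPairs);
```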

In this case, we need to specify the tag pattern only once:

$sReplacement = $sTagBegin . '$1' . $sTagEnd . '$2' . $sCloseTag;

echo preg_replace($sPattern, $sReplacement, $sText);

result:

<span style="color:#FFFFFF;">topic </span><span style="color:#999999;">rest of the text</span>

Notice that this results in whitespace after
the topic string.  Someone more knowledgeable in
regular expressions can probably tell you how to
eliminate that, perhaps by using a regexp assertion:
http://php.net/manual/en/reference.pcre.pattern.syntax.php#regexp.reference.assertions
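
For what it's worth, here is one way such an assertion might be used to drop the trailing whitespace. Treat it as a sketch rather than the definitive answer: it makes the text capture lazy, lets \s* absorb any trailing whitespace, and uses a lookahead to require either the next <c> or end-of-string:

```php
<?php
$sText = '<c> FFFFFF topic <c> 999999 rest of the text';

// ([^<]+?) is lazy; \s* then swallows trailing whitespace; the lookahead
// (?=<c>|$) checks for the next <c> or end-of-string without consuming it.
$sPattern = '/\s*<c>\s([0-9A-F]{6})\s([^<]+?)\s*(?=<c>|$)/i';

$sReplacement = '<span style="color:#$1;">$2</span>';

$sHtml = preg_replace($sPattern, $sReplacement, $sText);
echo $sHtml;
```

With this pattern the captured text no longer ends in a space. If you want a space between the two spans in the output, it has to be added in the replacement string.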

Regards,
Paul
__________________________

Paul Novitski
Juniper Webcraft Ltd.
http://juniperwebcraft.com


Paul, I just got around to reading this thread. The post of yours that
I quote above has got to be one of the best posts that I've read in
the 5 years that I've been on and off the php list. The way you break
that regex down taught me things that have eluded me for half a
decade. Although I have nothing to do with the OP, I really want to
say thanks for that bit of information.

Dotan Cohen

http://lyricslist.com/
http://what-is-what.com/
