On 20/11/06, Paul Novitski <[EMAIL PROTECTED]> wrote:
Børge,

Here's how I would think this one through:

First, I'm having to make several guesses at the nature of your text content:

- You use the single word "topic" but I'll assume this can be multiple words and spaces.

- Your source string includes a space after "rest of the text " while your marked-up result doesn't. However, I will assume that you really do mean the rest of the text until end-of-string.

- Your source string also includes a space before the initial <c> but your regexp pattern doesn't. I'll assume that both beginning and ending spaces are unintentional.

Your source string:

    "<c> FFFFFF topic <c> 999999 rest of the text"

consists of these parts:

     1) [start-of-string]
     2) "<c> "
     3) "FFFFFF" (color code 1)
     4) " "
     5) "topic" (text 1)
     6) " <c> "
     7) "999999" (color code 2)
     8) " "
     9) "rest of the text" (text 2)
    10) [end-of-string]

i.e.:

     1) [start-of-string]
     2) <c> + whitespace
     3) color code 1
     4) whitespace
     5) one or more characters
     6) whitespace + <c> + whitespace
     7) color code 2
     8) whitespace
     9) one or more characters
    10) [end-of-string]

This suggests the regexp pattern:

     1) ^
     2) <c>\s
     3) ([0-9A-F]{6})
     4) \s
     5) (.+)
     6) \s<c>\s
     7) ([0-9A-F]{6})
     8) \s
     9) (.+)
    10) $

    /^<c>\s([0-9A-F]{6})\s(.+)\s<c>\s([0-9A-F]{6})\s(.+)$/i

Everything in the source string that you need to retain needs to be in parentheses so regexp can grab it.

In 5) I can let the pattern be greedy, safe in the knowledge that there WILL be a \s<c> to terminate the character-grab.

I end with the pattern modifier /i so it will work with lowercase letters in the RGB color codes.

PHP:

    $sText = '<c> FFFFFF topic <c> 999999 rest of the text';
    $sPattern = '/^<c>\s([0-9A-F]{6})\s(.+)\s<c>\s([0-9A-F]{6})\s(.+)$/i';
    preg_match($sPattern, $sText, $aMatches);
    print_r($aMatches);

result:

    Array
    (
        [0] => <c> FFFFFF topic <c> 999999 rest of the text
        [1] => FFFFFF
        [2] => topic
        [3] => 999999
        [4] => rest of the text
    )

This isolates the four substrings you want in regexp references $1 through $4.
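The ten-part breakdown above can also be kept inside the pattern itself. The following is only a sketch, assuming PCRE's extended mode (the "x" modifier), which ignores literal whitespace in the pattern and treats # as the start of a comment. The "~" delimiter and the variable names $sColor1, $sText1, $sColor2 and $sText2 are choices made for this illustration and are not part of the original post.

```php
<?php
// Same pattern as in the post, spread over ten lines with the "x" modifier
// so that every numbered part carries its own comment.
$sText = '<c> FFFFFF topic <c> 999999 rest of the text';

$sPattern = '~
    ^                   #  1) start of string
    <c>\s               #  2) <c> + whitespace
    ([0-9A-F]{6})       #  3) color code 1
    \s                  #  4) whitespace
    (.+)                #  5) text 1 (greedy)
    \s<c>\s             #  6) whitespace + <c> + whitespace
    ([0-9A-F]{6})       #  7) color code 2
    \s                  #  8) whitespace
    (.+)                #  9) text 2
    $                   # 10) end of string
~xi';

$sColor1 = $sText1 = $sColor2 = $sText2 = null;

if (preg_match($sPattern, $sText, $aMatches)) {
    // Index 0 is the whole match; 1 through 4 are the parenthesized parts.
    list(, $sColor1, $sText1, $sColor2, $sText2) = $aMatches;
    echo "$sColor1 | $sText1 | $sColor2 | $sText2\n";
}
```

Because unescaped whitespace is ignored in this mode, every space that must be matched has to be written as \s (or escaped), which the pattern already does.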

Replacement:

[Tangentially, I'd like to comment that font tags are passé. I urge you to use spans with styling instead. I normally dislike using inline styles (style details mixed with the HTML), but in this case (as far as I know) you don't have any choice. If you can, I suggest you replace the literal color codes with style names and define the precise colors in your stylesheet, not your database.

[What this further suggests is that you ought to have two discrete database fields, `topic` and `description`, if you can, rather than combining them into one field that needs to be parsed. Then you can output something like:

    <span class="topic">TOPIC</span>
    <span class="desc">DESCRIPTION</span>

and leave the RGB color codes out of this layer of your application altogether.]

However, working with the data you've been dealt:

    $sTagBegin = '<span style="color:#';
    $sTagEnd = ';">';
    $sCloseTag = '</span>';

    $sReplacement = $sTagBegin . '$1' . $sTagEnd . '$2' . $sCloseTag
                  . $sTagBegin . '$3' . $sTagEnd . '$4' . $sCloseTag;

    echo preg_replace($sPattern, $sReplacement, $sText);

result:

    <span style="color:#FFFFFF;">topic</span><span style="color:#999999;">rest of the text</span>

____________________________

It's tempting to write the pattern more succinctly to take advantage of the repeating pattern of the source text:

    <c> COLORCODE text

The regexp pattern might be:

    1) \s*
    2) <c>\s
    3) ([0-9A-F]{6})
    4) \s
    5) ([^<]+)

    1) optional whitespace
    2) <c> + whitespace
    3) color code
    4) whitespace
    5) one or more characters until the next <

    $sText = '<c> FFFFFF topic <c> 999999 rest of the text';
    $sPattern = '/\s*<c>\s([0-9A-F]{6})\s([^<]+)/i';
    preg_match_all($sPattern, $sText, $aMatches);
    print_r($aMatches);

result:

    Array
    (
        [0] => Array
            (
                [0] => <c> FFFFFF topic
                [1] => <c> 999999 rest of the text
            )
        [1] => Array
            (
                [0] => FFFFFF
                [1] => 999999
            )
        [2] => Array
            (
                [0] => topic
                [1] => rest of the text
            )
    )

In this case, we need to specify the tag pattern only once:

    $sReplacement = $sTagBegin . '$1' . $sTagEnd . '$2' . $sCloseTag;

    echo preg_replace($sPattern, $sReplacement, $sText);

result:

    <span style="color:#FFFFFF;">topic </span><span style="color:#999999;">rest of the text</span>

Notice that this leaves whitespace after the topic string, inside the first span, because ([^<]+) grabs everything up to the next <. Someone more knowledgeable in regular expressions can probably tell you how to eliminate that, perhaps by using a regexp assertion:
http://php.net/manual/en/reference.pcre.pattern.syntax.php#regexp.reference.assertions

Regards,
Paul
__________________________
Paul Novitski
Juniper Webcraft Ltd.
http://juniperwebcraft.com
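
On the trailing whitespace mentioned at the end of the post: one possible way to drop it, offered here only as a sketch and not necessarily what Paul had in mind, is to make the text group lazy and follow it with optional whitespace plus a lookahead assertion for either the next <c> or end-of-string. The modified pattern and the variable $sResult are assumptions of this example; the other variable names are taken from the post.

```php
<?php
$sText = '<c> FFFFFF topic <c> 999999 rest of the text';

// ([^<]+?)      lazy: take as few non-< characters as possible
// \s*           swallow any whitespace that follows the text
// (?=<c>|$)     lookahead: the next thing must be <c> or end-of-string,
//               but it is not consumed, so the next match can start there
$sPattern = '/\s*<c>\s([0-9A-F]{6})\s([^<]+?)\s*(?=<c>|$)/i';

$sTagBegin = '<span style="color:#';
$sTagEnd   = ';">';
$sCloseTag = '</span>';

$sReplacement = $sTagBegin . '$1' . $sTagEnd . '$2' . $sCloseTag;

$sResult = preg_replace($sPattern, $sReplacement, $sText);
echo $sResult, "\n";
// <span style="color:#FFFFFF;">topic</span><span style="color:#999999;">rest of the text</span>
```

The whitespace between the two items is consumed by the match rather than captured, so the spans come out adjacent; if a separator is wanted, it would have to be added to the replacement string.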

Paul, I just got around to reading this thread. The post of yours that I quote above has got to be one of the best posts that I've read in the 5 years that I've been on and off the php list. The way you break that regex down taught me things that have eluded me for half a decade. Although I have nothing to do with the OP, I really want to say thanks for that bit of information.

Dotan Cohen
http://lyricslist.com/
http://what-is-what.com/