On Tue, Jul 15, 2008 at 9:46 AM, Andrew Ballard <[EMAIL PROTECTED]> wrote:
> On Tue, Jul 15, 2008 at 5:38 AM, Yeti <[EMAIL PROTECTED]> wrote:
>> I don't think using all these regular expressions is a very efficient way to
>> do so. As I previously pointed out, many users have had a similar
>> problem, which can be viewed at:
>>
>> http://it.php.net/manual/en/function.strtr.php
>>
>> One of my favourites is what derernst at gmx dot ch used.
>>
>> derernst at gmx dot ch
>> wrote on 20-Sep-2005 07:29
>> This works for me to remove accents for some characters of Latin-1, Latin-2
>> and Turkish in a UTF-8 environment, where the htmlentities-based solutions
>> fail:
>>
>> <?php
>>
>> function remove_accents($string, $german=false) {
>>
>> // Single letters
>>
>> $single_fr = explode(" ", "À Á Â Ã Ä Å Ą Ă Ç Ć Č
>> Ď Đ Ð È É Ê Ë Ę Ě Ğ Ì Í Î Ï İ Ł Ľ
>> Ĺ Ñ Ń Ň Ò Ó Ô Õ Ö Ø Ő Ŕ Ř Š Ś Ş
>> Ť Ţ Ù Ú Û Ü Ů Ű Ý Ž Ź Ż à á â ã ä å ą
>> ă ç ć č ď đ è é ê ë ę ě ğ ì í î ï
>> ı ł ľ ĺ ñ ń ň ð ò ó ô õ ö ø ő ŕ
>> ř ś š ş ť ţ ù ú û ü ů ű ý ÿ ž ź
>> ż");
>>
>> $single_to = explode(" ", "A A A A A A A A C C C D D D E E E E E E G I I I
>> I I L L L N N N O O O O O O O R R S S S T T U U U U U U Y Z Z Z a a a a a a
>> a a c c c d d e e e e e e g i i i i i l l l n n n o o o o o o o o r r s s s
>> t t u u u u u u y y z z z");
>>
>> $single = array();
>>
>> for ($i=0; $i<count($single_fr); $i++) {
>>
>> $single[$single_fr[$i]] = $single_to[$i];
>>
>> }
>>
>> // Ligatures
>>
>> $ligatures = array("Æ"=>"Ae", "æ"=>"ae", "Œ"=>"Oe", "œ"=>"oe", "ß"=>"ss");
>>
>> // German umlauts
>>
>> $umlauts = array("Ä"=>"Ae", "ä"=>"ae", "Ö"=>"Oe", "ö"=>"oe", "Ü"=>"Ue",
>> "ü"=>"ue");
>>
>> // Replace
>>
>> $replacements = array_merge($single, $ligatures);
>>
>> if ($german) $replacements = array_merge($replacements, $umlauts);
>>
>> $string = strtr($string, $replacements);
>>
>> return $string;
>>
>> }
>>
>> ?>
>>
>> I would change this function a bit ...
>>
>> <?php
>> //echo rawurlencode("áàéèíìóòúùÁÀÉÈÍÌÓÒÚÙ"); // RFC 1738 codes
>> // NOTE: use UTF-8 as this document's encoding
>> function remove_accents($string) {
>> $string = rawurlencode($string);
>> $replacements = array(
>> '%C3%A1' => 'a',
>> '%C3%A0' => 'a',
>> '%C3%A9' => 'e',
>> '%C3%A8' => 'e',
>> '%C3%AD' => 'i',
>> '%C3%AC' => 'i',
>> '%C3%B3' => 'o',
>> '%C3%B2' => 'o',
>> '%C3%BA' => 'u',
>> '%C3%B9' => 'u',
>> '%C3%81' => 'A',
>> '%C3%80' => 'A',
>> '%C3%89' => 'E',
>> '%C3%88' => 'E',
>> '%C3%8D' => 'I',
>> '%C3%8C' => 'I',
>> '%C3%93' => 'O',
>> '%C3%92' => 'O',
>> '%C3%9A' => 'U',
>> '%C3%99' => 'U'
>> );
>> // decode again so that spaces, punctuation etc. do not stay percent-encoded
>> return rawurldecode(strtr($string, $replacements));
>> }
>> //echo remove_accents("CÀfé"); // I know it's not spelled right
>> echo remove_accents("áàéèíìóòúùÁÀÉÈÍÌÓÒÚÙ");
>> // OUTPUT (again: I used UTF-8 for the document): aaeeiioouuAAEEIIOOUU
>> ?>
>>
>> Ciao
>>
>> Yeti
>>
>> On Mon, Jul 14, 2008 at 8:20 PM, Andrew Ballard <[EMAIL PROTECTED]> wrote:
>>>
>>> On Mon, Jul 14, 2008 at 1:35 PM, Giulio Mastrosanti
>>> <[EMAIL PROTECTED]> wrote:
>>> >>
>>> >
>>> > Brilliant !!!
>>> >
>>> > so you replace every occurrence of every accent variation with all the
>>> > accent variations...
>>> >
>>> > OK, that's it!
>>> >
>>> > just a few more doubts (regexes are still a headache for me...)
>>> >
>>> > preg_replace('/[iìíîïĩīĭįı]/iu',... -- what's the meaning of iu after
>>> > the match string?
>>>
>>> This page explains them both.
>>> http://us.php.net/manual/en/reference.pcre.pattern.modifiers.php
>>>
>>> > preg_replace('/[aàáâãäåǻāăą](?!e)/iu',... what's (?!e) for? -- every
>>> > occurrence of aàáâãäåǻāăą NOT followed by e?
>>>
>>> Yes. It matches any character based on the Latin 'a' that is not
>>> followed by an 'e'. That keeps the pattern from matching the 'a' of an
>>> 'ae' pair, which the (ae|æ|ǽ) pattern handles, in words like
>>> these:
>>>
>>> http://en.wikipedia.org/wiki/List_of_words_that_may_be_spelled_with_a_ligature
>>> (However, that may cause problems with words that have other variants
>>> of 'ae' in them. I'll leave that to you to resolve.)
>>> http://us.php.net/manual/en/regexp.reference.php
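A quick way to see what the iu modifiers and the (?!e) lookahead do is to run the pattern on a couple of short strings. This is only a sketch: the character class is shortened, the variable names are made up, and it assumes the script itself is saved as UTF-8.

```php
<?php
// Shortened version of the 'a' pattern from this thread.
// Assumption: this file is saved as UTF-8, which the u modifier requires.
$pattern = '/[aàáâãäå](?!e)/iu';

// (?!e) is a negative lookahead: an 'a' directly followed by 'e' is skipped,
// so only the 'a' in "mat" is replaced here.
$lookahead = preg_replace($pattern, '*', 'aether mat');

// i makes the match caseless; u makes PCRE treat pattern and subject as
// UTF-8, so 'À' counts as one character and matches 'à'.
$caseless = preg_replace($pattern, '*', 'ÀB');

echo $lookahead, "\n"; // aether m*t
echo $caseless, "\n";  // *B
```

The first call leaves the leading 'ae' alone, which is exactly what lets the separate (ae|æ|ǽ) pattern deal with it.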
>>>
>>>
>>>
>>> > Many thanks again for your effort,
>>> >
>>> > I'm definitely on the right track
>>> >
>>> > Giulio
>>> >
>>> >
>>> >>
>>> >> I was intrigued by your example, so I played around with it some more
>>> >> this morning. My own quick web search yielded a lot of results for
>>> >> highlighting search terms, but none that I found did what you're
>>> >> after. (I admit I didn't look very deep.) I was up to something like
>>> >> this before your reply came in. It's still by no means complete. It
>>> >> even handles simple English plurals (words ending in 's' or 'es'), but
>>> >> not variations that require changing the word base (like 'daisy' to
>>> >> 'daisies').
>>> >>
>>> >> <?php
>>> >> function highlight_search_terms($phrase, $string) {
>>> >> $non_letter_chars = '/[^\pL]/iu';
>>> >> $words = preg_split($non_letter_chars, $phrase);
>>> >>
>>> >> $search_words = array();
>>> >> foreach ($words as $word) {
>>> >> if (strlen($word) > 2 && !preg_match($non_letter_chars, $word)) {
>>> >> $search_words[] = $word;
>>> >> }
>>> >> }
>>> >>
>>> >> $search_words = array_unique($search_words);
>>> >>
>>> >> foreach ($search_words as $word) {
>>> >> $search = preg_quote($word);
>>> >>
>>> >> /* repeat for each possible accented character */
>>> >> $search = preg_replace('/(ae|æ|ǽ)/iu', '(ae|æ|ǽ)', $search);
>>> >> $search = preg_replace('/(oe|œ)/iu', '(oe|œ)', $search);
>>> >> $search = preg_replace('/[aàáâãäåǻāăą](?!e)/iu',
>>> >> '[aàáâãäåǻāăą]', $search);
>>> >> $search = preg_replace('/[cçćĉċč]/iu', '[cçćĉċč]', $search);
>>> >> $search = preg_replace('/[dďđ]/iu', '[dďđ]', $search);
>>> >> $search = preg_replace('/(?<![ao])[eèéêëēĕėęě]/iu',
>>> >> '[eèéêëēĕėęě]', $search);
>>> >> $search = preg_replace('/[gĝğġģ]/iu', '[gĝğġģ]', $search);
>>> >> $search = preg_replace('/[hĥħ]/iu', '[hĥħ]', $search);
>>> >> $search = preg_replace('/[iìíîïĩīĭįı]/iu', '[iìíîïĩīĭįı]',
>>> >> $search);
>>> >> $search = preg_replace('/[jĵ]/iu', '[jĵ]', $search);
>>> >> $search = preg_replace('/[kķĸ]/iu', '[kķĸ]', $search);
>>> >> $search = preg_replace('/[lĺļľŀł]/iu', '[lĺļľŀł]', $search);
>>> >> $search = preg_replace('/[nñńņňŉŋ]/iu', '[nñńņňŉŋ]', $search);
>>> >> $search = preg_replace('/[oòóôõöōŏőǿơ](?!e)/iu',
>>> >> '[oòóôõöōŏőǿơ]', $search);
>>> >> $search = preg_replace('/[rŕŗř]/iu', '[rŕŗř]', $search);
>>> >> $search = preg_replace('/[sśŝşš]/iu', '[sśŝşš]', $search);
>>> >> $search = preg_replace('/[tţťŧ]/iu', '[tţťŧ]', $search);
>>> >> $search = preg_replace('/[uùúûüũūŭůűųǔǖǘǚǜ]/iu',
>>> >> '[uùúûüũūŭůűųǔǖǘǚǜ]', $search);
>>> >> $search = preg_replace('/[wŵ]/iu', '[wŵ]', $search);
>>> >> $search = preg_replace('/[yýÿŷ]/iu', '[yýÿŷ]', $search);
>>> >> $search = preg_replace('/[zźżž]/iu', '[zźżž]', $search);
>>> >>
>>> >>
>>> >> $string = preg_replace('/\b' . $search . '(e?s)?\b/iu',
>>> >> '<span class="keysearch">$0</span>', $string);
>>> >> }
>>> >>
>>> >> return $string;
>>> >>
>>> >> }
>>> >> ?>
>>> >>
>>> >> I still can't help feeling there must be some better way, though.
>>> >>
>>> >>>
>>> >>> well, I think I'm on the right track now; unfortunately I have some other
>>> >>> urgent work and can't try it immediately, but I'll let you know :)
>>> >>>
>>> >>> thank you!
>>> >>>
>>> >>> Giulio
>>> >>
>>> >>
>>> >> Andrew
>>> >>
>>> >>
>>> >
>>> >
>>
>>
>
> I agree it doesn't seem very efficient to me, but I haven't come up
> with anything better. The problem with what you posted is that the OP
> was looking to preserve the accented characters, NOT replace them. All
> he wants to do is wrap some tags around the search terms so that they
> are highlighted. I guess he could use your function to replace all the
> accented characters with regular ones in a copy of the original
> string, and then scan that copy using strpos() or similar to find the
> index of each occurrence that needs to be replaced in the original
> string. This seems even less efficient than the regular expressions,
> to me.
>
> Andrew
>
Well, OK, I can think of one optimization. This takes advantage of the
fact that preg_replace() accepts arrays for its pattern and replacement
parameters. In a couple of very quick tests this version is roughly 30%
faster than my previous version:
<?php
function highlight_search_terms2($phrase, $string) {
    // Split the phrase on anything that is not a Unicode letter.
    $non_letter_chars = '/[^\pL]/iu';
    $words = preg_split($non_letter_chars, $phrase);

    // Keep only terms longer than two bytes (strlen() counts bytes, not characters).
    $search_words = array();
    foreach ($words as $word) {
        if (strlen($word) > 2 && !preg_match($non_letter_chars, $word)) {
            $search_words[] = $word;
        }
    }
    $search_words = array_unique($search_words);

    // Key: pattern for one base letter or ligature.
    // Value: the alternation or character class that replaces it.
    $patterns = array(
        /* repeat for each possible accented character */
        '/(ae|æ|ǽ)/iu' => '(ae|æ|ǽ)',
        '/(oe|œ)/iu' => '(oe|œ)',
        '/[aàáâãäåǻāăą](?!e)/iu' => '[aàáâãäåǻāăą]',
        '/[cçćĉċč]/iu' => '[cçćĉċč]',
        '/[dďđ]/iu' => '[dďđ]',
        '/(?<![ao])[eèéêëēĕėęě]/iu' => '[eèéêëēĕėęě]',
        '/[gĝğġģ]/iu' => '[gĝğġģ]',
        '/[hĥħ]/iu' => '[hĥħ]',
        '/[iìíîïĩīĭįı]/iu' => '[iìíîïĩīĭįı]',
        '/[jĵ]/iu' => '[jĵ]',
        '/[kķĸ]/iu' => '[kķĸ]',
        '/[lĺļľŀł]/iu' => '[lĺļľŀł]',
        '/[nñńņňŉŋ]/iu' => '[nñńņňŉŋ]',
        '/[oòóôõöōŏőǿơ](?!e)/iu' => '[oòóôõöōŏőǿơ]',
        '/[rŕŗř]/iu' => '[rŕŗř]',
        '/[sśŝşš]/iu' => '[sśŝşš]',
        '/[tţťŧ]/iu' => '[tţťŧ]',
        '/[uùúûüũūŭůűųǔǖǘǚǜ]/iu' => '[uùúûüũūŭůűųǔǖǘǚǜ]',
        '/[wŵ]/iu' => '[wŵ]',
        '/[yýÿŷ]/iu' => '[yýÿŷ]',
        '/[zźżž]/iu' => '[zźżž]',
    );
    // Build the two parallel lists once, outside the loop.
    $find = array_keys($patterns);
    $replace = array_values($patterns);

    foreach ($search_words as $word) {
        $search = preg_quote($word, '/');
        $search = preg_replace($find, $replace, $search);
        // Also match simple English plurals ('s' or 'es').
        $string = preg_replace('/\b' . $search . '(e?s)?\b/iu',
            '<span class="keysearch">$0</span>', $string);
    }
    return $string;
}
?>
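For comparison, the copy-and-scan idea from my earlier message might look roughly like this. It is only a sketch under a few assumptions: fold_1to1() and highlight_by_offsets() are names I made up, the map covers just a handful of letters, it needs the mbstring extension, and it ignores word boundaries. It also cannot handle ligatures, because a one-to-two mapping such as 'æ' to 'ae' would throw the offsets off.

```php
<?php
// Hypothetical helper (not from the thread): fold ASCII case and a few
// accented letters to lowercase ASCII, strictly one character to one
// character, so character offsets in the folded copy match the original.
function fold_1to1($text) {
    static $map = null;
    if ($map === null) {
        $map = array_combine(range('A', 'Z'), range('a', 'z')) + array(
            'à' => 'a', 'á' => 'a', 'â' => 'a', 'À' => 'a', 'Á' => 'a', 'Â' => 'a',
            'è' => 'e', 'é' => 'e', 'ê' => 'e', 'È' => 'e', 'É' => 'e', 'Ê' => 'e',
            'ì' => 'i', 'í' => 'i', 'î' => 'i', 'Ì' => 'i', 'Í' => 'i', 'Î' => 'i',
            'ò' => 'o', 'ó' => 'o', 'ô' => 'o', 'Ò' => 'o', 'Ó' => 'o', 'Ô' => 'o',
            'ù' => 'u', 'ú' => 'u', 'û' => 'u', 'Ù' => 'u', 'Ú' => 'u', 'Û' => 'u',
        );
    }
    return strtr($text, $map);
}

// Find $word in a folded copy of $string, then wrap the characters at the
// same offsets in the ORIGINAL string, so the accents are preserved.
function highlight_by_offsets($word, $string) {
    $folded_string = fold_1to1($string);
    $folded_word = fold_1to1($word);
    $len = mb_strlen($folded_word, 'UTF-8');
    if ($len === 0) {
        return $string;
    }
    $out = '';
    $pos = 0;
    while (($hit = mb_strpos($folded_string, $folded_word, $pos, 'UTF-8')) !== false) {
        // Copy the untouched text before the hit, then the wrapped original.
        $out .= mb_substr($string, $pos, $hit - $pos, 'UTF-8');
        $out .= '<span class="keysearch">'
            . mb_substr($string, $hit, $len, 'UTF-8') . '</span>';
        $pos = $hit + $len;
    }
    return $out . mb_substr($string, $pos, null, 'UTF-8');
}

echo highlight_by_offsets('cafe', 'Un café, un CAFE'), "\n";
```

With that input both 'café' and 'CAFE' get wrapped and the accent stays in place. I have not timed it against the regex version.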
Andrew