At 7:24 pm +0000 23/12/05, John Delacour wrote:

Others may have some magic solution, but it seems to me that you have to convert the escaped original text to the UTF-8 bytes it represents, convert these to UTF-16BE, and then produce one file with the contents in precomposed form and another in decomposed form. Which of these you use will depend on the normalisation used by the file system.
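As a rough sketch of those steps (not a tested recipe for any particular file system), the following unescapes the string, decodes it, and prints the UTF-16BE bytes of both normal forms. The helpers unescape_bytes and hex_dump are names made up for this illustration; Encode and Unicode::Normalize are the standard modules.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);
use Unicode::Normalize qw(NFC NFD);

# Hypothetical helper: turn literal \xNN escapes into the bytes they stand for.
sub unescape_bytes {
    my ($escaped) = @_;
    (my $bytes = $escaped) =~ s/\\x([0-9A-Fa-f]{2})/chr hex $1/ge;
    return $bytes;
}

# Hypothetical helper: space-separated hex dump of a byte string.
sub hex_dump {
    my ($bytes) = @_;
    return join ' ', map { sprintf('%02X', ord $_) } split //, $bytes;
}

# Escaped text -> UTF-8 bytes -> Perl character string.
my $chars = decode('UTF-8', unescape_bytes('R\xC3\xA9union'));

# The same text in each normal form, encoded as UTF-16BE.
my $nfc_utf16 = encode('UTF-16BE', NFC($chars));
my $nfd_utf16 = encode('UTF-16BE', NFD($chars));

print "NFC: ", hex_dump($nfc_utf16), "\n";
print "NFD: ", hex_dump($nfd_utf16), "\n";
```

The precomposed form carries the single code unit 00 E9 for the accented letter, while the decomposed form carries 00 65 followed by 03 01 (a plain "e" plus a combining acute accent), so the two byte strings differ in length as well as content.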

Here's another way, which is still probably overkill. This script takes your 'R\xC3\xA9union', creates a file in the home directory named "Réunion-junk.txt" using the converted string in the decomposed form and then checks for matches.


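#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);
use Unicode::Normalize qw(NFC NFD);
binmode STDOUT, ':encoding(UTF-8)';
my $dir   = $ENV{HOME};
my $ascii = 'R\xC3\xA9union';    # escaped UTF-8 bytes, as received
# Turn the \xNN escapes into real bytes, then decode those bytes as UTF-8.
(my $bytes = $ascii) =~ s/\\x([0-9A-Fa-f]{2})/chr hex $1/ge;
my $utf8        = decode('UTF-8', $bytes);
my $decomposed  = NFD($utf8);    # "e" followed by U+0301
my $precomposed = NFC($utf8);    # single U+00E9; swap in to test the other form
# Create the test file under its decomposed name.
open my $fh, '>', "$dir/" . encode('UTF-8', $decomposed . '-junk.txt') or die $!;
close $fh;
# List the directory and report every name containing the decomposed string.
opendir my $dh, $dir or die $!;
for my $name (readdir $dh) {
  my $decoded = decode('UTF-8', $name);
  print "Found: $decoded\n" if $decoded =~ /\Q$decomposed\E/;
}
closedir $dh;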
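
If it is not obvious which normalisation the file system hands back, one possible check is to compare each name with its own NFC and NFD forms. This is only a sketch: normal_form_of is a name invented here, and it assumes readdir returns UTF-8 bytes, which may not hold on every system.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode);
use Unicode::Normalize qw(NFC NFD);

# Hypothetical helper: report which normal form a raw file name is in.
# Assumes the raw name is a UTF-8 byte string.
sub normal_form_of {
    my ($raw_name) = @_;
    my $name   = decode('UTF-8', $raw_name);
    my $is_nfc = $name eq NFC($name);
    my $is_nfd = $name eq NFD($name);
    return $is_nfc && $is_nfd ? 'both'
         : $is_nfc            ? 'NFC'
         : $is_nfd            ? 'NFD'
         :                      'neither';
}

my $dir = $ENV{HOME} // '.';
if (opendir my $dh, $dir) {
    for my $raw_name (readdir $dh) {
        my $form = normal_form_of($raw_name);
        next if $form eq 'both';    # e.g. plain ASCII names, identical in both forms
        print "$form: $raw_name\n"; # raw bytes are printed back unchanged
    }
    closedir $dh;
}
else {
    warn "Cannot open $dir: $!";
}
```

Names that come back as NFD suggest matching against the decomposed string, and names that come back as NFC suggest the precomposed one.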


JD
