I've come up against an annoying bug.

Scenario:
=========

The ID3 tag of a file contains non-ASCII characters (values above
127). Typically, this is character 180 (the acute accent ´, used as an apostrophe).

The $AllowedChars mask is set to "A-Za-z0-9".

When running CleanUp.pl, the character 180 is replaced with 194 followed
by 180.

Subsequent runs of CleanUp.pl tend to go even further ga ga.
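One observation that may narrow this down: 194 followed by 180 is exactly the UTF-8 encoding of character 180 (U+00B4). If the resulting two bytes are later treated as Latin-1 characters and encoded again, each one grows, which would fit the way later runs get worse. A minimal sketch of that arithmetic, assuming a Perl with the core `Encode` module (5.8 or later, so not 5.6.1 itself); it illustrates a hypothesis, not a confirmed cause:

```perl
use strict;
use warnings;
use Encode qw(encode);

# Character 180 (U+00B4, the acute accent) as a one-character string.
my $acute = chr(180);

# Encoding it as UTF-8 yields two bytes: 0xC2 0xB4, i.e. 194 180.
my @once = unpack('C*', encode('UTF-8', $acute));

# Treat those two bytes as two Latin-1 characters and encode again:
# 194 becomes 195 130, and 180 becomes 194 180.
my $bytes = join '', map { chr } @once;
my @twice = unpack('C*', encode('UTF-8', $bytes));

print "@once\n";    # 194 180
print "@twice\n";   # 195 130 194 180
```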

Analysis:
=========

After spending some time trying to come up with a simple test scenario, I
am deeply suspicious that this is a symptom of an underlying bug in Perl.

For example:

In cleanup.pl:93

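sub MakeLegalName
{
   my ($str) = @_;
   my ($dest, $i);

   # length() counts characters; vec($str, $i, 8) reads the $i-th
   # 8-bit value of $str as an integer.
   for($i = 0; $i < length($str); $i++)
   {
      # Turn that value back into a one-character string and test it
      # against the allowed-character class.
      if (chr(vec($str, $i, 8)) =~ m/[$AllowedChars]+/)
      {
         # Keep only characters that match the mask.
         $dest .= chr(vec($str, $i, 8));
      }
   }

   return $dest;
}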

I discovered that adding the line:
   my $ac=vec($AllowedChars,1,8);

at the top of the function appears to make the problem go away. This makes
no sense to me whatsoever. My guess is that the behaviour of m// depends
on whether Perl regards the pattern and/or the string being matched as
Unicode or ASCII, and that the 'vec' call somehow makes it change its mind.
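That guess has a documented relative in later Perls (5.8 onwards; not checked against 5.6.1): without `use feature 'unicode_strings'`, some regex constructs such as `\w` treat characters 128 to 255 differently depending on whether the string is stored internally as UTF-8. A minimal sketch showing two strings that compare equal yet match differently; note that a plain class like `[A-Za-z0-9]` is not affected this way, so this illustrates the mechanism rather than proving the cause here:

```perl
use strict;
use warnings;

# chr(233) is e-acute in Latin-1; it starts out as a plain byte string.
my $byte_str = chr(233);

# Same character, but with Perl's internal UTF-8 flag switched on.
my $utf8_str = $byte_str;
utf8::upgrade($utf8_str);

# The two strings hold the same characters...
my $equal = ($byte_str eq $utf8_str) ? 1 : 0;

# ...but under the default regex rules \w uses ASCII rules for the byte
# string and Unicode rules for the upgraded one.
my $bytes_match = ($byte_str =~ /\w/) ? 1 : 0;
my $utf8_match  = ($utf8_str =~ /\w/) ? 1 : 0;

print "equal: $equal, byte string: $bytes_match, upgraded: $utf8_match\n";
# equal: 1, byte string: 0, upgraded: 1
```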

I don't, however, regard this as a sensible way of fixing the problem.

So my questions are:

1) Is this only a bug in Perl v5.6.1? Have other people seen this
problem? Is the bug in the Perl core, or in one of the modules we use
(i.e. the XML parser, database, or ID3 modules)?

2) Why is this function written in this obscure way anyway?

Why not something like:
my ($str) = @_;
$str =~ s/[^$AllowedChars]+//g;
return $str;
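A self-contained version of that suggestion, as a minimal sketch: `$AllowedChars` here is a hypothetical stand-in for the global mask in CleanUp.pl, and the negated class `[^...]` is what makes the substitution delete the disallowed characters rather than the allowed ones:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the global mask used by CleanUp.pl.
our $AllowedChars = 'A-Za-z0-9';

sub MakeLegalName
{
   my ($str) = @_;                   # $str is a copy, safe to modify
   $str =~ s/[^$AllowedChars]+//g;   # delete every run of disallowed chars
   return $str;
}

print MakeLegalName("Don\xB4t Stop"), "\n";   # DontStop
```

Because the substitution works on characters rather than on vec() offsets, on a modern Perl it gives the same result whether or not the string is internally UTF-8 encoded.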

3) Would anybody (reasonably) need to have non-ASCII characters in their
filenames (e.g. accents, Cyrillic, Greek, etc.)?

4) Anything else? Personally, I like to lowercase names, map disallowed
chars to '_', and collapse multiple '_' into a single '_'.
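The scheme in 4) could look like this; a minimal sketch, where `MakeLegalNameUnderscore` is a hypothetical name and `$AllowedChars` again stands in for the script's global mask:

```perl
use strict;
use warnings;

# Hypothetical stand-in for the global mask used by CleanUp.pl.
our $AllowedChars = 'A-Za-z0-9';

# Hypothetical variant of MakeLegalName following suggestion 4).
sub MakeLegalNameUnderscore
{
   my ($str) = @_;
   $str = lc $str;                    # lowercase first
   $str =~ s/[^$AllowedChars]/_/g;    # each disallowed char becomes '_'
   $str =~ s/_+/_/g;                  # collapse runs of '_' into one
   return $str;
}

print MakeLegalNameUnderscore("Don\xB4t  Stop!"), "\n";   # don_t_stop_
```

Keeping the replacement and the collapse as separate steps also tidies up any '_' already present in the tag, since '_' is outside the mask.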

Rob
-- 
  ______  _____  ______  _______  ______ _______
 |_____/ |     | |_____] |______ |_____/    |   
 |    \_ |_____| |_____] |______ |    \_    |   
                                                
 _     _ _______  ______ _______
 |_____| |_____| |_____/    |   
 |     | |     | |    \_    |   

_______________________________________________
Obs-dev mailing list
[EMAIL PROTECTED]
http://www.freeamp.org/mailman/listinfo/obs-dev
