Wed Dec 28 06:49:51 2011: Request 73491 was acted upon.
Transaction: Correspondence added by RSCHUPP
       Queue: PAR-Packer
     Subject: cache directory naming problem
   Broken in: 1.012
    Severity: (no value)
       Owner: RSCHUPP
  Requestors: gyp...@gmail.com
      Status: open
 Ticket <URL: https://rt.cpan.org/Ticket/Display.html?id=73491 >


On 2011-12-27 01:31:33, gyp...@gmail.com wrote:
> I looked into the error message and I found that the original username 
> is changed to be an illegal byte sequence:
> * from 0x ba ce be fb c0 cc (3character, 6bytes, encoded with cp949)
> *   to 0x ba ce be 5f c0 cc
...
> or, equivalent C-code in mktmpdir.c:
> 
>         /* replace all non-alphanumeric letters with '_' */
>         for ( c = username ; *c != '\0' ; c++ ) {
>             if ( !isalnum(*c) ) {
>                 *c = '_';
>             }
>         }

Excellent analysis.

There's actually a problem with the C version: the type of c is char*
and char is signed on Intel i386. At least the Mingw32 implementation
of isalnum() doesn't correctly account for that and marks
only one of the 6 bytes above as !isalnum. It should've marked
them all, resulting in "______". That wouldn't have caused an error
when creating the cache directory, but has a high probability for 
collision with another username (though that wouldn't have
shown in a default Windows environment where the temp directory
is per-user anyway).

> Would you please check it? I think that it would be good idea to use "%-
> encoding" for all non-latin characters, so that the username in the 
> screenshot would be changed into "par-%ba%ce%be%fb%c0%cc". ('%' may be 
> removed)

Yeah, but what is a "non-latin" character if we don't consider
the charset? I think we shouldn't make any assumption about it
and simply encode _all_ bytes unconditionally in username
as two hex bytes. The only thing we loose is easy recognizability
of which cache directory belongs to which user.

>From cursory looking at CP949 (or EUC-KR) I believe that the
sequence of bytes "par-bacebefbc0cc" is a legal string in CP949, right?
That should also work with ASCII and all ISO Latin encodings, 
as well as EUC-CN and EUC-JN and UTF-8.

Could you please try the two attached patches (the first is for PAR,
the second for PAR::Packer)?

Cheers, Roderich
Index: PAR/lib/PAR/SetupTemp.pm
===================================================================
--- PAR/lib/PAR/SetupTemp.pm	(revision 1336)
+++ PAR/lib/PAR/SetupTemp.pm	(working copy)
@@ -98,7 +98,11 @@
       qw( C:\\TEMP /tmp . )
   ) {
     next unless defined $path and -d $path and -w $path;
-    $temp_path = File::Spec->catdir($path, "par-$username");
+    # create a temp directory that is unique per user
+    # NOTE: $username may be in an unspecified charset/encoding;
+    # use a name that hopefully works for all of them;
+    # also avoid problems with platform-specific meta characters in the name
+    $temp_path = File::Spec->catdir($path, "par-".unpack("H*", $username));
     ($temp_path) = $temp_path =~ /^(.*)$/s;
     unless (mkdir($temp_path, 0700) || $!{EEXIST}) {
       warn "creation of private subdirectory $temp_path failed (errno=$!)"; 
@@ -140,7 +144,6 @@
   else {
     $username = $ENV{USERNAME} || $ENV{USER} || 'SYSTEM';
   }
-  $username =~ s/\W/_/g;
 
   return $username;
 }
Index: PAR-Packer/myldr/mktmpdir.c
===================================================================
--- PAR-Packer/myldr/mktmpdir.c	(revision 1336)
+++ PAR-Packer/myldr/mktmpdir.c	(working copy)
@@ -76,7 +76,6 @@
 
 char *par_mktmpdir ( char **argv ) {
     int i;
-    char *c;
     const char *tmpdir = NULL;
     const char *key = NULL , *val = NULL;
 
@@ -112,6 +111,7 @@
         DWORD buflen = MAXPATHLEN;
         username = malloc(MAXPATHLEN);
         GetUserName((LPTSTR)username, &buflen);
+        // FIXME this is uncondifionally overwritten below - WTF?
     }
 #endif
 
@@ -123,18 +123,17 @@
                 username = strdup(val);
         }
     }
-
-    if ( username == NULL ) {
+    if ( username == NULL )
         username = "SYSTEM";
+   
+    /* sanitize username: encode all bytes as 2 hex digits */
+    {
+        char *hexname = malloc(2 * strlen(username) + 1);
+        char *u, *h;
+        for ( u = username, h = hexname ; *u != '\0' ; u++, h += 2)
+            sprintf(h, "%02x", *(unsigned char*)u);
+        username = hexname;
     }
-    else {
-        /* replace all non-alphanumeric letters with '_' */
-        for ( c = username ; *c != '\0' ; c++ ) {
-            if ( !isalnum(*c) ) {
-                *c = '_';
-            }
-        }
-    }
 
     /* Try temp environment variables */
     for ( i = 0 ; tmpdir == NULL && (key = temp_keys[i]); i++ ) {

Reply via email to