ID:               39332
 Comment by:       sebastianstenzel at googlemail dot com
 Reported By:      herbert dot fischer at gmail dot com
 Status:           No Feedback
 Bug Type:         Program Execution
 Operating System: Red Hat ELAS4 Upd3
 PHP Version:      5.1.6
 New Comment:

I solved the problem by exporting the variable LANG=en_US.utf8 (or some
other locale with a suitable charset).
I did it in each shell command in my PHP script, but it can probably
also be done in the shell settings of the user that runs the PHP script
(e.g. www-data).

Example:
shell_exec("export LANG=en_US.utf8; svn list file:///path/to/repos");

German characters like ä, ö, ü or ß (which used to appear as some ?\xyz
code) are now correct.
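
A minimal sketch of the same workaround wrapped in a helper, so the
locale does not have to be typed into every command. The function name
with_locale() is hypothetical, and the default locale is only an
assumption: it must actually be installed on the server. Note that an
LC_ALL variable already present in the environment would still take
precedence over LANG.

```php
<?php
// Hypothetical helper: prefix a shell command with a LANG assignment so
// that the child process (e.g. svn) runs with the given locale.
function with_locale($command, $locale = 'en_US.utf8')
{
    // "VAR=value command" places VAR in the environment of that one
    // command only; escapeshellarg() quotes the locale name safely.
    return 'LANG=' . escapeshellarg($locale) . ' ' . $command;
}

// Usage (the repository path is a placeholder, as in the example above):
// $out = shell_exec(with_locale('svn list file:///path/to/repos'));
```

Placing the assignment directly in front of the command, without a
semicolon, hands the variable to that command even if LANG was not
exported before.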


Previous Comments:
------------------------------------------------------------------------

[2006-11-16 01:00:01] php-bugs at lists dot php dot net

No feedback was provided for this bug for over a week, so it is
being suspended automatically. If you are able to provide the
information that was originally requested, please do so and change
the status of the bug back to "Open".

------------------------------------------------------------------------

[2006-11-08 14:14:31] [EMAIL PROTECTED]

Thank you for this bug report. To properly diagnose the problem, we
need a short but complete example script to be able to reproduce
this bug ourselves. 

A proper reproducing script starts with <?php and ends with ?>,
is max. 10-20 lines long and does not require any external 
resources such as databases, etc. If the script requires a 
database to demonstrate the issue, please make sure it creates 
all necessary tables, stored procedures etc.

Please avoid embedding huge scripts into the report.



------------------------------------------------------------------------

[2006-11-01 13:55:38] herbert dot fischer at gmail dot com

Description:
------------
PHP seems to assume that the output of an externally executed svn
client is ASCII-encoded.

Even when the external program returns an ISO-8859-1 encoded string,
PHP "parses" it as ASCII, expanding accented characters into a literal
escaped form instead of keeping their binary form.

For example:
an output like "Acentuação" turns into the literal string
"Acentua?\195?\167?\195?\163o/".

Reproduce code:
---------------
Import some files or folders with accented names into a Subversion
repository. It is possible to convert the output to UTF-8 using the
command below:

# svn list 'file:////home/svn/herbert/' | iconv -tutf-8

But not when the same command is run from PHP:

<?php
$cmd = "svn list 'file:////home/svn/herbert/'";
$out = shell_exec($cmd);
$res = unpack('c*', $out);
var_dump($res);
?>

var_dump reports:

array(29) {
  [1]=>
  int(65)
  [2]=>
  int(99)
  [3]=>
  int(101)
  [4]=>
  int(110)
  [5]=>
  int(116)
  [6]=>
  int(117)
  [7]=>
  int(97)
  [8]=>
  int(63)
  [9]=>
  int(92)
  [10]=>
  int(49)
  [11]=>
  int(57)
  [12]=>
  int(53)
  [13]=>
  int(63)
  [14]=>
  int(92)
  [15]=>
  int(49)
  [16]=>
  int(54)
  [17]=>
  int(55)
  [18]=>
  int(63)
  [19]=>
  int(92)
  [20]=>
  int(49)
  [21]=>
  int(57)
  [22]=>
  int(53)
  [23]=>
  int(63)
  [24]=>
  int(92)
  [25]=>
  int(49)
  [26]=>
  int(54)
  [27]=>
  int(51)
  [28]=>
  int(111)
  [29]=>
  int(47)
}

So it's not possible to convert the string to another character set,
since it's invalid.

Expected result:
----------------
PHP is expected to store the string in its original binary format.

array(10) {
  [1]=>
  int(65)
  [2]=>
  int(99)
  [3]=>
  int(101)
  [4]=>
  int(110)
  [5]=>
  int(116)
  [6]=>
  int(117)
  [7]=>
  int(97)
  [8]=>
  int(-25)
  [9]=>
  int(-29)
  [10]=>
  int(111)
}
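
As a stopgap for output that has already been escaped, the sequences
can be mapped back to raw bytes. This is only a sketch: it assumes each
"?\NNN" group is one decimal byte value, as the var_dump above suggests,
and the function name unescape_svn_bytes() is hypothetical. It needs
PHP 5.3 or later for the anonymous function.

```php
<?php
// Hypothetical helper: turn every "?\NNN" escape (three decimal digits)
// back into the single byte with that value.
function unescape_svn_bytes($text)
{
    return preg_replace_callback(
        '/\?\\\\(\d{3})/',          // literal "?", literal "\", 3 digits
        function ($m) {
            return chr((int) $m[1]); // decimal value -> raw byte
        },
        $text
    );
}

// The escaped name from the report; 195 167 and 195 163 are the UTF-8
// byte pairs for "ç" and "ã".
$raw = unescape_svn_bytes('Acentua?\195?\167?\195?\163o/');
```

The decoded bytes here are UTF-8, so passing $raw through
iconv('UTF-8', 'ISO-8859-1', $raw) should then give the single-byte
values (-25, -29) listed in the expected result.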



------------------------------------------------------------------------


-- 
Edit this bug report at http://bugs.php.net/?id=39332&edit=1
