Hi,
On 01/05/2013 16:49, steveco.1...@gmail.com wrote:
Well all this just serves to deepen my confusion.
1) What is the difference between:
$line = decode( 'UTF-8', $orig );
and
$line = decode( 'utf8', $orig );
Always use
decode( 'UTF-8', $orig );
'UTF-8' means what it says.
In my opinion, 'utf8' means "something really quite like UTF-8 in all but
a few respects, but which isn't UTF-8 and is a leftover from the dog's
breakfast of Perl Unicode string handling, encoding and source code
handling that took a decade to fix."
Perl's documentation currently refers to UTF-8 as 'strict UTF-8'. There's
no sanity to it. Why the docs don't just say "'utf8' is really a leftover
from an era of big mistakes", I don't know.
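For what it's worth, here is a minimal sketch of how the two names can be told apart with Encode. The sample byte string is my own illustration (it follows the UTF-8 bit pattern but encodes 0x110000, one past the last Unicode code point), and I have printed the lax result rather than assumed it:

```perl
use strict;
use warnings;
use Encode qw(decode find_encoding);

# Encode resolves the two spellings to different encoding objects.
my $strict_name = find_encoding('UTF-8')->name;   # 'utf-8-strict'
my $lax_name    = find_encoding('utf8')->name;    # 'utf8'
print "UTF-8 -> $strict_name\n";
print "utf8  -> $lax_name\n";

# Illustrative input (an assumption, not from the thread): bytes shaped
# like UTF-8 that encode 0x110000, which is beyond U+10FFFF.
my $bytes = "\xF4\x90\x80\x80";

# FB_CROAK makes decode() die on invalid input; LEAVE_SRC stops it from
# modifying $bytes in place.
my $check = Encode::FB_CROAK() | Encode::LEAVE_SRC();

# Strict decoding refuses the out-of-range sequence.
my $strict_ok = eval { decode('UTF-8', $bytes, $check); 1 } ? 1 : 0;

# The lax 'utf8' decoder is expected to be more permissive; report what
# this Perl actually does instead of assuming.
my $lax_ok = eval { decode('utf8', $bytes, $check); 1 } ? 1 : 0;

print "strict UTF-8: ", ($strict_ok ? "accepted" : "rejected"), "\n";
print "lax utf8:     ", ($lax_ok    ? "accepted" : "rejected"), "\n";
```

For ordinary, well-formed text the two behave the same, which is why 'utf8' appears to "just work" most of the time.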
Why did the latter not work for Octavian?
It isn't the difference between 'utf8' and 'UTF-8' that caused
Octavian's code to fail.
2) Mark, your earlier logic seemed clear and unassailable, yet now you seem
to change your mind.
I got worn down. It is, after all, a community project. The logic seemed
clear and unassailable to me too. When faced with an argument that
simply ignores everything you say, you are left with the option of
repeating yourself forever, ignoring the opposite argument, or giving
up and agreeing. Life is short, so I gave up and agreed. I always try to
take the approach that even if the other fellow is wrong in principle,
what exactly would be the downside to agreeing? It leaves you with the
time and energy available to go on repeating yourself forever on the
important stuff.
I don't think it will break much.
I use: $line = decode( 'utf8', $orig );
and I never have a problem, but according to this logic that is luck.
I accept this and I am happy to use utf8::upgrade($string);
I think we should assume that in the general case there will always be some
Perl processing before wxWidgets sees the string.
The general case is:
1 - Retrieve data from file or database (this may be automatically decoded or
not, depending on the database and the driver);
2 - Do something to it (this may be a null operation);
3 - Pass to wxWidgets to display to user.
To keep string lengths and string processing correct (e.g. a simple
alphabetical sort in utf8), any decoding must take place between 1
and 2 above.
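The three steps above can be sketched roughly as follows. This is only an illustration under my own assumptions: the sample bytes stand in for whatever the file or database driver returns, and the final encode stands in for handing the string on, not for what the wxWidgets interface does internally.

```perl
use strict;
use warnings;
use Encode qw(decode encode);

# Step 1: raw bytes as they might arrive from a file or database driver
# (hypothetical sample: "café" encoded as UTF-8, 5 bytes).
my $orig = "caf\xC3\xA9";

# Between steps 1 and 2: decode exactly once.
my $line = decode('UTF-8', $orig);

# Step 2: character-based processing now counts characters, not bytes.
my $byte_len = length $orig;   # 5
my $char_len = length $line;   # 4
print "bytes: $byte_len, characters: $char_len\n";

# Step 3 stand-in: encode back to bytes only when leaving the program.
my $out = encode('UTF-8', $line);
```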
When you say:
So, my thinking is that I'll change it for builds against wxWidgets
2.9.x and above
What does "it" mean? That you will include utf8::upgrade($string) in the
interface?
No, the code will just assume that the string passed is valid UTF-8 and
attempt to convert it to a wxString accordingly. It will never call the
libc option.
I can't see any harm in this. Just setting a character bit to 1 before an
operation and again later at worst just seems redundant.
But if we have the position where decode is called twice, this will create
problems for me. A doubly decoded value gets corrupted and becomes a
diamond with a question mark in it, or some such value.
I hope the above assures you this won't happen: we won't be double decoding.
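In case it helps to see the failure you describe, here is a small sketch of double decoding in plain Perl. The sample string is made up for illustration, and this shows only Encode's behaviour, not anything in the wxWidgets interface code:

```perl
use strict;
use warnings;
use Encode qw(decode);

# Hypothetical input: "café!" as UTF-8 bytes.
my $bytes = "caf\xC3\xA9!";

# Correct: decode once. The result is 5 characters, with é as U+00E9.
my $once = decode('UTF-8', $bytes);

# Wrong: decode the already decoded string. The é (U+00E9) is treated as
# the lone byte 0xE9, which is not valid UTF-8, so by default it is
# replaced with U+FFFD, the "diamond with a question mark".
my $twice = decode('UTF-8', $once);

my $corrupted = index($twice, "\x{FFFD}") >= 0 ? 1 : 0;

# Show the code points so the substitution is visible on any terminal.
print "once:  ", join(' ', map { sprintf 'U+%04X', ord } split //, $once),  "\n";
print "twice: ", join(' ', map { sprintf 'U+%04X', ord } split //, $twice), "\n";
print "corrupted by second decode: ", ($corrupted ? "yes" : "no"), "\n";
```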
Cheers
Mark