--- Begin Message ---
On Wed, 16 Jan 2019 at 18:37, Sven Van Caekenberghe <[email protected]> wrote:
> Still, one of the conclusions of previous discussions about the encoding
> of environment variables was/is that there is no single correct solution.
> OS's are not consistent in how the encoding is done in all (historical)
> contexts (like sometimes,
> 1 env var defines the encoding to use for others,
Ouch. That one point nearly made me retract my comment in the next
paragraph, but is there much more complexity than that?
Or is it just a case of utf8<==>appSpecificEncoding rather than
ascii<==>appSpecificEncoding ?
Sorry if I'm rehashing past discussion (do you have a link?), but
considering...
* 92% of web pages are UTF8 encoded[1] such that pragmatically UTF8 *is*
the standard for text
* Strings are so pervasive in a system
...would there be an overall benefit in adopting UTF8 as the encoding for
Strings,
provided consistently across the cross-platform VM interface?
(i.e. fixing platforms that don't comply with the standard due to their
historical baggage)
I also found it interesting that Microsoft is making some moves towards UTF8
[2]:
"With insider build 17035 and the April 2018 update (nominal build 17134)
for Windows 10, a "Beta: Use Unicode UTF-8 for worldwide language support"
checkbox appeared for setting the locale code page to UTF-8. This allows
for calling "narrow" functions, including fopen and SetWindowTextA, with
UTF-8 strings."
The VM-side approach could be similar to Section 10, "How to do text on
Windows" [3],
with the philosophy of "performing the [conversions] as close to API calls
as possible,
and never holding the [converted] data."
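
To make that philosophy concrete, here is a minimal Python sketch (not VM
code). It assumes a hypothetical fake_wide_getenv that stands in for a
Windows "wide" API and only speaks UTF-16LE; the wrapper getenv_utf8 and the
sample data are likewise invented for illustration. The point is only that
the UTF-16 form exists for the duration of the call and is never kept.

```python
from typing import Optional

# Hypothetical stand-in for a Windows *W function: UTF-16LE bytes in and out.
# This table and function are assumptions for illustration, not a real OS API.
_FAKE_WIDE_ENV = {
    "HOME".encode("utf-16-le"): "C:\\Users\\José".encode("utf-16-le"),
}


def fake_wide_getenv(name_utf16: bytes) -> Optional[bytes]:
    """Pretend wide API: looks up a UTF-16LE name, returns a UTF-16LE value."""
    return _FAKE_WIDE_ENV.get(name_utf16)


def getenv_utf8(name_utf8: bytes) -> Optional[bytes]:
    """UTF-8 in, UTF-8 out; conversion happens right at the API boundary."""
    # Convert just before the call...
    wide_name = name_utf8.decode("utf-8").encode("utf-16-le")
    wide_value = fake_wide_getenv(wide_name)
    if wide_value is None:
        return None
    # ...and convert back immediately, so callers never see UTF-16 data.
    return wide_value.decode("utf-16-le").encode("utf-8")
```

With a wrapper like this, everything above the boundary can assume UTF-8
regardless of what the platform uses underneath.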
[1] https://w3techs.com/technologies/history_overview/character_encoding/ms/y
[2] https://en.wikipedia.org/wiki/Unicode_in_Microsoft_Windows
[3] http://utf8everywhere.org/
> different applications do different things, and other such nice stuff), and
> certainly not across platforms.
>
> So this is really complex.
>
> Do we want to hide this in some obscure VM C code that very few people can
> see, read, let alone help with ?
>
> The image side is perfectly capable of dealing with platform differences
> in a clean/clear way, and at least we can then use the full power of our
> language and our tools.
>
Big question... Do we currently have primitives of the same name returning
different encodings on different platforms? I presume that would be
awkward.
If the image is to handle encoding differences, should separate primitives be
used? e.g. utf8GetEnv & utf16getEnv
Could I get some feedback on [4], which says...
"**The Single Most Important Fact About Encodings**
If you completely forget everything I just explained, please remember one
extremely important fact.
It does not make sense to have a string without knowing what encoding it
uses."
And so... does our String nowadays require an 'encoding' instance variable
such that this is *always* associated?
This might remove any need for separate utf8GetEnv & utf16getEnv (if that
was even a reasonable idea).
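
As a rough illustration of what "always associated" could look like, here is
a small Python sketch. The EncodedString class and its method names are
hypothetical, made up for this example; it is not a proposal for the actual
String implementation. It shows the same two bytes reading differently
depending on the encoding they are tagged with.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EncodedString:
    """Hypothetical string that always carries its encoding with its bytes."""
    data: bytes
    encoding: str

    def text(self) -> str:
        # Decoding is only meaningful because the encoding is known.
        return self.data.decode(self.encoding)

    def transcode(self, target: str) -> "EncodedString":
        # Produce the same text in another encoding, tagged accordingly.
        return EncodedString(self.text().encode(target), target)


raw = "é".encode("utf-8")                  # the two bytes C3 A9
as_utf8 = EncodedString(raw, "utf-8")      # correctly tagged
as_latin1 = EncodedString(raw, "latin-1")  # same bytes, wrong tag

print(as_utf8.text())    # é
print(as_latin1.text())  # Ã©
```

With the tag in place, a single getEnv-style primitive could return whatever
the platform provides and leave conversion to the caller, e.g.
as_utf8.transcode("utf-16-le").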
cheers -ben
[4] https://www.joelonsoftware.com/2003/10/08/the-absolute-minimum-every-software-developer-absolutely-positively-must-know-about-unicode-and-character-sets-no-excuses/
> > On 16 Jan 2019, at 10:59, Guillermo Polito <[email protected]> wrote:
> >
> > Hi Nicolas,
> >
> > On Wed, Jan 16, 2019 at 10:25 AM Nicolas Cellier <[email protected]> wrote:
> > IMO, windows VM (and plugins) should do the UCS2 -> UTF8 conversion
> > because the purpose of a VM is to provide an OS independent façade.
> > I made progress recently in this area, but we should finish the
> > job/test/consolidate.
> >
> > I'm following your changes for windows from the shadows and I think they
> > are awesome :).
> >
> > If someone bypasses the VM and uses the Windows API directly through FFI,
> > then he takes the responsibility, but uniformity doesn't hurt.
> >
> > So far we are using FFI for this: as you say, we first create
> > Win32WideStrings from utf8 strings and then we use ffi calls to the *W
> > functions.
> > I don't think we can make it for Pharo7.0.0. The cycle to build, do some
> > acceptance tests, and then bless a new VM as stable is far too long for our
> > imminent release :).
> >
> > But this could be for a 7.1.0, and if you like I can surely give a hand
> > on this.
> >
> > Guille
>
>
>
--- End Message ---