On 16 Jul 2018, at 19:46, Alistair Grant wrote:
Hi Sven & Max,
On Fri, Jul 13, 2018 at 07:59:32PM +0200, Sven Van Caekenberghe wrote:
Alistair, are you aware of the following (article/codebase) ?
https://medium.com/concerning-pharo/an-implementation-of-unicode-normalization-7c6719068f43
Due to the size of the full DB it is doubtful it would become a
standard part of Pharo though.
Sven
Thanks again for the link, it has helped my (still limited)
understanding of Unicode.
The reason I started looking into this was that my file attribute
modification tests failed on Mac OS. The problem is that Mac OS
requires file names to be decomposed UTF8, and my plugin wasn't doing
the conversion.
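
For illustration, here is a minimal Python sketch of what "decomposed
UTF8" means, assuming the standard Unicode canonical decomposition (NFD).
The helper name to_decomposed_utf8 is hypothetical, and this is not the
plugin code; it only shows the standard form, which is not exactly the
variant Mac OS applies to file names.

```python
import unicodedata


def to_decomposed_utf8(name: str) -> bytes:
    """Return the UTF-8 bytes of `name` after standard NFD normalization.

    Assumption: standard Unicode NFD, not the Mac OS file system variant.
    """
    return unicodedata.normalize("NFD", name).encode("utf-8")


# 'é' written as a single precomposed code point (U+00E9)
composed = "caf\u00e9"

print(composed.encode("utf-8"))      # b'caf\xc3\xa9'
print(to_decomposed_utf8(composed))  # b'cafe\xcc\x81'
```

The two byte strings name the same text but differ byte for byte, which
is why a file name comparison can fail when only one side has been
decomposed.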
Following the general principle of trying to keep the VM minimal and do
as much as possible in the image, I had hoped I could do the UTF8
(de)composition in the image.
But it turns out that Mac OS doesn't follow the standard rules, so
programs really need to use the native file name encoding routines on
Mac OS.
So that's the path I'll be following in this instance. I still really
appreciate the link, and will be exploring the Unicode package more.
On Fri, Jul 13, 2018 at 10:50:36PM +0200, Max Leske wrote:
Hi Alistair,
*nix systems usually come with the iconv[1] command line program that
implements the normalization and denormalization algorithms, or Uconv[2],
a library that does the same thing. These algorithms include a lot of
black magic and I recommend not getting your hands dirty with them. With
the FFI interface Pharo has today it shouldn't be too hard to call out to
Uconv (although I'm not saying it's trivial; I've written a VM plugin
that we use at work to interface with Uconv and you do have to know how
encodings and iconv work) or execute iconv directly.
I can send you a copy of the plugin code if you want; actually, I may
put it on GitHub if there's interest.
Cheers,
Max
[1] https://linux.die.net/man/1/iconv
[2] https://en.wikipedia.org/wiki/Uconv
[3] http://site.icu-project.org/
On Sat, Jul 14, 2018 at 08:20:23AM +0200, Max Leske wrote:
I realize I got things mixed up a bit: Uconv is a program akin to
Iconv. What we interface with is libicu.
Max
The VM already uses libiconv for the encoding translation on Linux. As
far as I know, the routines haven't been exposed directly to the image
(although I haven't looked carefully).
I'd be interested in looking at your plugin - I'm still working through
the current FilePlugin behaviour, but I think it would be useful to have
these routines available directly from the image for debugging, etc.
Thanks again,
Alistair
I've put the plugin source on GitHub:
https://github.com/Netstyle/Squeak-VM-Unicode-operations-plugin.
I hope you'll find it useful. Note that the code was written for version
4.0.3-2202 of the Squeak VM and that you'd most likely have to make a
couple of modifications to get it running in the OpenSmalltalk VMs.
Cheers,
Max