On 16 Jul 2018, at 19:46, Alistair Grant wrote:

Hi Sven & Max,


On Fri, Jul 13, 2018 at 07:59:32PM +0200, Sven Van Caekenberghe wrote:
Alistair, are you aware of the following (article/codebase) ?

  
https://medium.com/concerning-pharo/an-implementation-of-unicode-normalization-7c6719068f43

Due to the size of the full DB it is doubtful it would become a standard part of Pharo though.

Sven


Thanks again for the link, it has helped my (still limited)
understanding of Unicode.

The reason I started looking into this was that my file attribute
modification tests failed on Mac OS.  The problem is that Mac OS
requires file names to be decomposed UTF-8, and my plugin wasn't doing
the conversion.
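
To make "decomposed UTF-8" concrete, here is a minimal sketch in Python
using only the standard library. It is an illustration of canonical
decomposition (NFD), not code from the plugin discussed here, and the
helper name to_decomposed_utf8 is hypothetical.

```python
import unicodedata


def to_decomposed_utf8(name: str) -> bytes:
    """Return the UTF-8 bytes of `name` in canonical decomposed form (NFD).

    Hypothetical helper for illustration only; it applies the standard
    Unicode NFD algorithm, which may differ from what Mac OS itself does.
    """
    return unicodedata.normalize("NFD", name).encode("utf-8")


# "café" written with the precomposed character U+00E9.
composed = "caf\u00e9"

# Composed form: U+00E9 is encoded as the two bytes C3 A9.
print(composed.encode("utf-8"))

# Decomposed form: "e" followed by U+0301 (combining acute accent, CC 81).
print(to_decomposed_utf8(composed))
```

The two byte strings name the same text to a reader but are different
sequences of bytes, which is why a test that compares file names byte
for byte can fail when one side has been decomposed.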

Following the general principle of trying to keep the VM minimal and do
as much as possible in the image, I had hoped I could do the UTF8
(de)composition in the image.

But it turns out that Mac OS doesn't follow the standard rules, so
programs really need to use the native file name encoding routines on
Mac OS.
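
As a rough illustration of how a platform variant can diverge from
standard NFD, the Python sketch below leaves a few code point ranges
untouched while decomposing everything else. The ranges are an
assumption based on Apple's published description of HFS+ file name
handling and are not taken from this thread; the function name
approx_mac_decompose is hypothetical. It is not a substitute for the
native routines.

```python
import unicodedata
from itertools import groupby

# ASSUMPTION: code point ranges that the Mac OS variant is described as
# leaving undecomposed. Not confirmed by this thread; illustration only.
EXCLUDED_RANGES = ((0x2000, 0x2FFF), (0xF900, 0xFAFF), (0x2F800, 0x2FAFF))


def _is_excluded(ch: str) -> bool:
    """True if `ch` falls in one of the assumed excluded ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in EXCLUDED_RANGES)


def approx_mac_decompose(name: str) -> str:
    """Apply NFD to runs of non-excluded characters, copy the rest as-is.

    Hypothetical approximation: real file systems may differ in other
    ways (for example, in the Unicode version of their tables).
    """
    parts = []
    for excluded, group in groupby(name, key=_is_excluded):
        text = "".join(group)
        parts.append(text if excluded else unicodedata.normalize("NFD", text))
    return "".join(parts)


# U+2126 OHM SIGN: standard NFD maps it to U+03A9 (Greek capital omega),
# while this approximation leaves it unchanged.
ohm = "\u2126"
print(unicodedata.normalize("NFD", ohm) == "\u03a9")
print(approx_mac_decompose(ohm) == ohm)
```

Differences of this kind are the reason the native file name encoding
routines are the safer choice on Mac OS.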

So that's the path I'll be following in this instance.  I still really
appreciate the link, and will be exploring the Unicode package more.



On Fri, Jul 13, 2018 at 10:50:36PM +0200, Max Leske wrote:
Hi Alistair,

*nix systems usually come with the iconv[1] command line program, which
implements the normalization and denormalization algorithms, or
Uconv[2], a library that does the same thing. These algorithms include
a lot of black magic and I recommend not getting your hands dirty with
them. With the FFI interface Pharo has today it shouldn't be too hard
to call out to Uconv (although I'm not saying it's trivial; I've
written a VM plugin that we use at work to interface with Uconv, and
you do have to know how encodings and iconv work) or to execute iconv
directly.

I can send you a copy of the plugin code if you want. Actually, I may
put it on GitHub if there's interest.

Cheers,
Max

[1] https://linux.die.net/man/1/iconv
[2] https://en.wikipedia.org/wiki/Uconv
[3] http://site.icu-project.org/



On Sat, Jul 14, 2018 at 08:20:23AM +0200, Max Leske wrote:
I realize I got things mixed up a bit: Uconv is a program akin to Iconv. What
we interface with is libicu.

Max

The VM already uses libiconv for the encoding translation on Linux. As
far as I know, the routines haven't been exposed directly to the image
(although I haven't looked carefully).

I'd be interested in looking at your plugin. I'm still working through
the current FilePlugin behaviour, but I think it would be useful to have
these routines available directly from the image for debugging, etc.

Thanks again,
Alistair

I've put the plugin source on GitHub:
https://github.com/Netstyle/Squeak-VM-Unicode-operations-plugin

I hope you'll find it useful. Note that the code was written for
version 4.0.3-2202 of the Squeak VM and that you'd most likely have to
make a couple of modifications to get it running in the OpenSmalltalk
VMs.


Cheers,
Max
