Hi Sven & Max,

On Fri, Jul 13, 2018 at 07:59:32PM +0200, Sven Van Caekenberghe wrote:
> Alistair, are you aware of the following (article/codebase) ?
> 
>   
> https://medium.com/concerning-pharo/an-implementation-of-unicode-normalization-7c6719068f43
> 
> Due to the size of the full DB it is doubtful it would become a standard part 
> of Pharo though.
> 
> Sven


Thanks again for the link, it has helped my (still limited) 
understanding of Unicode.

The reason I started looking into this was that my file attribute
modification tests failed on Mac OS.  The problem is that Mac OS
requires file names to be decomposed UTF-8, and my plugin wasn't doing
the conversion.
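
To illustrate the mismatch, here is a minimal sketch in Python (not the
plugin code; the helper name to_decomposed_utf8 is made up for this
example).  It applies standard Unicode NFD decomposition only, which is an
assumption: the rules a particular file system applies may differ.

```python
import unicodedata


def to_decomposed_utf8(name: str) -> bytes:
    """Return the UTF-8 bytes of `name` after standard NFD decomposition.

    Hypothetical helper for illustration; it does not reproduce any
    file-system-specific variant of decomposition.
    """
    return unicodedata.normalize("NFD", name).encode("utf-8")


composed = "caf\u00e9"                      # 'é' as one code point, U+00E9
decomposed_bytes = to_decomposed_utf8(composed)

print(composed.encode("utf-8"))             # b'caf\xc3\xa9'
print(decomposed_bytes)                     # b'cafe\xcc\x81'
```

The two byte strings name the same text but compare unequal, which is
why a lookup by the composed form can miss a name stored in decomposed
form.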

Following the general principle of trying to keep the VM minimal and do
as much as possible in the image, I had hoped I could do the UTF-8
(de)composition in the image.

But it turns out that Mac OS doesn't follow the standard Unicode
decomposition rules, so programs really need to use the native file name
encoding routines on Mac OS.

So that's the path I'll be following in this instance.  I still really
appreciate the link, and will be exploring the Unicode package more.



On Fri, Jul 13, 2018 at 10:50:36PM +0200, Max Leske wrote:
> Hi Alistair,
> 
> *nix systems usually come with the iconv[1] command line program that
> implements the normalization and denormalization algorithms, or Uconv[2], a
> library that does the same thing. These algorithms include a lot of black
> magic and I recommend to not make your hands dirty with them. With the FFI
> interface Pharo has today it shouldn't be too hard to call out to Uconv
> (although I'm not saying it's trivial; I've written a VM plugin that we use
> at work to interface with Uconv and you do have to know how encodings and
> iconv work) or execute iconv directly.
> 
> I can send you a copy of the plugin code if you want, actually, I may put it
> on github if there's interest.
> 
> Cheers,
> Max
> 
> [1] https://linux.die.net/man/1/iconv
> [2] https://en.wikipedia.org/wiki/Uconv
> [3] http://site.icu-project.org/
> 
> 

On Sat, Jul 14, 2018 at 08:20:23AM +0200, Max Leske wrote:
> I realize I got things mixed up a bit: Uconv is a program akin to Iconv. What
> we interface with is libicu.
> 
> Max

The VM already uses libiconv for the encoding translation on Linux.  As
far as I know, the routines haven't been exposed directly to the image
(although I haven't looked carefully).
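
As a rough illustration of why encoding translation alone does not settle
the file name question, the sketch below uses Python's built-in codecs as
a stand-in for an iconv-style conversion.  That substitution, and the
helper name transcode, are assumptions made for the example; this is not
what the VM does.  It shows that re-encoding the bytes leaves the
sequence of code points, and so the normalization form, unchanged.

```python
import unicodedata


def transcode(data: bytes, source: str, target: str) -> bytes:
    """Re-encode `data` from one character encoding to another.

    Hypothetical stand-in for an iconv-style conversion; it changes the
    byte representation only, never the code points.
    """
    return data.decode(source).encode(target)


decomposed = unicodedata.normalize("NFD", "\u00e9")   # 'e' + U+0301
utf16 = transcode(decomposed.encode("utf-8"), "utf-8", "utf-16-le")
roundtrip = utf16.decode("utf-16-le")

# Still two code points after the conversion: nothing was recomposed.
print(len(roundtrip))                                   # 2
print(roundtrip == decomposed)                          # True
```

So composition or decomposition has to happen as a separate step from
the encoding conversion.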

I'd be interested in looking at your plugin - I'm still working through 
the current FilePlugin behaviour, but I think it would be useful to have 
these routines available directly from the image for debugging, etc.

Thanks again,
Alistair

