Alistair, are you aware of the following (article/codebase)?

https://medium.com/concerning-pharo/an-implementation-of-unicode-normalization-7c6719068f43

Due to the size of the full DB it is doubtful it would become a standard part 
of Pharo though.

Sven

> On 13 Jul 2018, at 19:46, Alistair Grant <[email protected]> wrote:
> 
> Hi Sven & Everyone,
> 
> I need to convert a UTF-8 encoded decomposed stream (Mac OS file
> names) into a composed string, e.g.:
> 
> string: 'test-äöü-äöü'
> code points: #(116 101 115 116 45 228 246 252 45 97 776 111 776 117 776)
> utf8 encoding: #[116 101 115 116 45 195 164 195 182 195 188 45 97 204
> 136 111 204 136 117 204 136]
> 
> In the above string, the first group of 3 accented characters are the
> same as the second group of 3, but are encoded differently - code
> points (228 246 252) vs (97 776 111 776 117 776).
> 
> Reading the utf8 encoded stream should result in:
> 
> string: 'test-äöü-äöü'
> code points: #(116 101 115 116 45 228 246 252 45 228 246 252)
> utf8 encoding: #[116 101 115 116 45 195 164 195 182 195 188 45 195 164
> 195 182 195 188]
> 
> My current thought is to write a ZnUnicodeComposingReadStream which
> would wrap a ZnCharacterReadStream and return the composed characters.
> 
> What do you think?
> 
> Thanks!
> Alistair
> 
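
For reference, the conversion described in the quoted message is Unicode canonical composition (NFC). Below is a minimal sketch of the wrapping-stream idea, written in Python rather than Pharo because Python's standard library ships the normalization tables. The name `composed_chars` and its `chunk_size` parameter are hypothetical, chosen only for illustration. The sketch assumes a cluster is one character followed by combining marks, so it does not handle compositions between two class-0 characters (Hangul jamo, for example).

```python
import codecs
import io
import unicodedata


def composed_chars(byte_stream, chunk_size=4096):
    """Yield NFC-composed characters from a UTF-8 byte stream.

    Simplification (assumption): a cluster is a character followed by
    combining marks (canonical combining class != 0). Compositions between
    two class-0 characters, such as Hangul jamo, are not handled.
    """
    decoder = codecs.getincrementaldecoder("utf-8")()
    pending = ""  # current base character plus its trailing combining marks
    while True:
        chunk = byte_stream.read(chunk_size)
        # An empty chunk means end of stream, so ask the decoder to finish.
        text = decoder.decode(chunk, final=not chunk)
        for ch in text:
            if pending and unicodedata.combining(ch) == 0:
                # A new base character starts: flush the finished cluster.
                yield from unicodedata.normalize("NFC", pending)
                pending = ""
            pending += ch
        if not chunk:
            break
    if pending:
        # Flush the last cluster once the stream is exhausted.
        yield from unicodedata.normalize("NFC", pending)


# The decomposed UTF-8 bytes from the quoted message.
decomposed = bytes([116, 101, 115, 116, 45, 195, 164, 195, 182, 195, 188, 45,
                    97, 204, 136, 111, 204, 136, 117, 204, 136])
composed = "".join(composed_chars(io.BytesIO(decomposed)))
code_points = [ord(c) for c in composed]
utf8_bytes = list(composed.encode("utf-8"))
```

With the bytes from the example, `code_points` and `utf8_bytes` come out as the composed values listed in the quoted message. The one-cluster buffer is the part a composing read stream would need to keep: a base character cannot be returned until the next non-combining character (or end of stream) has been seen.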

