Hi Sven,

Thanks very much for your quick reply...

On Fri, 13 Jul 2018 at 19:59, Sven Van Caekenberghe <[email protected]> wrote:
>
> Alistair, are you aware of the following (article/codebase) ?
>
> https://medium.com/concerning-pharo/an-implementation-of-unicode-normalization-7c6719068f43
>
> Due to the size of the full DB it is doubtful it would become a standard part
> of Pharo though.
>
> Sven

I hadn't seen this. I'll read it next (although I think it will take me
longer than 17 minutes :-)).

But a quick, partial answer is that I was planning on only supporting
the composition and decomposition tables that are already included in
the main image as part of CombinedChar (see the Composition and
Decomposition class variables).

Thanks again,
Alistair

>
> On 13 Jul 2018, at 19:46, Alistair Grant <[email protected]> wrote:
> >
> > Hi Sven & Everyone,
> >
> > I need to convert a UTF-8 encoded decomposed stream (Mac OS file
> > names) into a composed string, e.g.:
> >
> > string: 'test-äöü-äöü'
> > code points: #(116 101 115 116 45 228 246 252 45 97 776 111 776 117 776)
> > utf8 encoding: #[116 101 115 116 45 195 164 195 182 195 188 45 97 204
> > 136 111 204 136 117 204 136]
> >
> > In the above string, the first group of 3 accented characters are the
> > same as the second group of 3, but are encoded differently - code
> > points (228 246 252) vs (97 776 111 776 117 776).
> >
> > Reading the utf8 encoded stream should result in:
> >
> > string: 'test-äöü-äöü'
> > code points: #(116 101 115 116 45 228 246 252 45 228 246 252)
> > utf8 encoding: #[116 101 115 116 45 195 164 195 182 195 188 45 195 164
> > 195 182 195 188]
> >
> > My current thought is to write a ZnUnicodeComposingReadStream which
> > would wrap a ZnCharacterReadStream and return the composed characters.
> >
> > What do you think?
> >
> > Thanks!
> > Alistair
> >
> >
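
For illustration only, here is a minimal sketch of the wrapping-stream idea described above, written in Python rather than Pharo so it can be run against the example bytes. The function name `composed_chars` is hypothetical, and the composition step uses Python's full `unicodedata` tables instead of the smaller tables in CombinedChar. It buffers a base character together with any combining marks that follow it and composes the group before passing it on. This is not a complete NFC implementation: compositions between two starters, such as Hangul jamo, are not handled.

```python
import io
import unicodedata

# The decomposed UTF-8 bytes from the example in the original message.
decomposed_utf8 = bytes([116, 101, 115, 116, 45, 195, 164, 195, 182, 195, 188, 45,
                         97, 204, 136, 111, 204, 136, 117, 204, 136])


def composed_chars(char_stream):
    """Yield characters from a text stream, composing each base character
    with the combining marks that directly follow it (hypothetical helper)."""
    pending = ""  # base character plus any combining marks seen so far
    while True:
        ch = char_stream.read(1)
        # A combining mark (non-zero combining class) extends the pending group.
        if ch and unicodedata.combining(ch) and pending:
            pending += ch
            continue
        # Anything else closes the pending group, so compose and emit it.
        if pending:
            yield from unicodedata.normalize("NFC", pending)
        if not ch:  # end of stream
            return
        pending = ch


# Wrap a UTF-8 decoding stream, much as the proposed class would wrap
# a ZnCharacterReadStream.
reader = io.TextIOWrapper(io.BytesIO(decomposed_utf8), encoding="utf-8")
text = "".join(composed_chars(reader))
code_points = [ord(c) for c in text]
```

With the example input, `code_points` and `text.encode("utf-8")` should match the composed values listed in the original message.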
