Rob Browning writes:
> David Bremner writes:
>> It seems plausible to specify UTF-8 input for the library, but what
>> about the CLI? It seems like the canonicalization operation increases
>> the chance of mangling user input in non-UTF-8 locales.
>
>
Rob Browning writes:
>
> Before this change, notmuch would index two strings that differ only
> with respect to canonicalization, like tóken and tóken, as separate
> terms, even though they may be visually indistinguishable, and do (for
> most purposes) represent the same
David Bremner writes:
> One way to break this up into more bite sized pieces would be to first
> create one or more tests that fail with current notmuch, and mark those
> as broken.
Right - for the moment I just wanted to post what I had for
consideration. I didn't want to
WARNING: this version is very preliminary, and might eat your data.
Unicode has multiple sequences representing what should normally be
considered the same text. For example here's a combining Á and a
noncombining Á.
Depending on the way you view this, you may or may not see a
difference,
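
To make the two sequences concrete, here is a minimal Python sketch. It is not notmuch code, and it assumes NFC as the canonical form purely for illustration; the thread does not say which form the patch uses. It builds the combining and noncombining spellings explicitly by code point and compares them before and after normalization.

```python
import unicodedata

# "Á" as a base letter plus a combining acute accent: U+0041 U+0301
combining = "A\u0301"
# "Á" as a single precomposed code point: U+00C1
precomposed = "\u00c1"

# The raw strings differ, so an index keyed on raw text
# would store them as two separate terms.
raw_equal = combining == precomposed

# Normalizing both to the same form (NFC is an assumption here)
# makes them compare equal.
nfc_equal = (unicodedata.normalize("NFC", combining)
             == unicodedata.normalize("NFC", precomposed))

print(raw_equal, nfc_equal)
```

Any of the standard normalization forms would work for the comparison, as long as the same one is applied both at index time and at query time.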