On Thu, Apr 6, 2017 at 1:33 AM, Heikki Linnakangas <hlinn...@iki.fi> wrote:
> Attached is a new version. Notable changes since yesterday:
> * Implemented the rest of the SASLPrep, mapping some characters to spaces,
> leaving out others, and checking for prohibited characters and bidirectional
> strings.
> * Moved things around. There's now a separate directory, src/common/unicode,
> which contains the perl scripts and the test code. Those are not needed to
> build from source, as the pre-generated tables are put in
> src/include/common. Similar to the scripts in src/backend/utils/mb/Unicode,
> really.
> * Renamed many things from utf_* to unicode_*, since they don't deal with
> utf-8 input anymore.
> This is starting to shape up, but still some cleanup work to do. I will
> continue tomorrow..

Thanks for the new patch, it is looking nice. I was not able to
compile it though, as saslprep.h is missing from what you have sent...

There is for example this portion in the new tables:
+static const Codepoint prohibited_output_chars[] =
+   0xD800, 0xF8FF,             /* C.3, C.5 */

   ----- Start Table C.5 -----
   ----- End Table C.5 -----
This indicates a range of values. Wouldn't it be better to split this
table in two, one for the ranges of codepoints and another one for the
single entries?

+   0x1D173, 0x1D17A,           /* C.2.2 */
This is for musical symbols. It seems to me that checking for a range
is what is intended.
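
To make the suggestion concrete, here is a rough sketch of what I have
in mind, not something taken from the patch. The names CodepointRange,
prohibited_ranges, prohibited_singles and is_prohibited are made up for
illustration, and the table contents are only a small subset of RFC 3454
(the merged 0xD800..0xF8FF entry follows the patch). Both tables are
assumed to be sorted so that a binary search works:

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

typedef uint32_t Codepoint;

/* Hypothetical layout: one inclusive [first, last] pair per entry. */
typedef struct
{
	Codepoint	first;
	Codepoint	last;
} CodepointRange;

/* Illustrative subset only, sorted by 'first' and non-overlapping. */
static const CodepointRange prohibited_ranges[] = {
	{0xD800, 0xF8FF},			/* C.3, C.5 */
	{0x1D173, 0x1D17A}			/* C.2.2 */
};

/* Illustrative subset only, sorted in ascending order. */
static const Codepoint prohibited_singles[] = {
	0x06DD, 0x070F, 0x180E		/* C.2.2 */
};

#define lengthof(a) (sizeof(a) / sizeof((a)[0]))

/* Binary search over inclusive ranges. */
static bool
in_ranges(Codepoint cp, const CodepointRange *tbl, size_t n)
{
	size_t		lo = 0;
	size_t		hi = n;

	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (cp < tbl[mid].first)
			hi = mid;
		else if (cp > tbl[mid].last)
			lo = mid + 1;
		else
			return true;
	}
	return false;
}

/* Binary search over single codepoints. */
static bool
in_singles(Codepoint cp, const Codepoint *tbl, size_t n)
{
	size_t		lo = 0;
	size_t		hi = n;

	while (lo < hi)
	{
		size_t		mid = lo + (hi - lo) / 2;

		if (cp < tbl[mid])
			hi = mid;
		else if (cp > tbl[mid])
			lo = mid + 1;
		else
			return true;
	}
	return false;
}

/* A codepoint is prohibited if it appears in either table. */
static bool
is_prohibited(Codepoint cp)
{
	return in_ranges(cp, prohibited_ranges, lengthof(prohibited_ranges)) ||
		in_singles(cp, prohibited_singles, lengthof(prohibited_singles));
}
```

With that layout the intent of each entry is visible in the table
itself, and there is no need to guess whether two adjacent values are a
pair of bounds or two separate codepoints.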

