On 1/28/20 9:21 PM, Peter Eisentraut wrote:
You're right, this didn't make any sense.  Here is a new patch set with that fixed.

Thanks for this patch. This is a feature which has been on my personal todo list for a while and something which I have wished to have a couple of times.

I took a quick look at the patch and here is some feedback:

A possible concern is increased binary size from the new tables for the quickcheck but personally I think they are worth it.

A potential optimization would be to merge utf8_to_unicode() and pg_utf_mblen() into one function in unicode_normalize_func() since utf8_to_unicode() already knows length of the character. Probably not worth it though.

It feels a bit wasteful to measure output_size in unicode_is_normalized() since unicode_normalize() actually already knows the length of the buffer, it just does not return it.

A potential optimization for the normalized case would be to abort the quick check on the first maybe and normalize from that point on only. If I can find the time I might try this out and benchmark it.

Nitpick: "split/\s*;\s*/, $line" in generate-unicode_normprops_table.pl should be "split /\s*;\s*/, $line".

What about using else if in the code below for clarity?

+               if (check == UNICODE_NORM_QC_NO)
+                       return UNICODE_NORM_QC_NO;
+               if (check == UNICODE_NORM_QC_MAYBE)
+                       result = UNICODE_NORM_QC_MAYBE;

Remove extra space in the line below.

+       else if (quickcheck == UNICODE_NORM_QC_NO )

Andreas


Reply via email to