Hi, I'm reviewing "Multibyte charater set in levenshtein function" patch. https://commitfest.postgresql.org/action/patch_view?id=304
The main logic seems to be good, but I have some comments about the coding style and refactoring. * levenshtein_internal() and levenshtein_less_equal_internal() are very similar. Can you merge the code? We can always use less_equal_internal() if the overhead is ignorable. Did you compare them? * There are many "if (!multibyte_encoding)" in levenshtein_internal(). How about split the function into two funcs for single-byte chars and multi-byte chars? (ex. levenshtein_internal_mb() ) Or, we can always use multi-byte version if the overhead is small. * I prefer a struct rather than an array. "4 * m" and "3 * m" look like magic numbers for me. Could you name the entries with definition of a struct? /* * For multibyte encoding we'll also store array of lengths of * characters and array with character offsets in first string * in order to avoid great number of pg_mblen calls. */ prev = (int *) palloc(4 * m * sizeof(int)); * There are some compiler warnings. Avoid them if possible. fuzzystrmatch.c: In function ‘levenshtein_less_equal_internal’: fuzzystrmatch.c:428: warning: ‘char_lens’ may be used uninitialized in this function fuzzystrmatch.c:428: warning: ‘offsets’ may be used uninitialized in this function fuzzystrmatch.c:430: warning: ‘curr_right’ may be used uninitialized in this function fuzzystrmatch.c: In function ‘levenshtein_internal’: fuzzystrmatch.c:222: warning: ‘char_lens’ may be used uninitialized in this function * Coding style: Use "if (m == 0)" instead of "if (!m)" when the type of 'm' is an integer. * Need to fix the caution in docs. http://developer.postgresql.org/pgdocs/postgres/fuzzystrmatch.html | Caution: At present, fuzzystrmatch does not work well with | multi-byte encodings (such as UTF-8). but now levenshtein supports multi-byte encoding! We should mention which function supports mbchars not to confuse users. * (Not an issue for the patch, but...) Could you rewrite PG_GETARG_TEXT_P, VARDATA, and VARSIZE to PG_GETARG_TEXT_PP, VARDATA_ANY, and VARSIZE_ANY_EXHDR? Unpacking versions make the core a bit faster. -- Itagaki Takahiro -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers