On Tue, Jul 20, 2010 at 3:37 AM, Itagaki Takahiro <itagaki.takah...@gmail.com> wrote: > 2010/7/13 Alexander Korotkov <aekorot...@gmail.com>: >> Anyway I think that overhead is not ignorable. That's why I have splited >> levenshtein_internal into levenshtein_internal and levenshtein_internal_mb, >> and levenshtein_less_equal_internal into levenshtein_less_equal_internal and >> levenshtein_less_equal_internal_mb. > > Thank you for good measurement! Then, it's reasonable to have multiple > implementations. It also has documentation. I'll change status of the > patch to "Ready for Committer". > > The patch is good enough except argument types for some functions. > For example: > - char* vs. const char* > - text* vs. const char* + length > I hope committers would check whether there are better types for them.
This patch still needs some work. It includes a bunch of stylistic changes that aren't relevant to the purpose of the patch. There's no reason that I can see to change the existing levenshtein_internal function to take text arguments instead of char *, or to change !m to m == 0 in existing code, or to change the whitespace in the comments of that function. All of those need to be reverted before we can consider committing this. There is a huge amount of duplicated code here. I think it makes sense to have the multibyte version of the function be separate, but perhaps we can merge the less-than-or-equal to bits into the main code, so that we only have two copies instead of four. Perhaps we can't just add a max_d argument max_distance to levenshtein_internal; and if this value is >=0 then it represents the max allowable distance, but if it is <0 then there is no limit. Sure, that might slow down the existing code a bit, but it might not be significant. I'd at least like to see some numbers showing that it is significant before we go to this much trouble. The code doesn't follow the project coding style. Braces must be uncuddled. Comment paragraphs will be reflowed unless they begin and end with ------. Function definitions should have the type declaration on one line and the function name at the start of the next. Freeing memory with pfree is likely a waste of time; is there any reason not to just rely on the memory context reset, as the original coding did? I think we might need to remove the acknowledgments section from this code. If everyone who touches this code adds their name, we're quickly going to have a mess. If we're not going to remove the acknowledgments section, then please add my name, too, because I've already patched this code once... I'm going to set this back to "Waiting on Author". -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise Postgres Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers