On Sat, Jul 5, 2014 at 12:46 AM, Abhijit Menon-Sen <a...@2ndquadrant.com> wrote: > At 2014-07-02 15:51:08 -0700, p...@heroku.com wrote: >> >> Attached revision factors in everyone's concerns here, I think. > > Is anyone planning to review Peter's revised patch?
I have been doing some functional tests, and looked quickly at the code to understand what it does: 1) Compiles without warnings, passes regression tests 2) Checking process goes through all the existing columns of a relation even a difference of 1 with some other column(s) has already been found. As we try to limit the number of hints returned, this seems like a waste of resources. 3) distanceName could be improved, by for example having some checks on the string lengths of target and source columns, and immediately reject the match if for example the length of the source string is the double/half of the length of target. 4) This is not nice, could it be possible to remove the stuff from varlena.c? +/* Expand each Levenshtein distance variant */ +#include "levenshtein.c" +#define LEVENSHTEIN_LESS_EQUAL +#include "levenshtein.c" +#undef LEVENSHTEIN_LESS_EQUAL Part of the same comment: only varstr_leven_less_equal is used to calculate the distance, should we really move varstr_leven to core? This clearly needs to be reworked as not just a copy-paste of the things in fuzzystrmatch. The flag LEVENSHTEIN_LESS_EQUAL should be let within fuzzystrmatch I think. 5) Do we want hints on system columns as well? For example here we could get tableoid as column hint: =# select tablepid from foo; ERROR: 42703: column "tablepid" does not exist LINE 1: select tablepid from foo; ^ LOCATION: errorMissingColumn, parse_relation.c:3123 Time: 0.425 ms 6) Sometimes no hints are returned... Even in simple cases like this one: =# create table foo (aa int, bb int); CREATE TABLE =# select ab from foo; ERROR: 42703: column "ab" does not exist LINE 1: select ab from foo; ^ LOCATION: errorMissingColumn, parse_relation.c:3123 7) Performance penalty with a table with 1600 columns: =# CREATE FUNCTION create_long_table(tabname text, columns int) RETURNS void LANGUAGE plpgsql as $$ declare first_col bool = true; count int; query text; begin query := 'CREATE TABLE ' || tabname || ' ('; for count in 0..columns loop query := query || 'col' || count || ' int'; if count <> columns then query := query || ', '; end if; end loop; query := query || ')'; execute query; end; $$; =# SELECT create_long_table('aa', 1599); create_long_table ------------------- (1 row) Then tested queries like that: SELECT col888a FROM aa; Patched version: 2.100ms~2.200ms master branch (6048896): 0.956 ms~0.990 ms So the performance impact seems limited. Regards, -- Michael -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers