On Sat, Jul 5, 2014 at 12:46 AM, Abhijit Menon-Sen <a...@2ndquadrant.com> wrote:
> At 2014-07-02 15:51:08 -0700, p...@heroku.com wrote:
>> Attached revision factors in everyone's concerns here, I think.
> Is anyone planning to review Peter's revised patch?

I have been doing some functional tests, and looked quickly at the
code to understand what it does:
1) Compiles without warnings, passes regression tests
2) Checking process goes through all the existing columns of a
relation even a difference of 1 with some other column(s) has already
been found. As we try to limit the number of hints returned, this
seems like a waste of resources.
3) distanceName could be improved, by for example having some checks
on the string lengths of target and source columns, and immediately
reject the match if for example the length of the source string is the
double/half of the length of target.
4) This is not nice, could it be possible to remove the stuff from varlena.c?
+/* Expand each Levenshtein distance variant */
+#include "levenshtein.c"
+#include "levenshtein.c"
Part of the same comment: only varstr_leven_less_equal is used to
calculate the distance, should we really move varstr_leven to core?
This clearly needs to be reworked as not just a copy-paste of the
things in fuzzystrmatch.
The flag LEVENSHTEIN_LESS_EQUAL should be let within fuzzystrmatch I think.
5) Do we want hints on system columns as well? For example here we
could get tableoid as column hint:
=# select tablepid from foo;
ERROR:  42703: column "tablepid" does not exist
LINE 1: select tablepid from foo;
LOCATION:  errorMissingColumn, parse_relation.c:3123
Time: 0.425 ms
6) Sometimes no hints are returned... Even in simple cases like this one:
=# create table foo (aa int, bb int);
=# select ab from foo;
ERROR:  42703: column "ab" does not exist
LINE 1: select ab from foo;
LOCATION:  errorMissingColumn, parse_relation.c:3123
7) Performance penalty with a table with 1600 columns:
=# CREATE FUNCTION create_long_table(tabname text, columns int)
LANGUAGE plpgsql
as $$
  first_col bool = true;
  count int;
  query text;
  query := 'CREATE TABLE ' || tabname || ' (';
  for count in 0..columns loop
    query := query || 'col' || count ||  ' int';
    if count <> columns then
      query := query || ', ';
    end if;
  end loop;
  query := query || ')';
  execute query;
=# SELECT create_long_table('aa', 1599);

(1 row)
Then tested queries like that: SELECT col888a FROM aa;
Patched version: 2.100ms~2.200ms
master branch (6048896): 0.956 ms~0.990 ms
So the performance impact seems limited.


Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to