Hi, On Tue, Jul 8, 2014 at 6:58 AM, Michael Paquier <michael.paqu...@gmail.com> wrote: > 2) Checking process goes through all the existing columns of a > relation even a difference of 1 with some other column(s) has already > been found. As we try to limit the number of hints returned, this > seems like a waste of resources.
In general it's possible that an exact match will later be found within the RTE, and exact matches don't have to pay the "wrong alias" penalty, and are immediately returned. It is therefore not a waste of resources, but even if it was that would be pretty inconsequential as your benchmark shows. > 3) distanceName could be improved, by for example having some checks > on the string lengths of target and source columns, and immediately > reject the match if for example the length of the source string is the > double/half of the length of target. I don't think it's a good idea to tie distanceName() to the ultimate behavior of errorMissingColumn() hinting, since there may be other callers in the future. Besides, that isn't going to help much. > 4) This is not nice, could it be possible to remove the stuff from varlena.c? > +/* Expand each Levenshtein distance variant */ > +#include "levenshtein.c" > +#define LEVENSHTEIN_LESS_EQUAL > +#include "levenshtein.c" > +#undef LEVENSHTEIN_LESS_EQUAL > Part of the same comment: only varstr_leven_less_equal is used to > calculate the distance, should we really move varstr_leven to core? > This clearly needs to be reworked as not just a copy-paste of the > things in fuzzystrmatch. > The flag LEVENSHTEIN_LESS_EQUAL should be let within fuzzystrmatch I think. So there'd be one variant within core and one within contrib/fuzzystrmatch? I don't think that's an improvement. > 5) Do we want hints on system columns as well? I think it's obvious that the answer must be no. That's going to frequently result in suggestions of columns that users will complain aren't even there. If you know about the system columns, you can just get it right. They're supposed to be hidden for most purposes. > 6) Sometimes no hints are returned... Even in simple cases like this one: > =# create table foo (aa int, bb int); > CREATE TABLE > =# select ab from foo; > ERROR: 42703: column "ab" does not exist > LINE 1: select ab from foo; > ^ > LOCATION: errorMissingColumn, parse_relation.c:3123 That's because those two candidates come from a single RTE and have an equal distance -- you'd see both suggestions if you joined two tables with each candidate, assuming that each table being joined didn't individually have the same issue. I think that that's probably considered the correct behavior by most. -- Peter Geoghegan -- Sent via pgsql-hackers mailing list (firstname.lastname@example.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers