On Sat, Nov 15, 2014 at 7:36 PM, Peter Geoghegan <p...@heroku.com> wrote: > Attached patch incorporates this feedback. > > The only user-visible difference between this revision and the > previous revision is that it's quite possible for two suggestion to > originate from the same RTE (there is exactly one change in the > regression test's expected output as compared to the last revision for > this reason. The regression tests are otherwise unchanged). It's still > not possible to see more than 2 suggestions under any circumstances, > no matter where they might have originated from, which I think is > appropriate -- we continue to not present any HINT in the event of 3 > or more equidistant matches. > > I think that the restructuring required to pass around a state > variable has resulted in somewhat clearer code.
Cool! I'm grumpy about the distanceName() function. That seems too generic. If we're going to keep this as it is, I suggest something like computeRTEColumnDistance(). But see below. On a related note, I'm also grumpy about this comment: + /* Charge half as much per deletion as per insertion or per substitution */ + return varstr_levenshtein_less_equal(actual, len, match, match_len, + 2, 1, 2, max); The purpose of a code comment is to articulate WHY we did something, rather than simply to restate what the code quite obviously does. I haven't heard a compelling argument for why this should be 2, 1, 2 rather than the default 1, 1, 1; and I'm inclined to do the latter unless you can make some very good argument for this combination of weights. And if you can make such an argument, then there should be comments so that the next person to come along and look at this code doesn't go, huh, that's whacky, and change it. + int location, FuzzyAttrMatchState *rtestate) I suggest calling this "fuzzystate" rather than "rtestate"; it's not the state of the RTE, but the state of the fuzzy matching. Within the scanRTEForColumn block, we have a rather large chunk of code protected by if (rtestate), which contains the only call to distanceName(). I suggest that we move all of this logic to a separate, static function, and merge distanceName into it. I also suggest testing against NULL explicitly instead of implicitly. So this block of code would end up as something like: if (fuzzystate != NULL) updateFuzzyAttrMatchState(rte, attcolname, colname, &fuzzystate); In searchRangeTableForCol, I'm fairly certain that you've changed the behavior by adding a check for if (rte->rtekind == RTE_JOIN) before the call to scanRTEForColumn(). Why not instead put this check into updateFuzzyAttrMatchState? Then you can be sure you're not changing the behavior in any other case. On a similar note, I think the dropped-column test should happen early as well, probably again in updateFuzzyAttrMatchState(). There's little point in adding a suggestion only to throw it away again. + /* + * Charge extra (for inexact matches only) when an alias was + * specified that differs from what might have been used to + * correctly qualify this RTE's closest column + */ + if (wrongalias) + rtestate.distance += 3; I don't understand what situation this is catering to. Can you explain? It seems to account for a good deal of complexity. ERROR: column "oid" does not exist LINE 1: select oid > 0, * from altwithoid; ^ +HINT: Perhaps you meant to reference the column "altwithoid"."col". That seems like a stretch. I think I suggested before using a distance threshold of at most 3 or half the word length, whichever is less. For a three-letter column name that means not suggesting anything if more than one character is different. What you implemented here is close to that, yet somehow we've got a suggestion slipping through that has 2 out of 3 characters different. I'm not quite sure I see how that's getting through, but I think it shouldn't. ERROR: column fullname.text does not exist LINE 1: select fullname.text from fullname; ^ +HINT: Perhaps you meant to reference the column "fullname"."last". Same problem, only worse! They've only got one letter of four in common. ERROR: column "xmin" does not exist LINE 1: select xmin, * from fooview; ^ +HINT: Perhaps you meant to reference the column "fooview"."x". Basically the same problem again. I think the distance threshold in this case should be half the shorter column name, i.e. 0. Your new test cases include no negative test cases; that is, cases where the machinery declines to suggest a hint because of, say, 3 equally good possibilities. They probably should have something like that. -- Robert Haas EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL Company -- Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-hackers