On 06/17/2014 02:36 PM, Tom Lane wrote:
> Josh Berkus <j...@agliodbs.com> writes:
>> (2) If there are multiple columns with the same levenschtien distance,
>> which one do you suggest?  The current code picks a random one, which
>> I'm OK with.  The other option would be to list all of the columns.
> I objected to that upthread.  I don't think that picking a random one is
> sane at all.  Listing them all might be OK (I notice that that seems to be
> what both bash and git do).
> Another issue is whether to print only those having exactly the minimum
> observed Levenshtein distance, or to print everything less than some
> cutoff.  The former approach seems to me to be placing a great deal of
> faith in something that's only a heuristic.

Well, that depends on what the cutoff is.  If it's high, like 0.5, that
could be a LOT of columns.  Like, I plan to test this feature with a
3-table join that has a combined 300 columns.  I can completely imagine
coming up with a string which is within 0.5 or even 0.3 of 40 columns names.

So if we want to list everything below a cutoff, we'd need to make that
cutoff fairly narrow, like 0.2.  But that means we'd miss a lot of
potential matches on short column names.

I really think we're overthinking this: it is just a HINT, and we can
improve it in future PostgreSQL versions, and most of our users will
ignore it anyway because they'll be using a client which doesn't display

Josh Berkus
PostgreSQL Experts Inc.

Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:

Reply via email to