Josh Berkus <j...@agliodbs.com> writes:
> On 06/17/2014 02:36 PM, Tom Lane wrote:
>> Another issue is whether to print only those having exactly the minimum
>> observed Levenshtein distance, or to print everything less than some
>> cutoff. The former approach seems to me to be placing a great deal of
>> faith in something that's only a heuristic.
> Well, that depends on what the cutoff is. If it's high, like 0.5, that
> could be a LOT of columns. Like, I plan to test this feature with a
> 3-table join that has a combined 300 columns. I can completely imagine
> coming up with a string which is within 0.5 or even 0.3 of 40 column names.
I think Levenshtein distances are integers, though that's just a minor point.
> So if we want to list everything below a cutoff, we'd need to make that
> cutoff fairly narrow, like 0.2. But that means we'd miss a lot of
> potential matches on short column names.
I'm not proposing an immutable cutoff. Something that scales with the
string length might be a good idea, or we could make it a multiple of
the minimum observed distance, or probably there are a dozen other things
we could do. I'm just saying that if we have an alternative at distance
3, and another one at distance 4, it's not clear to me that we should
assume that the first one is certainly what the user had in mind.
Especially not if all the other alternatives are distance 10 or more.
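To make the options above concrete, here is a minimal sketch of one such
rule. It is an illustration only, not code from any proposed patch: the
function names and the knobs `slack` and `max_fraction` are hypothetical.
It combines two of the ideas mentioned: a ceiling that scales with the
length of the typed string, and a window relative to the minimum observed
distance, so that near-ties are all reported and nothing is printed when
even the best match is far away.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain dynamic-programming edit distance; the result is an integer."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # delete ca
                            curr[j - 1] + 1,      # insert cb
                            prev[j - 1] + cost))  # substitute (or match)
        prev = curr
    return prev[-1]


def suggest(typed, columns, slack=1, max_fraction=0.5):
    """Return plausible column names, closest first.

    slack and max_fraction are hypothetical tuning knobs (assumptions):
      slack        - how far above the minimum distance still counts as a tie
      max_fraction - ceiling on distance, as a fraction of len(typed)
    """
    if not columns:
        return []
    scored = sorted((levenshtein(typed, c), c) for c in columns)
    best = scored[0][0]
    # The ceiling scales with string length rather than being a fixed cutoff.
    ceiling = max(1, int(len(typed) * max_fraction))
    if best > ceiling:
        return []  # even the best candidate is implausible, so stay silent
    limit = min(best + slack, ceiling)
    return [name for dist, name in scored if dist <= limit]


candidates = ["customer_id", "customer_idx", "order_total", "shipping_address"]
print(suggest("custmer_id", candidates))  # ['customer_id', 'customer_idx']
print(suggest("zzzz", candidates))        # []
```

With `slack=1`, alternatives at distance 3 and 4 would both be listed,
while a lone candidate at distance 10 for a short name would produce no
suggestion at all.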
> I really think we're overthinking this: it is just a HINT, and we can
> improve it in future PostgreSQL versions, and most of our users will
> ignore it anyway because they'll be using a client which doesn't display
Agreed that we can make it better later. But whether it prints exactly
one suggestion, and whether it does that no matter how silly the
suggestion is, are rather fundamental decisions.
regards, tom lane
Sent via pgsql-hackers mailing list (email@example.com)