On Thu, Nov 20, 2014 at 7:32 AM, Robert Haas <robertmh...@gmail.com> wrote:
>> In general, I think the cost of a bad suggestion is much lower than
>> the benefit of a good one. You seem to be suggesting that they're
>> equal. Or that they're equally likely in an organic situation. In my
>> estimation, this is not the case at all.
>
> The way I see it, the main cost of a bad suggestion is that it annoys
> the user with clutter which they may brand as "stupid". Think about
> how much vitriol has been spewed over the years against progress bars
> (or estimated completion) times that don't turn out to mirror reality.

Well, you can judge the quality of the suggestion immediately. I
imagined a mechanism that gives a little bit more than the minimum
amount of guidance for things like contractions/abbreviations.

> Microsoft has gotten more cumulative flack about their inaccurate
> progress bars over the years than they would have for dropping an
> elevator on a cute baby.

I haven't used a more recent version of Windows than Windows Vista,
but I'm pretty sure that they kept it up.

>> I'm curious about your thoughts on the compromise of a ramped up
>> distance threshold to apply a test for the absolute quality of a
>> match. I think that the fact that git gives bad suggestions with terse
>> strings tells us a lot, though. Note that unlike git, with terse
>> strings we may well have a good deal more equidistant matches, and as
>> soon as the number of would-be matches exceeds 2, we actually give no
>> matches at all. So that's an additional protection against poor
>> matches with terse strings.
>
> I don't know what you mean by a ramped-up distance threshold, exactly.
> I think it's good for the distance threshold to be lower for small
> strings and higher for large ones. I think I'm somewhat open to
> negotiation on the details, but I think any system that's going to
> suggest "quantity" for "tit" is going too far.

I mean the suggestion of raising the cost threshold more gradually,
not as a step function of the number of characters in the string [1]
where it's either over 6 characters and must pass the 50% test, or
isn't and has no absolute quality test. The exact modification I
described will FWIW remove the "quantity" for "qty" suggestion, as
well as all the similar suggestions that you found objectionable (like
"tit" also offering a suggestion of "quantity"). If you look at the
regression tests, none of the sensible suggestions are lost (some
would be by an across the board 50% absolute quality threshold, as I
previously pointed out [2]), but all the bad ones are.

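To make the step-function versus ramp distinction concrete, here is a
rough, purely illustrative Python sketch. It is not the code from the
patch, and the exact modification referenced in [1] is not reproduced
here. Everything in it is an assumption: the unit-cost Levenshtein
distance, the reading of the "50% test" as "distance at most half the
typed length", the ramp constants (one extra edit allowed per 3
characters), and the helper names step_limit, ramped_limit and suggest.
The only parts taken from the discussion above are the 6-character
step, the idea of a gradual ramp, and the rule that more than 2
equidistant matches yields no suggestion at all.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain edit distance with unit costs (assumption: the real patch
    may weight insertions, deletions and substitutions differently)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1,                 # delete ca
                           cur[j - 1] + 1,              # insert cb
                           prev[j - 1] + (ca != cb)))   # substitute or match
        prev = cur
    return prev[-1]


def step_limit(typed: str) -> float:
    """Step function: no absolute quality test up to 6 characters,
    a 50% test (assumed meaning: distance <= len/2) above that."""
    n = len(typed)
    return n / 2 if n > 6 else float("inf")


def ramped_limit(typed: str) -> float:
    """Hypothetical ramp: the allowance grows by one edit for every
    3 characters typed, so short strings are always tested."""
    return 1 + len(typed) // 3


def suggest(typed, columns, limit):
    """Return the closest column names if they pass limit(typed).
    More than 2 equidistant matches means no suggestion at all."""
    scored = [(levenshtein(typed, c), c) for c in columns]
    best = min((d for d, _ in scored), default=None)
    if best is None or best > limit(typed):
        return []
    matches = [c for d, c in scored if d == best]
    return matches if len(matches) <= 2 else []
```

With columns "quantity" and "description", the step function still
offers "quantity" for "qty" because nothing tests a 3-character string,
while the hypothetical ramp rejects it and keeps the suggestion for the
genuine typo "quanttiy".
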
I attach failed regression test output showing the difference between
the previous expected values, and actual values with that small
modification - it looks like most or all bad cases are now fixed.

> If the user types
> "qty" when they meant "quantity", they probably don't really need the
> hint, because they're going to say to themselves "wait, I guess I
> didn't abbreviate that". The time when they need the hint is when
> they typed "quanttiy", because it's quite possible to read a query
> with that sort of typo multiple times and not realize that you've made
> one.

I agree that that's a more important case.

> In other words, I think there's value in trying to clue somebody in
> when they've made a typo, but not when they've made a think-o. We
> won't be able to do the latter accurately enough to make it more
> useful than annoying.

That's certainly true; I think that we only disagree about the exact
point at which we enter the think-o correction business.

[1] http://www.postgresql.org/message-id/CAM3SWZT+7hH29Go6ZuY2OrCS40=6ypvm_nt9njfovp3xwji...@mail.gmail.com
[2] http://www.postgresql.org/message-id/cam3swztsgoknht8rk+0eed7spnjg4padmbqqyi0fh9bwcnv...@mail.gmail.com

-- 
Peter Geoghegan
regression.diffs
Description: Binary data
-- 
Sent via pgsql-hackers mailing list (pgsql-hackers@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-hackers