Hi,

On Wed, 2016-03-09 at 11:23 +0100, Shulgin, Oleksandr wrote:
> On Tue, Mar 8, 2016 at 8:16 PM, Alvaro Herrera
> <alvhe...@2ndquadrant.com> wrote:
>
> > Shulgin, Oleksandr wrote:
> >
> > > Alright.  I'm attaching the latest version of this patch split in two
> > > parts: the first one is NULLs-related bugfix and the second is the
> > > "improvement" part, which applies on top of the first one.
> >
> > I went over patch 0001 and it seems pretty reasonable.  It's missing
> > some comment updates -- at least the large comments that talk about Duj1
> > should be modified to indicate why the code is now subtracting the null
> > count.
>
> Good point.
>
> > Also, I can't quite figure out why the "else" now in line 2131
> > is now "else if track_cnt != 0".  What happens if track_cnt is zero?
> > The comment above the "if" block doesn't provide any guidance.
>
> It is there only to avoid potentially dividing zero by zero when
> calculating avgcount (which will not be used after that anyway).  I
> agree it deserves a comment.

That definitely deserves a comment. It's not immediately clear why
(track_cnt != 0) would prevent division by zero in the code. The only
way such an error could happen is if ndistinct==0, because that's the
actual denominator. Which means this

    ndistinct = ndistinct * sample_cnt

would have to evaluate to 0. But ndistinct==0 can't happen as we're in
the (nonnull_cnt > 0) branch, and that guarantees (stadistinct != 0).

Thus the only possibility seems to be (nonnull_cnt==toowide_cnt). Why
not use this condition instead?

FWIW while looking at the code I noticed that we skip wide varlena
values but not cstrings. Seems a bit suspicious.

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
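
As an illustration of the point about the denominator, here is a minimal
standalone sketch of a guard that tests the divisor itself rather than an
unrelated counter. It is not the code from the patch or from analyze.c:
the function name typical_value_count and its signature are hypothetical,
and only the variable names avgcount, sample_cnt and ndistinct are borrowed
from the discussion.

```c
#include <assert.h>
#include <stdbool.h>

/*
 * Hypothetical helper (an assumption for illustration, not PostgreSQL
 * code): estimate how many times a typical distinct value occurs in
 * the sample.
 *
 * Returns false and leaves *avgcount untouched when ndistinct is zero,
 * so the caller never evaluates sample_cnt / 0.0.  Under IEEE 754
 * arithmetic that division would yield NaN for 0.0 / 0.0 and infinity
 * for a positive numerator.
 */
bool
typical_value_count(double sample_cnt, double ndistinct, double *avgcount)
{
    /* Both inputs are counts, so neither may be negative. */
    assert(sample_cnt >= 0.0 && ndistinct >= 0.0);

    /* Guard on the actual denominator. */
    if (ndistinct == 0.0)
        return false;

    *avgcount = sample_cnt / ndistinct;
    return true;
}
```

Testing the divisor directly makes the intent obvious to a reader without
a comment; whether that or (nonnull_cnt==toowide_cnt) is the better test
in the real code depends on which cases can actually reach that branch.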