On Fri, Mar 30, 2018 at 5:00 AM, Andres Freund <and...@anarazel.de> wrote:
> Hi,
>
> On 2018-03-29 10:16:23 +1030, Andrew Dunstan wrote:
>> On Wed, Mar 28, 2018 at 5:30 PM, Andres Freund <and...@anarazel.de> wrote:
>> > Hi,
>> >
>> > On 2018-03-26 09:02:10 +1030, Andrew Dunstan wrote:
>> >> Thanks for this, all looks good. Here is the consolidate patch
>> >> rebased. If there are no further comments I propose to commit this in
>> >> a few days time.
>> >
>> > Here's a bit of post commit review:
>> >
>> > @@ -1310,13 +1679,11 @@ slot_getsomeattrs(TupleTableSlot *slot, int attnum)
>> >
>> >         /*
>> >          * If tuple doesn't have all the atts indicated by tupleDesc, read 
>> > the
>> > -        * rest as null
>> > +        * rest as NULLs or missing values
>> >          */
>> > -       for (; attno < attnum; attno++)
>> > -       {
>> > -               slot->tts_values[attno] = (Datum) 0;
>> > -               slot->tts_isnull[attno] = true;
>> > -       }
>> > +       if (attno < attnum)
>> > +               slot_getmissingattrs(slot, attno, attnum);
>> > +
>> >         slot->tts_nvalid = attnum;
>> >  }
>> >
>> > It's worthwhile to note that this'll re-process *all* missing values,
>> > even if they've already been deformed. I.e. if
>> > slot_getmissingattrs(.., first-missing)
>> > slot_getmissingattrs(.., first-missing + 1)
>> > slot_getmissingattrs(.., first-missing + 2)
>> > is called, all three missing values will be copied every time. That's
>> > because tts_nvalid isn't taken into account.  I wonder if 
>> > slot_getmissingattrs
>> > could take tts_nvalid into account?
>> >
>> > I also wonder if this doesn't deserve an unlikely(), to avoid the cost
>> > of spilling registers in the hot branch of not missing any values.
>
>> One of us at least is very confused about this function.
>> slot_getmissingattrs() gets called at most once by slot_getsomeattrs
>> and never for any attribute that's already been deformed.
>
> Imagine the same slot being used in two different expressions. The
> typical case for that is e.g. something like
>   SELECT a,b,c,d FROM foo WHERE b = 1;
>
> this'll trigger one slot_getsomeattrs(slot, 2) call from within qual
> evaluation, and then a slot_getsomeattrs(slot, 4) from within the
> projection code.  If you then imagine a tuple where only the first
> column exists physically, we'd copy b twice, because attno is only going
> to be one 1.  I think we might just want to check tts nvalid in
> getmissingattrs?
>
>


OK. so add something like this to the top of slot_getmissingattrs()?

    startAttNum = Max(startAttNum, slot->tts_nvalid);



>> > I'm still strongly object to do doing this unconditionally for queries
>> > that never need any of this.  We're can end up with a significant number
>> > of large tuples in memory here, and then copy that through dozens of
>> > tupledesc copies in queries.
>
>> We're just doing the same thing we do for default values.
>
> That's really not a defense. It's hurting us there too.
>


I will take a look and see if it can be avoided easily.


>
>> None of the tests I did with large numbers of missing values seemed to
>> show performance impacts of the kind you describe. Now, none of the
>> queries were particularly complex, but the worst case was from
>> actually using very large numbers of attributes with missing values,
>> where we'd need this data anyway. If we just selected a few attributes
>> performance went up rather than down.
>
> Now imagine a partitioned workload with a few thousand partitions over a
> few tables. The additional memory is going to start being noticeable.
>
>
>> pg_attrdef isn't really a good place for it (what if they drop the
>> default?). So the only alternative then would be a completely new
>> catalog. I'd need a deal of convincing that that was justified.
>
> There's plenty databases with pg_attribute being many gigabytes large,
> and this is going to make that even worse.
>


I'll change it if there's a consensus about it, but so far the only
other opinion has been from Tom who doesn't apparently see much of a
problem.

cheers

andrew

-- 
Andrew Dunstan                https://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services

Reply via email to