"Radovan Antloga" <[EMAIL PROTECTED]> writes:
>> 190 fields in a table seems like rather a lot ... is that actually
>> representative of your intended applications?
> Test table is like table I use in production
> with Firebird and Oracle db. Table has a lot of smallint
> and integer fields.

I did some experiments with CVS tip on updating all rows of a table
with lots of columns --- to be specific,

	create table widetable(
	  int1 int, text1 text, num1 numeric,
	  int2 int, text2 text, num2 numeric,
	  int3 int, text3 text, num3 numeric,
	  ...
	  int59 int, text59 text, num59 numeric,
	  int60 int, text60 text, num60 numeric
	);

for 180 columns altogether, with 16k rows of data, and the test query

	update widetable set int30 = null;

The gprof profile looks like this:

  %   cumulative   self              self     total
 time   seconds   seconds    calls  ms/call  ms/call  name
 19.77      1.22     1.22                             _mcount
 14.91      2.14     0.92    16385     0.06     0.06  XLogInsert
  9.08      2.70     0.56  2932736     0.00     0.00  slot_deform_tuple
  7.94      3.19     0.49  2965504     0.00     0.00  slot_getattr
  6.48      3.59     0.40  2949120     0.00     0.00  ExecEvalVar
  5.83      3.95     0.36    16384     0.02     0.02  ExecTargetList
  4.70      4.24     0.29    16384     0.02     0.02  heap_fill_tuple
  3.57      4.46     0.22                             ExecEvalVar
  2.43      4.61     0.15                             _write_sys
  2.27      4.75     0.14    16384     0.01     0.01  heap_compute_data_size
  1.62      4.85     0.10                             noshlibs
  1.46      4.94     0.09    16384     0.01     0.03  heap_form_tuple
  1.30      5.02     0.08    16384     0.00     0.01  ExecGetJunkAttribute
  1.30      5.10     0.08                             encore
  1.13      5.17     0.07    16384     0.00     0.00  ExecFilterJunk
  1.13      5.24     0.07                             chunk2

The large number of calls to slot_deform_tuple() is annoying --- ideally
there'd be only one per row.  But what actually happens is that the
ExecVariableList() optimization is disabled by the presence of one
non-user attribute in the scan's targetlist (ie, ctid, which is needed
by the top-level executor to do the UPDATE), not to mention that the
attribute(s) being updated will have non-Var expressions anyway.  So we
execute the target list the naive way; and because the Vars referencing
the not-updated columns appear sequentially in the tlist, each
ExecEvalVar/slot_getattr call ends up calling slot_deform_tuple again to
decipher just one more column of the tuple.
This is just an O(N) penalty, not O(N^2), but it's still pretty annoying
considering that all the infrastructure is there to do better.  If we
were to determine the max attribute number to be fetched and call
slot_getsomeattrs() up front (as happens in the ExecVariableList case),
we could save a significant constant factor --- perhaps as much as 10%
of the runtime in this case.

The trick with such "optimizations" is to not turn them into
pessimizations: if we decode attributes that end up not getting fetched,
we aren't saving cycles.  So I'm thinking of tying this specifically to
the scan node's targetlist, and only doing the slot_getsomeattrs() call
once we have decided to evaluate the targetlist.  Columns referenced as
part of the node's qual conditions wouldn't participate in the
improvement.  We could alternatively do the slot_getsomeattrs() call
before evaluating the quals, but I'm worried that that would be a loss
in the case where the qual condition fails and so the targetlist is
never evaluated.

Comments, better ideas?

			regards, tom lane