Re: [HACKERS] asynchronous and vectorized execution

Andres Freund Tue, 10 May 2016 17:51:07 -0700

On 2016-05-10 12:56:17 -0400, Robert Haas wrote:
> I suspect the number of queries that are being hurt by fmgr overhead
> is really large, and I think it would be nice to attack that problem
> more directly.  It's a bit hard to discuss what's worthwhile in the
> abstract, without performance numbers, but when you vectorize, how
> much is the benefit from using SIMD instructions and how much is the
> benefit from just not going through the fmgr every time?


I think fmgr overhead is an issue, but in most profiles of
execution-heavy loads I've seen, the bottlenecks are elsewhere. They
often look roughly like
+   15.47%  postgres  postgres           [.] slot_deform_tuple
+   12.99%  postgres  postgres           [.] slot_getattr
+   10.36%  postgres  postgres           [.] ExecMakeFunctionResultNoSets
+    9.76%  postgres  postgres           [.] heap_getnext
+    6.34%  postgres  postgres           [.] HeapTupleSatisfiesMVCC
+    5.09%  postgres  postgres           [.] heapgetpage
+    4.59%  postgres  postgres           [.] hash_search_with_hash_value
+    4.36%  postgres  postgres           [.] ExecQual
+    3.30%  postgres  postgres           [.] ExecStoreTuple
+    3.29%  postgres  postgres           [.] ExecScan

or

-   33.67%  postgres  postgres           [.] ExecMakeFunctionResultNoSets
   - ExecMakeFunctionResultNoSets
      + 99.11% ExecEvalOr
      + 0.89% ExecQual
+   14.32%  postgres  postgres           [.] slot_getattr
+    5.66%  postgres  postgres           [.] ExecEvalOr
+    5.06%  postgres  postgres           [.] check_stack_depth
+    5.02%  postgres  postgres           [.] slot_deform_tuple
+    4.05%  postgres  postgres           [.] pgstat_end_function_usage
+    3.69%  postgres  postgres           [.] heap_getnext
+    3.41%  postgres  postgres           [.] ExecEvalScalarVarFast
+    3.36%  postgres  postgres           [.] ExecEvalConst


with a healthy dose of _bt_compare and heap_hot_search_buffer in more
index-heavy workloads.

(yes, I just pulled these example profiles from somewhere, but I've
seen profiles look like this more often than I've seen them be very
fmgr-heavy).


That seems to suggest that we need to restructure how we get to calling
fmgr functions before worrying about the actual fmgr call.
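
As a rough illustration of that point, here is a minimal Python sketch
that tallies the self-time percentages of the first profile above by
the kind of work each function does. The percentages are copied from
the profile; the category names and the assignment of functions to
categories are assumptions made for illustration, not something perf
reports.

```python
# Self-time percentages copied from the first profile above.
profile = {
    "slot_deform_tuple": 15.47,
    "slot_getattr": 12.99,
    "ExecMakeFunctionResultNoSets": 10.36,
    "heap_getnext": 9.76,
    "HeapTupleSatisfiesMVCC": 6.34,
    "heapgetpage": 5.09,
    "hash_search_with_hash_value": 4.59,
    "ExecQual": 4.36,
    "ExecStoreTuple": 3.30,
    "ExecScan": 3.29,
}

# Hypothetical grouping, chosen for illustration only.
categories = {
    "tuple deforming": ["slot_deform_tuple", "slot_getattr"],
    "expression evaluation": ["ExecMakeFunctionResultNoSets", "ExecQual"],
    "heap scan and visibility": [
        "heap_getnext",
        "HeapTupleSatisfiesMVCC",
        "heapgetpage",
    ],
    "other": ["hash_search_with_hash_value", "ExecStoreTuple", "ExecScan"],
}


def tally(profile, categories):
    """Sum the self-time percentage of the functions in each category."""
    return {
        name: round(sum(profile[func] for func in funcs), 2)
        for name, funcs in categories.items()
    }


totals = tally(profile, categories)

# Print the categories, largest share first.
for name, pct in sorted(totals.items(), key=lambda item: -item[1]):
    print(f"{name:26s} {pct:6.2f}%")
```

Under this grouping, tuple deforming comes to 28.46% and heap scan and
visibility checks to 21.19%, against 14.72% for the expression
evaluation entries. The functions actually invoked through fmgr do not
appear among the listed entries at all.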


Tomas, Mark, IIRC you both generated perf profiles for TPC-H queries at
some point. Any chance the results are online somewhere?

Greetings,

Andres Freund


