On 2012-12-20 15:51:37 +0100, Andres Freund wrote:
> On 2012-12-20 15:45:47 +0100, Andres Freund wrote:
> > On 2012-12-20 09:11:46 -0500, Robert Haas wrote:
> > > > On Thu, Dec 20, 2012 at 8:55 AM, Simon Riggs <si...@2ndquadrant.com> wrote:
> > > > On 18 December 2012 22:10, Robert Haas <robertmh...@gmail.com> wrote:
> > > >> Well that would be nice, but the problem is that I see no way to
> > > >> implement it.  If, with a unified parser, the parser is 14% of our
> > > >> source code, then splitting it in two will probably crank that number
> > > >> up well over 20%, because there will be duplication between the two.
> > > >> That seems double-plus un-good.
> > > >
> > > > I don't think the size of the parser binary is that relevant. What is
> > > > relevant is how much of that is regularly accessed.
> > > >
> > > > Increasing parser cache misses for DDL and increasing size of binary
> > > > overall are acceptable costs if we are able to swap out the unneeded
> > > > areas and significantly reduce the cache misses on the well travelled
> > > > portions of the parser.
> > >
> > > I generally agree.  We don't want to bloat the size of the parser with
> > > wild abandon, but yeah if we can reduce the cache misses on the
> > > well-travelled portions that seems like it ought to help.  My previous
> > > hacky attempt to quantify the potential benefit in this area was:
> > >
> > > http://archives.postgresql.org/pgsql-hackers/2011-05/msg01008.php
> > >
> > > On my machine there seemed to be a small but consistent win; on a very
> > > old box Jeff Janes tried, it didn't seem like there was any benefit at
> > > all.  Somehow, I have a feeling we're missing a trick here.
> >
> > I don't think you will see too many cache misses on such a low number of
> > extremely simple statements, so it's not too surprising not to see that
> > big a difference with that.
> >
> > Are we sure it's really cache misses and not just the actions performed
> > in the grammar that make bison code show up in profiles? I remember the
> > latter being the case...
>
> Hm. A very, very quick perf stat -dvvv of pgbench -S -c 20 -j 20 -T 20 later:
>
>      218350.885559 task-clock                #   10.095 CPUs utilized
>          1,676,479 context-switches          #    0.008 M/sec
>              2,396 cpu-migrations            #    0.011 K/sec
>            796,038 page-faults               #    0.004 M/sec
>    506,312,525,518 cycles                    #    2.319 GHz                     [20.00%]
>    405,944,435,754 stalled-cycles-frontend   #   80.18% frontend cycles idle    [30.32%]
>    236,712,872,641 stalled-cycles-backend    #   46.75% backend  cycles idle    [40.51%]
>    193,459,120,458 instructions              #    0.38  insns per cycle
>                                              #    2.10  stalled cycles per insn [50.70%]
>     36,433,144,472 branches                  #  166.856 M/sec                   [51.12%]
>      3,623,778,087 branch-misses             #    9.95% of all branches         [50.87%]
>     50,344,123,581 L1-dcache-loads           #  230.565 M/sec                   [50.33%]
>      5,548,192,686 L1-dcache-load-misses     #   11.02% of all L1-dcache hits   [49.69%]
>      2,666,461,361 LLC-loads                 #   12.212 M/sec                   [35.63%]
>        112,407,198 LLC-load-misses           #    4.22% of all LL-cache hits    [ 9.67%]
>
>       21.629396701 seconds time elapsed
>
> So there is definitely a noticeable rate of cache misses...
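
For reference, the percentages perf prints in the right-hand column can be re-derived from the raw counters. The following is only a sketch in Python; the dictionary keys and function names are made up for illustration, and the counter values are copied from the output above. Note that the bracketed percentages in the perf output mean the events were multiplexed, so the counts themselves are scaled estimates.

```python
# Raw counters copied from the perf stat output quoted above.
COUNTERS = {
    "cycles": 506_312_525_518,
    "stalled_cycles_frontend": 405_944_435_754,
    "stalled_cycles_backend": 236_712_872_641,
    "instructions": 193_459_120_458,
    "branches": 36_433_144_472,
    "branch_misses": 3_623_778_087,
    "l1_dcache_loads": 50_344_123_581,
    "l1_dcache_load_misses": 5_548_192_686,
    "llc_loads": 2_666_461_361,
    "llc_load_misses": 112_407_198,
}


def ratio(numerator, denominator):
    """Ratio of two named counters (hypothetical helper)."""
    return COUNTERS[numerator] / COUNTERS[denominator]


def derived_metrics():
    """Recompute the derived figures perf shows after the '#' sign."""
    return {
        "insns_per_cycle": ratio("instructions", "cycles"),
        "stalled_cycles_per_insn": ratio("stalled_cycles_frontend", "instructions"),
        "frontend_stall_pct": 100 * ratio("stalled_cycles_frontend", "cycles"),
        "backend_stall_pct": 100 * ratio("stalled_cycles_backend", "cycles"),
        "branch_miss_pct": 100 * ratio("branch_misses", "branches"),
        "l1_dcache_miss_pct": 100 * ratio("l1_dcache_load_misses", "l1_dcache_loads"),
        "llc_miss_pct": 100 * ratio("llc_load_misses", "llc_loads"),
    }


if __name__ == "__main__":
    for name, value in derived_metrics().items():
        print(f"{name:26s} {value:8.2f}")
```

Roughly one in nine L1 data loads misses, while only about one in 24 of the loads reaching the last-level cache misses there.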

L1 misses:
# Samples: 997K of event 'L1-dcache-load-misses'
# Overhead   Command       Shared Object Symbol
# ........  ........  ...............................................................
     6.49%  postgres  postgres            [.] SearchCatCache
     3.65%  postgres  postgres            [.] base_yyparse
     3.48%  postgres  postgres            [.] hash_search_with_hash_value
     3.41%  postgres  postgres            [.] AllocSetAlloc
     1.84%  postgres  postgres            [.] LWLockAcquire
     1.40%  postgres  postgres            [.] fmgr_info_cxt_security
     1.36%  postgres  postgres            [.] nocachegetattr
     1.23%  postgres  libc-2.13.so        [.] _int_malloc
     1.20%  postgres  postgres            [.] core_yylex
     1.15%  postgres  postgres            [.] MemoryContextAllocZeroAligned
     0.94%  postgres  postgres            [.] PostgresMain
     0.94%  postgres  postgres            [.] MemoryContextAlloc
     0.91%  postgres  libc-2.13.so        [.] __memcpy_ssse3_back
     0.89%  postgres  postgres            [.] CatalogCacheComputeHashValue
     0.86%  postgres  postgres            [.] PinBuffer
     0.86%  postgres  [kernel.kallsyms]   [k] __audit_syscall_entry
     0.80%  postgres  libc-2.13.so        [.] __strcmp_sse42
     0.80%  postgres  postgres            [.] _bt_compare
     0.78%  postgres  postgres            [.] FunctionCall2Coll
     0.77%  postgres  libc-2.13.so        [.] malloc
     0.73%  postgres  libc-2.13.so        [.] __memset_sse2
     0.72%  postgres  postgres            [.] GetSnapshotData
     0.69%  postgres  [kernel.kallsyms]   [k] fget_light
     0.69%  postgres  postgres            [.] DirectFunctionCall1Coll
     0.67%  postgres  postgres            [.] hash_search
     0.67%  postgres  libc-2.13.so        [.] 0x000000000011a3a5
     0.66%  postgres  postgres            [.] pgstat_initstats
     0.66%  postgres  postgres            [.] AllocSetFree
     0.65%  postgres  libc-2.13.so        [.] __strlen_sse42
     0.60%  postgres  libc-2.13.so        [.] _int_free
     0.60%  postgres  [kernel.kallsyms]   [k] cpuacct_charge
     0.59%  postgres  postgres            [.] heap_getsysattr
     0.59%  postgres  postgres            [.] MemoryContextAllocZero
     0.58%  postgres  postgres            [.] PopActiveSnapshot
     0.53%  postgres  libc-2.13.so        [.] __memcmp_sse4_1
     0.51%  postgres  postgres            [.] ReadBuffer_common
     0.49%  postgres  postgres            [.] ScanKeywordLookup
     0.49%  postgres  postgres            [.] LockAcquireExtended
     0.47%  postgres  [kernel.kallsyms]   [k] update_cfs_shares
     0.45%  postgres  postgres            [.] SearchCatCacheList
     0.45%  postgres  postgres            [.] new_list
     0.44%  postgres  postgres            [.] get_relation_info

LLC misses:
# Samples: 1M of event 'LLC-load-misses'
# Event count (approx.): 143379713
# Overhead   Command       Shared Object Symbol
# ........  ........  ...............................................................
    25.08%  postgres  postgres            [.] _bt_compare
    12.84%  postgres  postgres            [.] PinBuffer
     9.18%  postgres  postgres            [.] LWLockAcquire
     6.31%  postgres  postgres            [.] GetSnapshotData
     6.08%  postgres  postgres            [.] heap_hot_search_buffer
     5.13%  postgres  postgres            [.] hash_search_with_hash_value
     4.85%  postgres  postgres            [.] _bt_checkpage
     3.95%  postgres  postgres            [.] _bt_moveright
     3.09%  postgres  postgres            [.] heap_page_prune_opt
     2.12%  postgres  postgres            [.] slot_deform_tuple
     1.98%  postgres  postgres            [.] LWLockRelease
     1.82%  postgres  libc-2.13.so        [.] __memcmp_sse4_1
     1.16%  postgres  postgres            [.] ExecProject
     1.16%  postgres  postgres            [.] FunctionCall2Coll
     0.94%  postgres  [kernel.kallsyms]   [k] copy_user_generic_string
     0.94%  postgres  [kernel.kallsyms]   [k] tg_load_down
     0.78%  postgres  [kernel.kallsyms]   [k] find_get_page
     0.73%  postgres  postgres            [.] ProcArrayEndTransaction
     0.73%  postgres  postgres            [.] pfree
     0.71%  postgres  postgres            [.] pgstat_report_xact_timestamp
     0.69%  postgres  postgres            [.] index_fetch_heap
     0.66%  postgres  postgres            [.] LockAcquireExtended
     0.56%  postgres  postgres            [.] LockBuffer
     0.45%  postgres  postgres            [.] slot_getsomeattrs
     0.40%  postgres  postgres            [.] _bt_readpage

So it seems L1 misses are the interesting thing wrt parsing; none of
the parser functions show up in the LLC list.

When doing a source/assembly annotation in SearchCatCache, about half of
the misses are attributed to the memcpy() directly at the beginning.
In base_yyparse the three biggest offenders (15%, 10.5%, 5.58%) really
seem to be various kinds of table lookups.

So it seems to confirm the various suspicions that the table size might
be rather relevant.

Greetings,

Andres Freund

--
 Andres Freund                     http://www.2ndQuadrant.com/
 PostgreSQL Development, 24x7 Support, Training & Services

