Transactions to table, ChangeHistory, have recently become intermittently slow 
and is increasing becoming slower.

* No database configuration changes have been made recently
* We have run vacuum analyze
* We have tried backing up and reloading the table (data, indexes, etc)

Some transactions respond quickly (200 ms) and others take over 55 seconds (we 
cancel the query after 55 seconds - our timeout SLA). The problem has recently 
become very bad. It is the same query being issued but with different 
parameters.

If the transaction takes over 55 seconds and I run the query manually (with or 
without EXPLAIN ANALYZE) it returns quickly (a few hundred ms). In case I am 
looking at cache, I have a list of other queries (just different parameters) 
that have timed out and when I run them (without the limit even) the response 
is very timely.

Any help or insight would be great.

NOTE: our application is connecting to the database via JDBC and we are using 
PreparedStatements. I have provided full details so all information is 
available, but please let me know if any other information is needed - thx in 
advance.

p306=> EXPLAIN ANALYZE SELECT * FROM ChangeHistory WHERE Category BETWEEN 
'Employee' AND 'Employeezz' AND PrimaryKeyOfChange BETWEEN '312313' AND 
'312313!zz' ORDER BY ChgTS DESC, ChgUser DESC, Category DESC, 
PrimaryKeyOfChange DESC LIMIT 11;
                                            QUERY PLAN
------------------------------------------------------------------------------------------------------
Limit  (cost=33.66..33.67 rows=1 width=136) (actual time=0.297..0.297 rows=11 
loops=1)
   ->  Sort  (cost=33.66..33.67 rows=1 width=136) (actual time=0.297..0.297 
rows=11 loops=1)
         Sort Key: chgts, chguser, category, primarykeyofchange
         Sort Method: top-N heapsort  Memory: 27kB
         ->  Index Scan using changehistory_idx4 on changehistory  
(cost=0.00..33.65 rows=1 width=136) (actual time=0.046..
0.239 rows=85 loops=1)
               Index Cond: (((primarykeyofchange)::text >= '312313'::text) AND 
((primarykeyofchange)::text <= '312313!zz'::
text))
               Filter: (((category)::text >= 'Employee'::text) AND 
((category)::text <= 'Employeezz'::text))
 Total runtime: 0.328 ms
(8 rows)

>>>
History this week of counts with good response times vs timeouts.

| Date       | Success # | Time Out # | Avg. Success Secs |
|------------+-----------+------------+-------------------|
| 2016-05-09 |        18 |         31 |               7.9 |
| 2016-05-10 |        17 |         25 |              10.5 |
| 2016-05-11 |        27 |         33 |              10.1 |
| 2016-05-12 |        68 |         24 |               9.9 |


>>> Sample transaction response times

| Timestamp         | Tran ID        | Resp MS | Resp CD
--------------------+----------------+---------+--------
2016-05-10 06:20:19 | ListChangeHist | 55,023  | TIMEOUT
2016-05-10 07:47:34 | ListChangeHist | 55,017  | TIMEOUT
2016-05-10 07:48:00 | ListChangeHist |  9,866  | OK
2016-05-10 07:48:10 | ListChangeHist |  2,327  | OK
2016-05-10 07:59:23 | ListChangeHist | 55,020  | TIMEOUT
2016-05-10 08:11:20 | ListChangeHist | 55,030  | TIMEOUT
2016-05-10 08:31:45 | ListChangeHist |  4,216  | OK
2016-05-10 08:35:09 | ListChangeHist |  7,898  | OK
2016-05-10 08:36:18 | ListChangeHist |  9,810  | OK
2016-05-10 08:36:56 | ListChangeHist | 55,027  | TIMEOUT
2016-05-10 08:37:33 | ListChangeHist | 46,433  | OK
2016-05-10 08:38:09 | ListChangeHist | 55,019  | TIMEOUT
2016-05-10 08:53:43 | ListChangeHist | 55,019  | TIMEOUT
2016-05-10 09:45:09 | ListChangeHist | 55,022  | TIMEOUT
2016-05-10 09:46:13 | ListChangeHist | 55,017  | TIMEOUT
2016-05-10 09:49:27 | ListChangeHist | 55,011  | TIMEOUT
2016-05-10 09:52:12 | ListChangeHist | 55,018  | TIMEOUT
2016-05-10 09:57:42 | ListChangeHist |  9,462  | OK
2016-05-10 10:05:21 | ListChangeHist | 55,016  | TIMEOUT
2016-05-10 10:05:29 | ListChangeHist |    136  | OK
2016-05-10 10:05:38 | ListChangeHist |  1,517  | OK

Artifacts
======================

$ >uname -a
SunOS ***** 5.10 Generic_150400-30 sun4v sparc sun4v

Memory    : 254G phys mem, 207G free mem.
Processors: 32 - CPU is mostly 80% free

>>>
p306=> select version();
                                                      version
---------------------------------------------------------------------------------------------------
PostgreSQL 9.1.14 on sparc-sun-solaris2.10, compiled by gcc (GCC) 3.4.3 
(csl-sol210-3_4-branch+sol_rpath), 64-bit

>>>
p306=> \dt+ changehistory
                      List of relations
 Schema |     Name      | Type  | Owner | Size  | Description
--------+---------------+-------+-------+-------+-------------
 public | changehistory | table | p306  | 17 GB |

>>>
p306=> \di+ changehistory*
                                   List of relations
 Schema |         Name          | Type  | Owner |     Table     |  Size   | 
Description
--------+-----------------------+-------+-------+---------------+---------+-------------
 public | changehistory_idx1    | index | p306  | changehistory | 9597 MB |
 public | changehistory_idx3    | index | p306  | changehistory | 11 GB   |
 public | changehistory_idx4    | index | p306  | changehistory | 4973 MB |
 public | changehistory_pkey    | index | p306  | changehistory | 2791 MB |
 public | changehistory_search2 | index | p306  | changehistory | 9888 MB |
 public | changehistory_search3 | index | p306  | changehistory | 10 GB   |
 public | changehistory_search4 | index | p306  | changehistory | 9240 MB |
 public | changehistory_search5 | index | p306  | changehistory | 8373 MB |
(8 rows)


>>>
p306=> select count(*) from changehistory ;
   count
------------
 129,185,024

>>>
Show all (filtered)
======================================================

              name               | setting
---------------------------------+--------------------
 autovacuum                      | on
 autovacuum_analyze_scale_factor | 0.001
 autovacuum_analyze_threshold    | 500
 autovacuum_freeze_max_age       | 200000000
 autovacuum_max_workers          | 5
 autovacuum_naptime              | 1min
 autovacuum_vacuum_cost_delay    | 0
 autovacuum_vacuum_cost_limit    | -1
 autovacuum_vacuum_scale_factor  | 0.001
 autovacuum_vacuum_threshold     | 500
 bgwriter_delay                  | 200ms
 block_size                      | 8192
 check_function_bodies           | on
 checkpoint_completion_target    | 0.9
 checkpoint_segments             | 256
 checkpoint_timeout              | 1h
 checkpoint_warning              | 30s
 client_encoding                 | UTF8
 commit_delay                    | 0
 commit_siblings                 | 5
 cpu_index_tuple_cost            | 0.005
 cpu_operator_cost               | 0.0025
 cpu_tuple_cost                  | 0.01
 cursor_tuple_fraction           | 0.1
 deadlock_timeout                | 1s
 default_statistics_target       | 100
 default_transaction_deferrable  | off
 default_transaction_isolation   | read committed
 default_transaction_read_only   | off
 default_with_oids               | off
 effective_cache_size            | 8GB
 from_collapse_limit             | 8
 fsync                           | on
 full_page_writes                | on
 ignore_system_indexes           | off
 join_collapse_limit             | 8
 krb_caseins_users               | off
 lo_compat_privileges            | off
 maintenance_work_mem            | 1GB
 max_connections                 | 350
 max_files_per_process           | 1000
 max_function_args               | 100
 max_identifier_length           | 63
 max_index_keys                  | 32
 max_locks_per_transaction       | 64
 max_pred_locks_per_transaction  | 64
 max_prepared_transactions       | 0
 max_stack_depth                 | 2MB
 max_wal_senders                 | 5
 random_page_cost                | 4
 segment_size                    | 1GB
 seq_page_cost                   | 1
 server_encoding                 | UTF8
 server_version                  | 9.1.14
 shared_buffers                  | 2GB
 sql_inheritance                 | on
 statement_timeout               | 0
 synchronize_seqscans            | on
 synchronous_commit              | on
 synchronous_standby_names       |
 tcp_keepalives_count            | 0
 tcp_keepalives_idle             | -1
 tcp_keepalives_interval         | 0
 track_activities                | on
 track_activity_query_size       | 1024
 track_counts                    | on
 track_functions                 | none
 transaction_deferrable          | off
 transaction_isolation           | read committed
 transaction_read_only           | off
 transform_null_equals           | off
 update_process_title            | on
 vacuum_cost_delay               | 0
 vacuum_cost_limit               | 200
 vacuum_cost_page_dirty          | 20
 vacuum_cost_page_hit            | 1
 vacuum_cost_page_miss           | 10
 vacuum_defer_cleanup_age        | 0
 vacuum_freeze_min_age           | 50000000
 vacuum_freeze_table_age         | 150000000

John Gorman | Manager of Production Support, Architecture, Release Engineering 
| Eldorado | a Division of MPHASIS | www.eldoinc.com/ |
5353 North 16th Street, Suite 400, Phoenix, Arizona 85016-3228 | Tel 
602.604.3100 | Fax: 602.604.3115



Reply via email to