Transactions to table, ChangeHistory, have recently become intermittently slow and is increasing becoming slower.
* No database configuration changes have been made recently * We have run vacuum analyze * We have tried backing up and reloading the table (data, indexes, etc) Some transactions respond quickly (200 ms) and others take over 55 seconds (we cancel the query after 55 seconds - our timeout SLA). The problem has recently become very bad. It is the same query being issued but with different parameters. If the transaction takes over 55 seconds and I run the query manually (with or without EXPLAIN ANALYZE) it returns quickly (a few hundred ms). In case I am looking at cache, I have a list of other queries (just different parameters) that have timed out and when I run them (without the limit even) the response is very timely. Any help or insight would be great. NOTE: our application is connecting to the database via JDBC and we are using PreparedStatements. I have provided full details so all information is available, but please let me know if any other information is needed - thx in advance. p306=> EXPLAIN ANALYZE SELECT * FROM ChangeHistory WHERE Category BETWEEN 'Employee' AND 'Employeezz' AND PrimaryKeyOfChange BETWEEN '312313' AND '312313!zz' ORDER BY ChgTS DESC, ChgUser DESC, Category DESC, PrimaryKeyOfChange DESC LIMIT 11; QUERY PLAN ------------------------------------------------------------------------------------------------------ Limit (cost=33.66..33.67 rows=1 width=136) (actual time=0.297..0.297 rows=11 loops=1) -> Sort (cost=33.66..33.67 rows=1 width=136) (actual time=0.297..0.297 rows=11 loops=1) Sort Key: chgts, chguser, category, primarykeyofchange Sort Method: top-N heapsort Memory: 27kB -> Index Scan using changehistory_idx4 on changehistory (cost=0.00..33.65 rows=1 width=136) (actual time=0.046.. 0.239 rows=85 loops=1) Index Cond: (((primarykeyofchange)::text >= '312313'::text) AND ((primarykeyofchange)::text <= '312313!zz':: text)) Filter: (((category)::text >= 'Employee'::text) AND ((category)::text <= 'Employeezz'::text)) Total runtime: 0.328 ms (8 rows) >>> History this week of counts with good response times vs timeouts. | Date | Success # | Time Out # | Avg. Success Secs | |------------+-----------+------------+-------------------| | 2016-05-09 | 18 | 31 | 7.9 | | 2016-05-10 | 17 | 25 | 10.5 | | 2016-05-11 | 27 | 33 | 10.1 | | 2016-05-12 | 68 | 24 | 9.9 | >>> Sample transaction response times | Timestamp | Tran ID | Resp MS | Resp CD --------------------+----------------+---------+-------- 2016-05-10 06:20:19 | ListChangeHist | 55,023 | TIMEOUT 2016-05-10 07:47:34 | ListChangeHist | 55,017 | TIMEOUT 2016-05-10 07:48:00 | ListChangeHist | 9,866 | OK 2016-05-10 07:48:10 | ListChangeHist | 2,327 | OK 2016-05-10 07:59:23 | ListChangeHist | 55,020 | TIMEOUT 2016-05-10 08:11:20 | ListChangeHist | 55,030 | TIMEOUT 2016-05-10 08:31:45 | ListChangeHist | 4,216 | OK 2016-05-10 08:35:09 | ListChangeHist | 7,898 | OK 2016-05-10 08:36:18 | ListChangeHist | 9,810 | OK 2016-05-10 08:36:56 | ListChangeHist | 55,027 | TIMEOUT 2016-05-10 08:37:33 | ListChangeHist | 46,433 | OK 2016-05-10 08:38:09 | ListChangeHist | 55,019 | TIMEOUT 2016-05-10 08:53:43 | ListChangeHist | 55,019 | TIMEOUT 2016-05-10 09:45:09 | ListChangeHist | 55,022 | TIMEOUT 2016-05-10 09:46:13 | ListChangeHist | 55,017 | TIMEOUT 2016-05-10 09:49:27 | ListChangeHist | 55,011 | TIMEOUT 2016-05-10 09:52:12 | ListChangeHist | 55,018 | TIMEOUT 2016-05-10 09:57:42 | ListChangeHist | 9,462 | OK 2016-05-10 10:05:21 | ListChangeHist | 55,016 | TIMEOUT 2016-05-10 10:05:29 | ListChangeHist | 136 | OK 2016-05-10 10:05:38 | ListChangeHist | 1,517 | OK Artifacts ====================== $ >uname -a SunOS ***** 5.10 Generic_150400-30 sun4v sparc sun4v Memory : 254G phys mem, 207G free mem. Processors: 32 - CPU is mostly 80% free >>> p306=> select version(); version --------------------------------------------------------------------------------------------------- PostgreSQL 9.1.14 on sparc-sun-solaris2.10, compiled by gcc (GCC) 3.4.3 (csl-sol210-3_4-branch+sol_rpath), 64-bit >>> p306=> \dt+ changehistory List of relations Schema | Name | Type | Owner | Size | Description --------+---------------+-------+-------+-------+------------- public | changehistory | table | p306 | 17 GB | >>> p306=> \di+ changehistory* List of relations Schema | Name | Type | Owner | Table | Size | Description --------+-----------------------+-------+-------+---------------+---------+------------- public | changehistory_idx1 | index | p306 | changehistory | 9597 MB | public | changehistory_idx3 | index | p306 | changehistory | 11 GB | public | changehistory_idx4 | index | p306 | changehistory | 4973 MB | public | changehistory_pkey | index | p306 | changehistory | 2791 MB | public | changehistory_search2 | index | p306 | changehistory | 9888 MB | public | changehistory_search3 | index | p306 | changehistory | 10 GB | public | changehistory_search4 | index | p306 | changehistory | 9240 MB | public | changehistory_search5 | index | p306 | changehistory | 8373 MB | (8 rows) >>> p306=> select count(*) from changehistory ; count ------------ 129,185,024 >>> Show all (filtered) ====================================================== name | setting ---------------------------------+-------------------- autovacuum | on autovacuum_analyze_scale_factor | 0.001 autovacuum_analyze_threshold | 500 autovacuum_freeze_max_age | 200000000 autovacuum_max_workers | 5 autovacuum_naptime | 1min autovacuum_vacuum_cost_delay | 0 autovacuum_vacuum_cost_limit | -1 autovacuum_vacuum_scale_factor | 0.001 autovacuum_vacuum_threshold | 500 bgwriter_delay | 200ms block_size | 8192 check_function_bodies | on checkpoint_completion_target | 0.9 checkpoint_segments | 256 checkpoint_timeout | 1h checkpoint_warning | 30s client_encoding | UTF8 commit_delay | 0 commit_siblings | 5 cpu_index_tuple_cost | 0.005 cpu_operator_cost | 0.0025 cpu_tuple_cost | 0.01 cursor_tuple_fraction | 0.1 deadlock_timeout | 1s default_statistics_target | 100 default_transaction_deferrable | off default_transaction_isolation | read committed default_transaction_read_only | off default_with_oids | off effective_cache_size | 8GB from_collapse_limit | 8 fsync | on full_page_writes | on ignore_system_indexes | off join_collapse_limit | 8 krb_caseins_users | off lo_compat_privileges | off maintenance_work_mem | 1GB max_connections | 350 max_files_per_process | 1000 max_function_args | 100 max_identifier_length | 63 max_index_keys | 32 max_locks_per_transaction | 64 max_pred_locks_per_transaction | 64 max_prepared_transactions | 0 max_stack_depth | 2MB max_wal_senders | 5 random_page_cost | 4 segment_size | 1GB seq_page_cost | 1 server_encoding | UTF8 server_version | 9.1.14 shared_buffers | 2GB sql_inheritance | on statement_timeout | 0 synchronize_seqscans | on synchronous_commit | on synchronous_standby_names | tcp_keepalives_count | 0 tcp_keepalives_idle | -1 tcp_keepalives_interval | 0 track_activities | on track_activity_query_size | 1024 track_counts | on track_functions | none transaction_deferrable | off transaction_isolation | read committed transaction_read_only | off transform_null_equals | off update_process_title | on vacuum_cost_delay | 0 vacuum_cost_limit | 200 vacuum_cost_page_dirty | 20 vacuum_cost_page_hit | 1 vacuum_cost_page_miss | 10 vacuum_defer_cleanup_age | 0 vacuum_freeze_min_age | 50000000 vacuum_freeze_table_age | 150000000 John Gorman | Manager of Production Support, Architecture, Release Engineering | Eldorado | a Division of MPHASIS | www.eldoinc.com/ | 5353 North 16th Street, Suite 400, Phoenix, Arizona 85016-3228 | Tel 602.604.3100 | Fax: 602.604.3115