Hi Erlend, The lock-up seems to be due to the throttling code getting stuck. This code has been largely unchanged for many versions, so I think it is likely to happen because of changes in the jvm's optimization algorithms. I've checked in a fix which should hopefully stop that from happening to trunk. See CONNECTORS-679.
Karl On Wed, Apr 24, 2013 at 5:29 AM, Karl Wright <[email protected]> wrote: > I updated the ticket as well; all the EXPLAIN queries are very fast at the > moment. The reason is that the checktime value you used needs to be the > current time or larger, which is why I picked > > 1466628406182 > > Karl > > > > On Wed, Apr 24, 2013 at 4:53 AM, Erlend Garåsen > <[email protected]>wrote: > >> >> Thanks Karl! >> >> I have updated the ticket with the information you requested, but there >> is an email delay at the moment from Jira (or this list). >> >> Erlend >> >> >> On 24.04.13 01.37, Karl Wright wrote: >> >>> Hi Erlend, >>> >>> I had a conversation with the PostgreSQL people. Have a look at ticket >>> CONNECTORS-678; you'll want to get an EXPLAIN ANALYZE of the query I >>> provided before on the live system - but update the checktime value so >>> it catches more than zero records: >>> >>> EXPLAIN ANALYZE SELECT t0.id <http://t0.id>,t0.jobid,t0.**dochash,t0. >>> >>> docid,t0.status,t0.failtime,**t0.failcount,t0.priorityset FROM jobqueue >>> t0 WHERE t0.status IN ('P','G') AND t0.checkaction='R' AND >>> t0.checktime<=1566628406182 AND >>> EXISTS(SELECT 'x' FROM jobs t1 WHERE t1.status IN ('A','a') AND >>> t1.id <http://t1.id>=t0.jobid AND t1.priority=5) AND >>> >>> NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE t2.dochash=t0.dochash AND >>> t2.status IN ('A','F','a','f','D','d') AND t2.jobid!=t0.jobid) AND >>> NOT EXISTS(SELECT 'x' FROM prereqevents t3,events t4 WHERE t0.id >>> <http://t0.id>=t3.owner AND t3.eventname=t4.name <http://t4.name>) ORDER >>> >>> BY t0.docpriority ASC LIMIT 100; >>> >>> I'd also like to see what the contents of your prereqevents table looks >>> like on that system. If you can do SELECT * from prereqevents; , and >>> give me the first ten or so that pop up, I can take it from there. >>> >>> Karl >>> >>> >>> >>> On Tue, Apr 23, 2013 at 10:22 AM, Karl Wright <[email protected] >>> <mailto:[email protected]>> wrote: >>> >>> Hi Erlend, >>> >>> If you have access to psql and can connect to the production >>> database, I'd like to explore whether the relatively recent change I >>> made for MySQL is in fact breaking PostgreSQL stuffer thread queries. >>> >>> First, read this page: >>> http://www.postgresql.org/**docs/current/static/indexes-** >>> ordering.html<http://www.postgresql.org/docs/current/static/indexes-ordering.html> >>> >>> What we're trying to achieve is that the planner use the index whose >>> first column is docpriority. The way we will need to determine that >>> is to open up psql, and "explain" the query in question, and then >>> modify the query to see if it begins to plan correctly. >>> >>> In psql, you will need to assess the query in question. To do that, >>> execute the following: >>> >>> EXPLAIN SELECT t0.id >>> <http://t0.id>,t0.jobid,t0.**dochash,t0.docid,t0.status,t0.** >>> failtime,t0.failcount,t0.**priorityset >>> >>> FROM jobqueue t0 WHERE t0.status IN ('P','G') AND >>> t0.checkaction='R' AND t0.checktime<=1366628406182 AND >>> EXISTS(SELECT 'x' FROM jobs t1 WHERE t1.status IN ('A','a') AND >>> t1.id <http://t1.id>=t0.jobid AND t1.priority=5) AND >>> >>> NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE >>> t2.dochash=t0.dochash AND t2.status IN ('A','F','a','f','D','d') AND >>> t2.jobid!=t0.jobid) AND >>> NOT EXISTS(SELECT 'x' FROM prereqevents t3,events t4 WHERE >>> t0.id <http://t0.id>=t3.owner AND t3.eventname=t4.name >>> <http://t4.name>) ORDER BY t0.docpriority ASC LIMIT 100; >>> >>> >>> >>> The production job MUST be running or it is a meaningless exercise. >>> >>> What you want to see is that the outermost action is NOT a sort; >>> instead, you want an >>> >>> Index Scan using i1362584563121 >>> >>> >>> If you get something else, try changing the ORDER BY clause to >>> includeNULLS LAST orNULLS FIRST to see if that >>> >>> >>> changes anything. If not, I may ask you to execute a sample query >>> or two and tell me what it returns, so I can assess >>> >>> whether the data is messed up in some way. >>> >>> >>> >>> Karl >>> >>> >>> >>> >>> >>> >>> On Tue, Apr 23, 2013 at 10:03 AM, Erlend Garåsen >>> <[email protected] >>> <mailto:[email protected].**no<[email protected]>>> >>> wrote: >>> >>> On 23.04.13 15.30, Karl Wright wrote: >>> >>> What version of Postgresql is this? Given the same >>> Postgresql version, >>> >>> >>> 9.1.9 >>> >>> "allow_system_table_mods";"__**off" >>> >>> "application_name";"pgAdmin III - Query Tool" >>> "archive_command";"/local/opt/**__pgsql-9.1/bin/pgclarchive_** >>> wal.__sh >>> >>> -P %p -F %f -S dbpg-search" >>> "archive_mode";"on" >>> "archive_timeout";"0" >>> "array_nulls";"on" >>> "authentication_timeout";"__**1min" >>> "autovacuum";"on" >>> "autovacuum_analyze_scale___**factor";"0.1" >>> "autovacuum_analyze_threshold"**__;"50" >>> "autovacuum_freeze_max_age";"_**_200000000" >>> "autovacuum_max_workers";"3" >>> "autovacuum_naptime";"1min" >>> "autovacuum_vacuum_cost_delay"**__;"20ms" >>> "autovacuum_vacuum_cost_limit"**__;"-1" >>> "autovacuum_vacuum_scale___**factor";"0.2" >>> "autovacuum_vacuum_threshold";**__"50" >>> "backslash_quote";"safe___**encoding" >>> >>> "bgwriter_delay";"200ms" >>> "bgwriter_lru_maxpages";"100" >>> "bgwriter_lru_multiplier";"2" >>> "block_size";"8192" >>> "bonjour";"off" >>> "bonjour_name";"" >>> "bytea_output";"escape" >>> "check_function_bodies";"on" >>> "checkpoint_completion_target"**__;"0.5" >>> >>> "checkpoint_segments";"128" >>> "checkpoint_timeout";"5min" >>> "checkpoint_warning";"30s" >>> "client_encoding";"UNICODE" >>> "client_min_messages";"notice" >>> "commit_delay";"0" >>> "commit_siblings";"5" >>> "constraint_exclusion";"__**partition" >>> >>> "cpu_index_tuple_cost";"0.005" >>> "cpu_operator_cost";"0.0025" >>> "cpu_tuple_cost";"0.01" >>> "cursor_tuple_fraction";"0.1" >>> "custom_variable_classes";"" >>> "DateStyle";"ISO, MDY" >>> "db_user_namespace";"off" >>> "deadlock_timeout";"1s" >>> "debug_assertions";"off" >>> "debug_pretty_print";"on" >>> "debug_print_parse";"off" >>> "debug_print_plan";"off" >>> "debug_print_rewritten";"off" >>> "default_statistics_target";"_**_100" >>> "default_tablespace";"" >>> "default_text_search_config";"**__pg_catalog.english" >>> "default_transaction___**deferrable";"off" >>> "default_transaction___**isolation";"read committed" >>> "default_transaction_read___**only";"off" >>> "default_with_oids";"off" >>> "effective_cache_size";"__**16093MB" >>> >>> "effective_io_concurrency";"1" >>> "enable_bitmapscan";"on" >>> "enable_hashagg";"on" >>> "enable_hashjoin";"on" >>> "enable_indexscan";"on" >>> "enable_material";"on" >>> "enable_mergejoin";"on" >>> "enable_nestloop";"on" >>> "enable_seqscan";"on" >>> "enable_sort";"on" >>> "enable_tidscan";"on" >>> "escape_string_warning";"on" >>> "exit_on_error";"off" >>> "extra_float_digits";"0" >>> "from_collapse_limit";"8" >>> "fsync";"on" >>> "full_page_writes";"on" >>> "geqo";"on" >>> "geqo_effort";"5" >>> "geqo_generations";"0" >>> "geqo_pool_size";"0" >>> "geqo_seed";"0" >>> "geqo_selection_bias";"2" >>> "geqo_threshold";"12" >>> "gin_fuzzy_search_limit";"0" >>> "hot_standby";"off" >>> "hot_standby_feedback";"off" >>> "ignore_system_indexes";"off" >>> "integer_datetimes";"on" >>> "IntervalStyle";"postgres" >>> "join_collapse_limit";"8" >>> "krb_caseins_users";"off" >>> "krb_srvname";"postgres" >>> "lc_collate";"C" >>> "lc_ctype";"C" >>> "lc_messages";"C" >>> "lc_monetary";"C" >>> "lc_numeric";"C" >>> "lc_time";"C" >>> >>> "listen_addresses";"dbpg-__**search.uio.no<http://dbpg-__search.uio.no> >>> <http://dbpg-search.uio.no>" >>> "lo_compat_privileges";"off" >>> "local_preload_libraries";"" >>> "log_autovacuum_min_duration";**__"-1" >>> >>> "log_checkpoints";"off" >>> "log_connections";"off" >>> "log_destination";"stderr" >>> "log_disconnections";"off" >>> "log_duration";"off" >>> "log_error_verbosity";"__**default" >>> >>> "log_executor_stats";"off" >>> "log_file_mode";"0600" >>> "log_hostname";"off" >>> "log_line_prefix";"[%m] [dbpg-search] [%d:%u] [%r] [%p] [%x] " >>> "log_lock_waits";"off" >>> "log_min_duration_statement";"**__-1" >>> "log_min_error_statement";"__**info" >>> >>> "log_min_messages";"notice" >>> "log_parser_stats";"off" >>> "log_planner_stats";"off" >>> "log_rotation_age";"0" >>> "log_rotation_size";"0" >>> "log_statement";"none" >>> "log_statement_stats";"off" >>> "log_temp_files";"-1" >>> "log_timezone";"Europe/Oslo" >>> "log_truncate_on_rotation";"__**on" >>> >>> "logging_collector";"on" >>> "maintenance_work_mem";"128MB" >>> "max_connections";"600" >>> "max_files_per_process";"1000" >>> "max_function_args";"100" >>> "max_identifier_length";"63" >>> "max_index_keys";"32" >>> "max_locks_per_transaction";"_**_64" >>> "max_pred_locks_per___**transaction";"64" >>> "max_prepared_transactions";"_**_0" >>> "max_stack_depth";"4MB" >>> "max_standby_archive_delay";"_**_30s" >>> "max_standby_streaming_delay";**__"30s" >>> >>> "max_wal_senders";"0" >>> "password_encryption";"on" >>> "port";"5432" >>> "post_auth_delay";"0" >>> "pre_auth_delay";"0" >>> "quote_all_identifiers";"off" >>> "random_page_cost";"2" >>> "replication_timeout";"1min" >>> "restart_after_crash";"on" >>> "search_path";""$user",public" >>> "segment_size";"1GB" >>> "seq_page_cost";"1" >>> "server_encoding";"UTF8" >>> "server_version";"9.1.9" >>> "server_version_num";"90109" >>> "session_replication_role";"__**origin" >>> >>> "shared_buffers";"384MB" >>> "silent_mode";"off" >>> "sql_inheritance";"on" >>> "ssl";"on" >>> "ssl_renegotiation_limit";"__**512MB" >>> "standard_conforming_strings";**__"off" >>> "statement_timeout";"0" >>> "superuser_reserved___**connections";"3" >>> >>> "synchronize_seqscans";"on" >>> "synchronous_commit";"on" >>> "synchronous_standby_names";"" >>> "syslog_facility";"local0" >>> "syslog_ident";"postgres" >>> "tcp_keepalives_count";"9" >>> "tcp_keepalives_idle";"7200" >>> "tcp_keepalives_interval";"75" >>> "temp_buffers";"8MB" >>> "temp_tablespaces";"" >>> "TimeZone";"Europe/Oslo" >>> "timezone_abbreviations";"__**Default" >>> "trace_notify";"off" >>> "trace_recovery_messages";"__**log" >>> "trace_sort";"off" >>> "track_activities";"on" >>> "track_activity_query_size";"_**_1024" >>> >>> "track_counts";"on" >>> "track_functions";"none" >>> "transaction_deferrable";"off" >>> "transaction_isolation";"read committed" >>> "transaction_read_only";"off" >>> "transform_null_equals";"off" >>> "unix_socket_group";"" >>> "unix_socket_permissions";"__**0700" >>> >>> "update_process_title";"on" >>> "vacuum_cost_delay";"0" >>> "vacuum_cost_limit";"200" >>> "vacuum_cost_page_dirty";"20" >>> "vacuum_cost_page_hit";"1" >>> "vacuum_cost_page_miss";"10" >>> "vacuum_defer_cleanup_age";"0" >>> "vacuum_freeze_min_age";"__**50000000" >>> "vacuum_freeze_table_age";"__**150000000" >>> >>> "wal_block_size";"8192" >>> "wal_buffers";"12MB" >>> "wal_keep_segments";"0" >>> "wal_level";"archive" >>> "wal_receiver_status_interval"**__;"10s" >>> >>> "wal_segment_size";"16MB" >>> "wal_sender_delay";"1s" >>> "wal_sync_method";"fdatasync" >>> "wal_writer_delay";"200ms" >>> "work_mem";"16MB" >>> "xmlbinary";"base64" >>> "xmloption";"content" >>> "zero_damaged_pages";"off" >>> >>> >>> >>> -- >>> Erlend Garåsen >>> Center for Information Technology Services >>> University of Oslo >>> P.O. Box 1086 Blindern, N-0317 OSLO, Norway >>> Ph: (+47) 22840193 <tel:%28%2B47%29%2022840193>, Fax: (+47) >>> 22852970 <tel:%28%2B47%29%2022852970>, Mobile: (+47) 91380968 >>> <tel:%28%2B47%29%2091380968>, VIP: 31050 >>> >>> >>> >>> >> >> -- >> Erlend Garåsen >> Center for Information Technology Services >> University of Oslo >> P.O. Box 1086 Blindern, N-0317 OSLO, Norway >> Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: >> 31050 >> > >
