After I did an svn up and started a new crawl, I'm still getting a lot
of these. I will analyze futher by running EXPLAIN ANALYZE. The job
stops for some minutes and then continues.
WARN 2013-04-25 14:23:37,175 (Stuffer thread) - Found a long-running
query (62339 ms): [SELECT
t0.id,t0.jobid,t0.dochash,t0.docid,t0.status,t0.failtime,t0.failcount,t0.priorityset
FROM jobqueue t0 WHERE t0.status IN (?,?) AND t0.checkaction=? AND
t0.checktime<=? AND EXISTS(SELECT 'x' FROM jobs t1 WHERE t1.status IN
(?,?) AND t1.id=t0.jobid AND t1.priority=?) AND NOT EXISTS(SELECT 'x'
FROM jobqueue t2 WHERE t2.dochash=t0.dochash AND t2.status IN
(?,?,?,?,?,?) AND t2.jobid!=t0.jobid) AND NOT EXISTS(SELECT 'x' FROM
prereqevents t3,events t4 WHERE t0.id=t3.owner AND t3.eventname=t4.name)
ORDER BY t0.docpriority ASC LIMIT 100]
WARN 2013-04-25 14:23:37,175 (Stuffer thread) - Parameter 0: 'P'
WARN 2013-04-25 14:23:37,175 (Stuffer thread) - Parameter 1: 'G'
WARN 2013-04-25 14:23:37,175 (Stuffer thread) - Parameter 2: 'R'
WARN 2013-04-25 14:23:37,175 (Stuffer thread) - Parameter 3:
'1366892554745'
WARN 2013-04-25 14:23:37,175 (Stuffer thread) - Parameter 4: 'A'
WARN 2013-04-25 14:23:37,175 (Stuffer thread) - Parameter 5: 'a'
WARN 2013-04-25 14:23:37,175 (Stuffer thread) - Parameter 6: '5'
WARN 2013-04-25 14:23:37,175 (Stuffer thread) - Parameter 7: 'A'
WARN 2013-04-25 14:23:37,175 (Stuffer thread) - Parameter 8: 'F'
WARN 2013-04-25 14:23:37,175 (Stuffer thread) - Parameter 9: 'a'
WARN 2013-04-25 14:23:37,175 (Stuffer thread) - Parameter 10: 'f'
WARN 2013-04-25 14:23:37,175 (Stuffer thread) - Parameter 11: 'D'
WARN 2013-04-25 14:23:37,175 (Stuffer thread) - Parameter 12: 'd'
WARN 2013-04-25 14:32:33,766 (Stuffer thread) - Plan: Limit
(cost=9603.72..9603.72 rows=1 width=155) (actual
time=536587.381..536587.586 rows=100 loops=1)
WARN 2013-04-25 14:32:33,767 (Stuffer thread) - Plan: -> Sort
(cost=9603.72..9603.72 rows=1 width=155) (actual
time=536587.379..536587.449 rows=100 loops=1)
WARN 2013-04-25 14:32:33,767 (Stuffer thread) - Plan: Sort
Key: t0.docpriority
WARN 2013-04-25 14:32:33,767 (Stuffer thread) - Plan: Sort
Method: top-N heapsort Memory: 51kB
WARN 2013-04-25 14:32:33,767 (Stuffer thread) - Plan: ->
Nested Loop Anti Join (cost=17.68..9603.71 rows=1 width=155) (actual
time=1.565..536578.061 rows=1592 loops=1)
WARN 2013-04-25 14:32:33,767 (Stuffer thread) - Plan:
Join Filter: (t2.jobid <> t0.jobid)
WARN 2013-04-25 14:32:33,767 (Stuffer thread) - Plan:
-> Nested Loop Semi Join (cost=17.68..9595.09 rows=1 width=155)
(actual time=1.551..536533.696 rows=1592 loops=1)
WARN 2013-04-25 14:32:33,767 (Stuffer thread) - Plan:
Join Filter: (t0.jobid = t1.id)
WARN 2013-04-25 14:32:33,767 (Stuffer thread) - Plan:
-> Nested Loop Anti Join (cost=17.68..9593.04 rows=1 width=155)
(actual time=1.541..536504.762 rows=1592 loops=1)
WARN 2013-04-25 14:32:33,767 (Stuffer thread) - Plan:
Join Filter: (t0.id = t3.owner)
WARN 2013-04-25 14:32:33,767 (Stuffer thread) - Plan:
-> Bitmap Heap Scan on jobqueue t0 (cost=4.53..6.54 rows=1
width=155) (actual time=1.516..8.783 rows=1592 loops=1)
WARN 2013-04-25 14:32:33,767 (Stuffer thread) - Plan:
Recheck Cond: (status = ANY ('{P,G}'::bpchar[]))
WARN 2013-04-25 14:32:33,767 (Stuffer thread) - Plan:
Filter: ((checktime <= 1366892554745::bigint) AND
(checkaction = 'R'::bpchar))
WARN 2013-04-25 14:32:33,767 (Stuffer thread) - Plan:
-> Bitmap Index Scan on i1362584563122
(cost=0.00..4.53 rows=1 width=0) (actual time=1.375..1.375 rows=8931
loops=1)
WARN 2013-04-25 14:32:33,767 (Stuffer thread) - Plan:
Index Cond: (status = ANY ('{P,G}'::bpchar[]))
WARN 2013-04-25 14:32:33,767 (Stuffer thread) - Plan:
-> Hash Join (cost=13.15..7637.41 rows=231632 width=8)
(actual time=336.989..336.989 rows=0 loops=1592)
WARN 2013-04-25 14:32:33,767 (Stuffer thread) - Plan:
Hash Cond: ((t3.eventname)::text = (t4.name)::text)
WARN 2013-04-25 14:32:33,767 (Stuffer thread) - Plan:
-> Seq Scan on prereqevents t3 (cost=0.00..4439.32
rows=231632 width=37) (actual time=0.006..160.286 rows=231491 loops=1592)
WARN 2013-04-25 14:32:33,767 (Stuffer thread) - Plan:
-> Hash (cost=11.40..11.40 rows=140 width=516)
(actual time=0.005..0.005 rows=0 loops=1)
WARN 2013-04-25 14:32:33,767 (Stuffer thread) - Plan:
Buckets: 1024 Batches: 1 Memory Usage: 0kB
WARN 2013-04-25 14:32:33,767 (Stuffer thread) - Plan:
-> Seq Scan on events t4 (cost=0.00..11.40
rows=140 width=516) (actual time=0.003..0.003 rows=0 loops=1)
WARN 2013-04-25 14:32:33,767 (Stuffer thread) - Plan:
-> Seq Scan on jobs t1 (cost=0.00..2.03 rows=2 width=8) (actual
time=0.008..0.008 rows=1 loops=1592)
WARN 2013-04-25 14:32:33,767 (Stuffer thread) - Plan:
Filter: ((status = ANY ('{A,a}'::bpchar[])) AND (priority =
5::bigint))
WARN 2013-04-25 14:32:33,767 (Stuffer thread) - Plan:
-> Index Scan using i1362584563120 on jobqueue t2 (cost=0.00..4.30
rows=1 width=49) (actual time=0.020..0.020 rows=0 loops=1592)
WARN 2013-04-25 14:32:33,767 (Stuffer thread) - Plan:
Index Cond: ((dochash)::text = (t0.dochash)::text)
WARN 2013-04-25 14:32:33,767 (Stuffer thread) - Plan:
Filter: (status = ANY ('{A,F,a,f,D,d}'::bpchar[]))
WARN 2013-04-25 14:32:33,767 (Stuffer thread) - Plan: Total runtime:
536587.771 ms
WARN 2013-04-25 14:32:33,767 (Stuffer thread) -
On 24.04.13 11.29, Karl Wright wrote:
I updated the ticket as well; all the EXPLAIN queries are very fast at
the moment. The reason is that the checktime value you used needs to be
the current time or larger, which is why I picked
1466628406182
Karl
On Wed, Apr 24, 2013 at 4:53 AM, Erlend Garåsen <[email protected]
<mailto:[email protected]>> wrote:
Thanks Karl!
I have updated the ticket with the information you requested, but
there is an email delay at the moment from Jira (or this list).
Erlend
On 24.04.13 01.37 <tel:24.04.13%2001.37>, Karl Wright wrote:
Hi Erlend,
I had a conversation with the PostgreSQL people. Have a look at
ticket
CONNECTORS-678; you'll want to get an EXPLAIN ANALYZE of the query I
provided before on the live system - but update the checktime
value so
it catches more than zero records:
EXPLAIN ANALYZE SELECT t0.id <http://t0.id>
<http://t0.id>,t0.jobid,t0.__dochash,t0.
docid,t0.status,t0.failtime,__t0.failcount,t0.priorityset FROM
jobqueue
t0 WHERE t0.status IN ('P','G') AND t0.checkaction='R' AND
t0.checktime<=1566628406182 AND
EXISTS(SELECT 'x' FROM jobs t1 WHERE t1.status IN
('A','a') AND
t1.id <http://t1.id> <http://t1.id>=t0.jobid AND t1.priority=5) AND
NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE
t2.dochash=t0.dochash AND
t2.status IN ('A','F','a','f','D','d') AND t2.jobid!=t0.jobid) AND
NOT EXISTS(SELECT 'x' FROM prereqevents t3,events t4 WHERE
t0.id <http://t0.id>
<http://t0.id>=t3.owner AND t3.eventname=t4.name
<http://t4.name> <http://t4.name>) ORDER
BY t0.docpriority ASC LIMIT 100;
I'd also like to see what the contents of your prereqevents
table looks
like on that system. If you can do SELECT * from prereqevents;
, and
give me the first ten or so that pop up, I can take it from there.
Karl
On Tue, Apr 23, 2013 at 10:22 AM, Karl Wright
<[email protected] <mailto:[email protected]>
<mailto:[email protected] <mailto:[email protected]>>> wrote:
Hi Erlend,
If you have access to psql and can connect to the production
database, I'd like to explore whether the relatively recent
change I
made for MySQL is in fact breaking PostgreSQL stuffer
thread queries.
First, read this page:
http://www.postgresql.org/__docs/current/static/indexes-__ordering.html
<http://www.postgresql.org/docs/current/static/indexes-ordering.html>
What we're trying to achieve is that the planner use the
index whose
first column is docpriority. The way we will need to
determine that
is to open up psql, and "explain" the query in question,
and then
modify the query to see if it begins to plan correctly.
In psql, you will need to assess the query in question. To
do that,
execute the following:
EXPLAIN SELECT t0.id <http://t0.id>
<http://t0.id>,t0.jobid,t0.__dochash,t0.docid,t0.status,t0.__failtime,t0.failcount,t0.__priorityset
FROM jobqueue t0 WHERE t0.status IN ('P','G') AND
t0.checkaction='R' AND t0.checktime<=1366628406182 AND
EXISTS(SELECT 'x' FROM jobs t1 WHERE t1.status IN
('A','a') AND
t1.id <http://t1.id> <http://t1.id>=t0.jobid AND t1.priority=5) AND
NOT EXISTS(SELECT 'x' FROM jobqueue t2 WHERE
t2.dochash=t0.dochash AND t2.status IN
('A','F','a','f','D','d') AND
t2.jobid!=t0.jobid) AND
NOT EXISTS(SELECT 'x' FROM prereqevents t3,events t4 WHERE
t0.id <http://t0.id> <http://t0.id>=t3.owner AND
t3.eventname=t4.name <http://t4.name>
<http://t4.name>) ORDER BY t0.docpriority ASC LIMIT 100;
The production job MUST be running or it is a meaningless
exercise.
What you want to see is that the outermost action is NOT a
sort;
instead, you want an
Index Scan using i1362584563121
If you get something else, try changing the ORDER BY clause
to includeNULLS LAST orNULLS FIRST to see if that
changes anything. If not, I may ask you to execute a
sample query or two and tell me what it returns, so I can assess
whether the data is messed up in some way.
Karl
On Tue, Apr 23, 2013 at 10:03 AM, Erlend Garåsen
<[email protected] <mailto:[email protected]>
<mailto:[email protected].__no
<mailto:[email protected]>>> wrote:
On 23.04.13 15.30, Karl Wright wrote:
What version of Postgresql is this? Given the same
Postgresql version,
9.1.9
"allow_system_table_mods";"____off"
"application_name";"pgAdmin III - Query Tool"
"archive_command";"/local/opt/____pgsql-9.1/bin/pgclarchive___wal.__sh
-P %p -F %f -S dbpg-search"
"archive_mode";"on"
"archive_timeout";"0"
"array_nulls";"on"
"authentication_timeout";"____1min"
"autovacuum";"on"
"autovacuum_analyze_scale_____factor";"0.1"
"autovacuum_analyze_threshold"____;"50"
"autovacuum_freeze_max_age";"____200000000"
"autovacuum_max_workers";"3"
"autovacuum_naptime";"1min"
"autovacuum_vacuum_cost_delay"____;"20ms"
"autovacuum_vacuum_cost_limit"____;"-1"
"autovacuum_vacuum_scale_____factor";"0.2"
"autovacuum_vacuum_threshold";____"50"
"backslash_quote";"safe_____encoding"
"bgwriter_delay";"200ms"
"bgwriter_lru_maxpages";"100"
"bgwriter_lru_multiplier";"2"
"block_size";"8192"
"bonjour";"off"
"bonjour_name";""
"bytea_output";"escape"
"check_function_bodies";"on"
"checkpoint_completion_target"____;"0.5"
"checkpoint_segments";"128"
"checkpoint_timeout";"5min"
"checkpoint_warning";"30s"
"client_encoding";"UNICODE"
"client_min_messages";"notice"
"commit_delay";"0"
"commit_siblings";"5"
"constraint_exclusion";"____partition"
"cpu_index_tuple_cost";"0.005"
"cpu_operator_cost";"0.0025"
"cpu_tuple_cost";"0.01"
"cursor_tuple_fraction";"0.1"
"custom_variable_classes";""
"DateStyle";"ISO, MDY"
"db_user_namespace";"off"
"deadlock_timeout";"1s"
"debug_assertions";"off"
"debug_pretty_print";"on"
"debug_print_parse";"off"
"debug_print_plan";"off"
"debug_print_rewritten";"off"
"default_statistics_target";"____100"
"default_tablespace";""
"default_text_search_config";"____pg_catalog.english"
"default_transaction_____deferrable";"off"
"default_transaction_____isolation";"read committed"
"default_transaction_read_____only";"off"
"default_with_oids";"off"
"effective_cache_size";"____16093MB"
"effective_io_concurrency";"1"
"enable_bitmapscan";"on"
"enable_hashagg";"on"
"enable_hashjoin";"on"
"enable_indexscan";"on"
"enable_material";"on"
"enable_mergejoin";"on"
"enable_nestloop";"on"
"enable_seqscan";"on"
"enable_sort";"on"
"enable_tidscan";"on"
"escape_string_warning";"on"
"exit_on_error";"off"
"extra_float_digits";"0"
"from_collapse_limit";"8"
"fsync";"on"
"full_page_writes";"on"
"geqo";"on"
"geqo_effort";"5"
"geqo_generations";"0"
"geqo_pool_size";"0"
"geqo_seed";"0"
"geqo_selection_bias";"2"
"geqo_threshold";"12"
"gin_fuzzy_search_limit";"0"
"hot_standby";"off"
"hot_standby_feedback";"off"
"ignore_system_indexes";"off"
"integer_datetimes";"on"
"IntervalStyle";"postgres"
"join_collapse_limit";"8"
"krb_caseins_users";"off"
"krb_srvname";"postgres"
"lc_collate";"C"
"lc_ctype";"C"
"lc_messages";"C"
"lc_monetary";"C"
"lc_numeric";"C"
"lc_time";"C"
"listen_addresses";"dbpg-____search.uio.no
<http://dbpg-__search.uio.no>
<http://dbpg-search.uio.no>"
"lo_compat_privileges";"off"
"local_preload_libraries";""
"log_autovacuum_min_duration";____"-1"
"log_checkpoints";"off"
"log_connections";"off"
"log_destination";"stderr"
"log_disconnections";"off"
"log_duration";"off"
"log_error_verbosity";"____default"
"log_executor_stats";"off"
"log_file_mode";"0600"
"log_hostname";"off"
"log_line_prefix";"[%m] [dbpg-search] [%d:%u] [%r] [%p]
[%x] "
"log_lock_waits";"off"
"log_min_duration_statement";"____-1"
"log_min_error_statement";"____info"
"log_min_messages";"notice"
"log_parser_stats";"off"
"log_planner_stats";"off"
"log_rotation_age";"0"
"log_rotation_size";"0"
"log_statement";"none"
"log_statement_stats";"off"
"log_temp_files";"-1"
"log_timezone";"Europe/Oslo"
"log_truncate_on_rotation";"____on"
"logging_collector";"on"
"maintenance_work_mem";"128MB"
"max_connections";"600"
"max_files_per_process";"1000"
"max_function_args";"100"
"max_identifier_length";"63"
"max_index_keys";"32"
"max_locks_per_transaction";"____64"
"max_pred_locks_per_____transaction";"64"
"max_prepared_transactions";"____0"
"max_stack_depth";"4MB"
"max_standby_archive_delay";"____30s"
"max_standby_streaming_delay";____"30s"
"max_wal_senders";"0"
"password_encryption";"on"
"port";"5432"
"post_auth_delay";"0"
"pre_auth_delay";"0"
"quote_all_identifiers";"off"
"random_page_cost";"2"
"replication_timeout";"1min"
"restart_after_crash";"on"
"search_path";""$user",public"
"segment_size";"1GB"
"seq_page_cost";"1"
"server_encoding";"UTF8"
"server_version";"9.1.9"
"server_version_num";"90109"
"session_replication_role";"____origin"
"shared_buffers";"384MB"
"silent_mode";"off"
"sql_inheritance";"on"
"ssl";"on"
"ssl_renegotiation_limit";"____512MB"
"standard_conforming_strings";____"off"
"statement_timeout";"0"
"superuser_reserved_____connections";"3"
"synchronize_seqscans";"on"
"synchronous_commit";"on"
"synchronous_standby_names";""
"syslog_facility";"local0"
"syslog_ident";"postgres"
"tcp_keepalives_count";"9"
"tcp_keepalives_idle";"7200"
"tcp_keepalives_interval";"75"
"temp_buffers";"8MB"
"temp_tablespaces";""
"TimeZone";"Europe/Oslo"
"timezone_abbreviations";"____Default"
"trace_notify";"off"
"trace_recovery_messages";"____log"
"trace_sort";"off"
"track_activities";"on"
"track_activity_query_size";"____1024"
"track_counts";"on"
"track_functions";"none"
"transaction_deferrable";"off"
"transaction_isolation";"read committed"
"transaction_read_only";"off"
"transform_null_equals";"off"
"unix_socket_group";""
"unix_socket_permissions";"____0700"
"update_process_title";"on"
"vacuum_cost_delay";"0"
"vacuum_cost_limit";"200"
"vacuum_cost_page_dirty";"20"
"vacuum_cost_page_hit";"1"
"vacuum_cost_page_miss";"10"
"vacuum_defer_cleanup_age";"0"
"vacuum_freeze_min_age";"____50000000"
"vacuum_freeze_table_age";"____150000000"
"wal_block_size";"8192"
"wal_buffers";"12MB"
"wal_keep_segments";"0"
"wal_level";"archive"
"wal_receiver_status_interval"____;"10s"
"wal_segment_size";"16MB"
"wal_sender_delay";"1s"
"wal_sync_method";"fdatasync"
"wal_writer_delay";"200ms"
"work_mem";"16MB"
"xmlbinary";"base64"
"xmloption";"content"
"zero_damaged_pages";"off"
--
Erlend Garåsen
Center for Information Technology Services
University of Oslo
P.O. Box 1086 Blindern, N-0317 OSLO, Norway
Ph: (+47) 22840193 <tel:%28%2B47%29%2022840193>
<tel:%28%2B47%29%2022840193>, Fax: (+47)
22852970 <tel:%28%2B47%29%2022852970>, Mobile: (+47)
91380968 <tel:%28%2B47%29%2091380968>
<tel:%28%2B47%29%2091380968>, VIP: 31050
--
Erlend Garåsen
Center for Information Technology Services
University of Oslo
P.O. Box 1086 Blindern, N-0317 OSLO, Norway
Ph: (+47) 22840193 <tel:%28%2B47%29%2022840193>, Fax: (+47) 22852970
<tel:%28%2B47%29%2022852970>, Mobile: (+47) 91380968
<tel:%28%2B47%29%2091380968>, VIP: 31050
--
Erlend Garåsen
Center for Information Technology Services
University of Oslo
P.O. Box 1086 Blindern, N-0317 OSLO, Norway
Ph: (+47) 22840193, Fax: (+47) 22852970, Mobile: (+47) 91380968, VIP: 31050