Alex Vandiver wrote: > On Tue, 2014-01-28 at 03:30 +0100, Michelle Sullivan wrote: > >> Ok so temporarily I set min_duration to 0 and logged everything, the log >> is here: >> > > The situation can be made much clearer by limiting to the postgres > process IDs which are responding to queries from RT, namely 15537 and > 15481.
( parsed this an put it in http://www.mhix.org/ticketload-rt.log ) > The longest statement duration is 1559.273ms, which is a _parse_ > of a statement query, not even an execution. The next-longest query is > DBD::Pg's query (which is only run once per process, FTR) at 02:11:09 to > determine table information -- which takes nearly a full _second_ to > return. On the largest Postgres installation which I have access to, > this takes less than 100ms. > Well here's the thing... Jan 28 02:11:09 to Jan 28 02:18:31 ... a full 7 minutes of being in ShowCustomFields there are just 100's (like more than 1900 of them) of statements like this: Jan 28 02:18:30 m0040933 postgres[15537]: [7893-1] LOG: duration: 0.042 ms bind dbdpg_p88026_1968: SELECT consrc FROM pg_catalog.pg_constraint WHERE contype = 'c' AND conrelid = 134792121 AND conkey = '{33}' Jan 28 02:18:30 m0040933 postgres[15537]: [7894-1] LOG: duration: 0.021 ms execute dbdpg_p88026_1968: SELECT consrc FROM pg_catalog.pg_constraint WHERE contype = 'c' AND conrelid = 134792121 AND conkey = '{33}' Jan 28 02:18:31 m0040933 postgres[15537]: [7895-1] LOG: duration: 0.363 ms statement: DEALLOCATE dbdpg_p88026_1968 .. and no other queries... The start being immediately after: Jan 28 02:11:09 m0040933 postgres[15537]: [218-2] SELECT Jan 28 02:11:09 m0040933 postgres[15537]: [218-3] NULL::text AS "TABLE_CAT" Jan 28 02:11:09 m0040933 postgres[15537]: [218-4] , quote_ident(n.nspname) AS "TABLE_SCHEM" Jan 28 02:11:09 m0040933 postgres[15537]: [218-5] , quote_ident(c.relname) AS "TABLE_NAME" Jan 28 02:11:09 m0040933 postgres[15537]: [218-6] , quote_ident(a.attname) AS "COLUMN_NAME" Jan 28 02:11:09 m0040933 postgres[15537]: [218-7] , a.atttypid AS "DATA_TYPE" Jan 28 02:11:09 m0040933 postgres[15537]: [218-8] , pg_catalog.format_type(a.atttypid, NULL) AS "TYPE_NAME" Jan 28 02:11:09 m0040933 postgres[15537]: [218-9] , a.attlen AS "COLUMN_SIZE" Jan 28 02:11:09 m0040933 postgres[15537]: [218-10] , NULL::text AS "BUFFER_LENGTH" Jan 28 02:11:09 m0040933 postgres[15537]: [218-11] , NULL::text AS "DECIMAL_DIGITS" Jan 28 02:11:09 m0040933 postgres[15537]: [218-12] , NULL::text AS "NUM_PREC_RADIX" Jan 28 02:11:09 m0040933 postgres[15537]: [218-13] , CASE a.attnotnull WHEN 't' THEN 0 ELSE 1 END AS "NULLABLE" Jan 28 02:11:09 m0040933 postgres[15537]: [218-14] , pg_catalog.col_description(a.attrelid, a.attnum) AS "REMARKS" Jan 28 02:11:09 m0040933 postgres[15537]: [218-15] , pg_catalog.pg_get_expr(af.adbin, af.adrelid) AS "COLUMN_DEF" Jan 28 02:11:09 m0040933 postgres[15537]: [218-16] , NULL::text AS "SQL_DATA_TYPE" Jan 28 02:11:09 m0040933 postgres[15537]: [218-17] , NULL::text AS "SQL_DATETIME_SUB" Jan 28 02:11:09 m0040933 postgres[15537]: [218-18] , NULL::text AS "CHAR_OCTET_LENGTH" Jan 28 02:11:09 m0040933 postgres[15537]: [218-19] , a.attnum AS "ORDINAL_POSITION" Jan 28 02:11:09 m0040933 postgres[15537]: [218-20] , CASE a.attnotnull WHEN 't' THEN 'NO' ELSE 'YES' END AS "IS_NULLABLE" Jan 28 02:11:09 m0040933 postgres[15537]: [218-21] , pg_catalog.format_type(a.atttypid, a.atttypmod) AS "pg_type" Jan 28 02:11:09 m0040933 postgres[15537]: [218-22] , '?' AS "pg_constraint" Jan 28 02:11:09 m0040933 postgres[15537]: [218-23] , n.nspname AS "pg_schema" Jan 28 02:11:09 m0040933 postgres[15537]: [218-24] , c.relname AS "pg_table" Jan 28 02:11:09 m0040933 postgres[15537]: [218-25] , a.attname AS "pg_column" Jan 28 02:11:09 m0040933 postgres[15537]: [218-26] , a.attrelid AS "pg_attrelid" Jan 28 02:11:09 m0040933 postgres[15537]: [218-27] , a.attnum AS "pg_attnum" Jan 28 02:11:09 m0040933 postgres[15537]: [218-28] , a.atttypmod AS "pg_atttypmod" Jan 28 02:11:09 m0040933 postgres[15537]: [218-29] , t.typtype AS "_pg_type_typtype" Jan 28 02:11:09 m0040933 postgres[15537]: [218-30] , t.oid AS "_pg_type_oid" Jan 28 02:11:09 m0040933 postgres[15537]: [218-31] FROM Jan 28 02:11:09 m0040933 postgres[15537]: [218-32] pg_catalog.pg_type t Jan 28 02:11:09 m0040933 postgres[15537]: [218-33] JOIN pg_catalog.pg_attribute a ON (t.oid = a.atttypid) Jan 28 02:11:09 m0040933 postgres[15537]: [218-34] JOIN pg_catalog.pg_class c ON (a.attrelid = c.oid) Jan 28 02:11:09 m0040933 postgres[15537]: [218-35] LEFT JOIN pg_catalog.pg_attrdef af ON (a.attnum = af.adnum AND a.attrelid = af.adrelid) Jan 28 02:11:09 m0040933 postgres[15537]: [218-36] JOIN pg_catalog.pg_namespace n ON (n.oid = c.relnamespace) Jan 28 02:11:09 m0040933 postgres[15537]: [218-37] WHERE Jan 28 02:11:09 m0040933 postgres[15537]: [218-38] a.attnum >= 0 Jan 28 02:11:09 m0040933 postgres[15537]: [218-39] AND c.relkind IN ('r','v') Jan 28 02:11:09 m0040933 postgres[15537]: [218-40] Jan 28 02:11:09 m0040933 postgres[15537]: [218-41] AND c.relname LIKE '%' Jan 28 02:11:09 m0040933 postgres[15537]: [218-42] AND a.attname LIKE '%' Jan 28 02:11:09 m0040933 postgres[15537]: [218-43] ORDER BY "TABLE_SCHEM", "TABLE_NAME", "ORDINAL_POSITION" Jan 28 02:11:09 m0040933 postgres[15537]: [218-44] Jan 28 02:11:09 m0040933 postgres[15537]: [219-1] LOG: duration: 0.665 ms statement: DEALLOCATE dbdpg_p88026_49 Which is a column_info() call in DBD::Pg Which is called from Fields() in DBIx::Searchbuilder .. Which I assume is being called in ShowCustomFields somewhere... The problem is network latency between the front ends and the backend DB... Running this on the same host returns in under a second. My frontends are on completely different networks (and indeed different data centers) from the database servers for security (mainly DDoS type) reasons Now I'd suspect something very broken possibly in column_info as pg_catalog.pg_constraint only has 36 rows in the rt4 table, and all those are the pk's.. The TABLE_CAT query returns 1921 rows, which corresponds exactly to the number of pg_catalog.pg_constraint queries... > I am not familiar with Bucardo as a replication strategy, so I can't > comment on that. I've CC'd Gregg as he's the current author for DBD::Pg, and he's also the person behind Bucardo replication. > However, I strongly suspect that the answer lies > amongst your replication tool and network topology. Replication tool - no. Network topology, maybe. > The symptoms do not > match any failure modes that I'm familiar with in RT. You may wish to > try simplifying your database configuration (temporarily install Pg on > one of the front-ends) to see if that resolves any of the problems. > Not really possible or practical - as it would appear the problem is related to network latency (of around 20ms round-trip - the query itself completes in 0.021ms - ~0.400ms if you include the bind and dealloc) and the use of column_info() calls... is there a reason why ShowCustomFields needs to call column_info()? Regards, -- Michelle Sullivan http://www.mhix.org/