Re: Properly handle OOM death?

2023-11-13 Thread Justin Pryzby
On Mon, Mar 13, 2023 at 06:43:01PM +0100, Peter J. Holzer wrote:
> On 2023-03-13 09:21:18 -0800, Israel Brewster wrote:
> > I’m running a postgresql 13 database on an Ubuntu 20.04 VM that is a bit 
> > more
> > memory constrained than I would like, such that every week or so the various
> > processes running on the machine will align badly and the OOM killer will 
> > kick
> > in, killing off postgresql, as per the following journalctl output:
> > 
> > Mar 12 04:04:23 novarupta systemd[1]: postgresql@13-main.service: A process 
> > of this unit has been killed by the OOM killer.
> > Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Failed 
> > with result 'oom-kill'.
> > Mar 12 04:04:32 novarupta systemd[1]: postgresql@13-main.service: Consumed 
> > 5d 17h 48min 24.509s CPU time.
> > 
> > And the service is no longer running.
> 
> I might be misreading this, but it looks to me that systemd detects that
> *some* process in the group was killed by the oom killer and stops the
> service.

Yeah.

I found this old message on Google.  I'm surprised there aren't more
similar complaints about this.  It's as Peter said: it (sometimes)
causes systemd to actively *stop* the cluster after OOM, when it
would've come back online on its own if the init (supervisor) process
didn't interfere.

My solution was to set OOMPolicy=continue in
/usr/lib/systemd/system/postgresql@.service.

I suggest that the default unit files should do likewise.
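
For anyone wanting to try this without editing the packaged unit file directly,
a minimal sketch using a systemd drop-in (assuming a systemd new enough to know
OOMPolicy=, and the unit name from the log above; the file name is hypothetical):

# /etc/systemd/system/postgresql@.service.d/oom.conf
[Service]
OOMPolicy=continue

# then:
systemctl daemon-reload
systemctl restart postgresql@13-main

With OOMPolicy=continue, systemd leaves the service alone when a single backend
is OOM-killed, and postgres goes through its normal crash recovery instead of
being stopped.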

-- 
Justin




Re: Supported RHEL version for PostgreSQL 11

2021-08-28 Thread Justin Pryzby
On Sat, Aug 28, 2021 at 04:08:33PM +0800, Sudhir Guna wrote:
> The RHEL upgrade from 7.5 to 8.4 is not done as an incremental version
> upgrade, but rather directly from 7.5.

I think the process should be:

1) upgrade from RH7 to RH8;
2) reindex relevant indexes (see wiki link below);
3) upgrade from postgres 9.6 to 11;
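
As a minimal sketch of step 2 (assuming some downtime is acceptable, and that
reindexing everything is simpler than picking out only the affected indexes;
"yourdb" is a placeholder):

reindexdb --all
# or, per database:
psql yourdb -c "REINDEX DATABASE yourdb"

See the wiki page referenced below for the details of what's affected.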

You may need to install updated PGDG packages for the RH8.

How are you planning to upgrade postgres ?
Using pg_upgrade, or pg_dump, or ??

> Postgres is installed using PGDG packages.
>
> On Thu, Aug 26, 2021 at 1:15 AM Justin Pryzby  wrote:
> 
> > On Wed, Aug 25, 2021 at 11:47:57PM +0800, Sudhir Guna wrote:
> > >  Hi All,
> > >
> > > I would like to clarify on OS upgrade for PostgreSQL.
> > >
> > > Currently we are on PostgreSQL 9.6 on RHEL 7.5. We would like to do a
> > > direct upgrade from RHEL 7.5 to RHEL 8.4 and then upgrade PostgreSQL 9.6
> > to
> > > PostgreSQL 11.
> > >
> > > Is it ok to do a direct upgrade from RHEL 7.5 to RHEL 8.4
> >
> > Please describe what you mean by a "direct upgrade" ?
> >
> > You should be careful, since the sort order of indexes can change:
> > https://wiki.postgresql.org/wiki/Locale_data_changes
> > | Version 8 uses the new locale data. Therefore, caution will be necessary
> > when upgrading.
> >
> > > Is PostgreSQL 11 compatible with RHEL 8.4
> >
> > How did you install postgres ?
> > Compiled from source ?
> > Using RHEL packages ?
> > Using PGDG packages ?
> >
> > PGDG has packages for PG11/RH8
> > https://yum.postgresql.org/rpmchart/#pg11




Re: Supported RHEL version for PostgreSQL 11

2021-08-25 Thread Justin Pryzby
On Wed, Aug 25, 2021 at 11:47:57PM +0800, Sudhir Guna wrote:
>  Hi All,
> 
> I would like to clarify on OS upgrade for PostgreSQL.
> 
> Currently we are on PostgreSQL 9.6 on RHEL 7.5. We would like to do a
> direct upgrade from RHEL 7.5 to RHEL 8.4 and then upgrade PostgreSQL 9.6 to
> PostgreSQL 11.
> 
> Is it ok to do a direct upgrade from RHEL 7.5 to RHEL 8.4

Please describe what you mean by a "direct upgrade" ?

You should be careful, since the sort order of indexes can change:
https://wiki.postgresql.org/wiki/Locale_data_changes
| Version 8 uses the new locale data. Therefore, caution will be necessary when upgrading.

> Is PostgreSQL 11 compatible with RHEL 8.4

How did you install postgres ?
Compiled from source ?
Using RHEL packages ?
Using PGDG packages ?

PGDG has packages for PG11/RH8
https://yum.postgresql.org/rpmchart/#pg11

-- 
Justin




Re: Row estimates for empty tables

2020-08-24 Thread Justin Pryzby
On Fri, Jul 24, 2020 at 09:14:04PM +0200, Pavel Stehule wrote:
> On Fri, Jul 24, 2020 at 16:38, Christophe Pettus  wrote:
> > Since we already special-case parent tables for partition sets, would a
> > storage parameter that lets you either tell the planner "no, really, zero
> > is reasonable here" or sets a minimum number of rows to plan for be
> > reasonable?  I happened to get bit by this tracking down an issue where
> > several tables in a large query had zero rows, and the planner's assumption
> > of a few pages worth caused some sub-optimal plans.  The performance hit
> > wasn't huge, but they were being joined to some *very* large tables, and
> > the differences added up.
> 
> I did this patch ten years ago.  The GoodData application
> https://www.gooddata.com/  uses Postgres a lot, and this application stores
> some results in tables (as a guard against repeated calculations). A lot of
> these tables have zero or one row.
> 
> Although we ran ANALYZE over all tables, the queries on empty tables
> had very bad plans, and I had to fix it with this patch. Another company uses
> a fake single row in the table - so there is no possibility of having a really
> empty table.
> 
> It is an issue for special, not typical, applications (this situation is
> typical for some OLAP patterns) - it is not too common - but some clean
> solution (instead of hacking postgres) would be nice.

On Mon, Aug 24, 2020 at 09:43:49PM +0200, Pavel Stehule wrote:
> This patch is just a workaround that has worked well for 10 years (but for one
> special use case) - nothing more. Without this patch that application
> cannot work at all.

My own workaround was here:
https://www.postgresql.org/message-id/20200427181034.ga28...@telsasoft.com
|... 1) create a child table: CREATE TABLE x_child() INHERITS(x)
|and, 2) change the query to use "select from ONLY".
|
|(1) allows the planner to believe that the table really is empty, a conclusion
|it otherwise avoids, and (2) avoids descending into the child (for which the
|planner would likewise avoid the conclusion that it's actually empty).
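
Spelled out as SQL, the workaround is roughly this (x is a placeholder table name):

CREATE TABLE x_child() INHERITS(x);
-- queries that used to say "FROM x" become:
SELECT * FROM ONLY x;

With a child attached, the planner is willing to estimate the (analyzed, empty)
parent as truly empty, and ONLY keeps the never-analyzed child - which would
otherwise get the usual few-pages default estimate - out of the plan.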

-- 
Justin




v12 and pg_restore -f-

2019-10-06 Thread Justin Pryzby
This is a "heads up" for others upgrading to v12.  I found a solution for our
use case, but it'd be easy to miss this, even if you read the release notes.

I saw this and updated our scripts with pg_restore -f-
https://www.postgresql.org/docs/12/release-12.html
|In pg_restore, require specification of -f - to send the dump contents to 
standard output (Euler Taveira)
|Previously, this happened by default if no destination was specified, but that 
was deemed to be unfriendly.

What I didn't realize at first is that -f- has no special meaning in v11 - it
just writes a file called ./- .  And it's considered untenable to change the
behavior of v11.

In our use, that was being piped to sed, which then saw nothing on its stdin
and just exited.  I changed our script to use pg_restore -f /dev/stdout, which
seems to be portable across postgres versions for the OS distribution we're
running.  Unfortunately, I can't think of anything portable across *OS* or
useful to include in documentation.  In the worst case, someone might need to
call pg_restore differently based on its version.
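
For illustration, the two spellings side by side (dump.pg is a placeholder file
name; the sed pipeline is the one mentioned above):

# v12+: '-' now explicitly means stdout
pg_restore -f - dump.pg | sed ...
# works on v11 and v12 alike, wherever /dev/stdout exists (Linux etc.)
pg_restore -f /dev/stdout dump.pg | sed ...

On v11 the first form just creates a file literally named ./- instead of
writing to the pipe, which is how our sed ended up with an empty stdin.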

Justin




Re: postgres=# VACUUM FULL pg_statistic => ERROR: missing chunk number 0 for toast value .. in pg_toast_2619

2019-08-30 Thread Justin Pryzby
Moving this old thread to -hackers
https://www.postgresql.org/message-id/flat/20180519142603.GA30060%40telsasoft.com

I wanted to mention that this seems to still be an issue, now running pg11.5.

log_time   | 2019-08-30 23:20:00.118+10
user_name  | postgres
database   | ts
session_id | 5d69227e.235
session_line   | 1
command_tag| CLUSTER
session_start_time | 2019-08-30 23:19:58+10
error_severity | ERROR
sql_state_code | XX000
message| unexpected chunk number 1 (expected 0) for toast value 
2369261203 in pg_toast_2619
query  | CLUSTER pg_statistic USING 
pg_statistic_relid_att_inh_index
application_name   | psql

Note that my original report was for a "missing" chunk during "VACUUM FULL", and
the current error is an "unexpected chunk" during CLUSTER.  I imagine that's a
related issue.  I haven't seen this in a while (but stopped trying to reproduce
it long ago).  A recently-deployed update to this maintenance script is
probably why it's now doing CLUSTER.

On Fri, May 25, 2018 at 08:49:50AM -0500, Justin Pryzby wrote:
> On Sat, May 19, 2018 at 02:39:26PM -0400, Tom Lane wrote:
> > Justin Pryzby  writes:
> > > [pryzbyj@database ~]$ while :; do for db in `psql postgres -Atc "SELECT 
> > > datname FROM pg_database WHERE datallowconn"`; do for t in pg_statistic 
> > > pg_attrdef pg_constraint; do echo "$db.$t..."; 
> > > PGOPTIONS=-cstatement_timeout='9s' psql $db -qc "VACUUM FULL $t"; done; 
> > > done; done
> > 
> > > ...
> > > postgres.pg_statistic...
> > > postgres.pg_attrdef...
> > > postgres.pg_constraint...
> > > template1.pg_statistic...
> > > template1.pg_attrdef...
> > > template1.pg_constraint...
> > > ts.pg_statistic...
> > > ERROR:  canceling statement due to statement timeout
> > > ts.pg_attrdef...
> > > ts.pg_constraint...
> > > postgres.pg_statistic...
> > > ERROR:  missing chunk number 0 for toast value 3372855171 in pg_toast_2619
> > 
> > Hm, so was the timeout error happening every time through on that table,
> > or just occasionally, or did you provoke it somehow?  I'm wondering how
> > your 9s timeout relates to the expected completion time.
> 
> Actually statement_timeout isn't essential for this (maybe it helps to
> trigger it more often - not sure).
> 
> Could you try:
> time sh -ec 'while :; do time psql postgres -c "VACUUM FULL VERBOSE 
> pg_toast.pg_toast_2619"; psql postgres -c "VACUUM FULL VERBOSE pg_statistic"; 
> done'; date
> 
> Three servers experienced error within 30min, but one server didn't fail until
> 12h later, and a handful others still haven't failed..
> 
> Does this help at all ?
>  2018-05-24 21:57:49.98-03  | 5b075f8d.1ad1 | LOG| pryzbyj   | 
> postgres | statement: VACUUM FULL VERBOSE pg_toast.pg_toast_2619
>  2018-05-24 21:57:50.067-03 | 5b075f8d.1ad1 | INFO   | pryzbyj   | 
> postgres | vacuuming "pg_toast.pg_toast_2619"
>  2018-05-24 21:57:50.09-03  | 5b075f8d.1ad1 | INFO   | pryzbyj   | 
> postgres | "pg_toast_2619": found 0 removable, 408 nonremovable row versions 
> in 99 pages
>  2018-05-24 21:57:50.12-03  | 5b075f8e.1ada | LOG| pryzbyj   | 
> postgres | statement: VACUUM FULL VERBOSE pg_statistic
>  2018-05-24 21:57:50.129-03 | 5b075f8e.1ada | INFO   | pryzbyj   | 
> postgres | vacuuming "pg_catalog.pg_statistic"
>  2018-05-24 21:57:50.185-03 | 5b075f8e.1ada | ERROR  | pryzbyj   | 
> postgres | missing chunk number 0 for toast value 3382957233 in pg_toast_2619
> 
> Same thing; this server has autovacuum logging, although it's not clear to me
> if that's an essential component of the problem, either:
>  2018-05-24 21:16:39.856-06 | LOG   | 5b078017.7b99 | pryzbyj   | postgres | 
> statement: VACUUM FULL VERBOSE pg_toast.pg_toast_2619
>  2018-05-24 21:16:39.876-06 | LOG   | 5b078010.7968 |   |  | 
> automatic vacuum of table "postgres.pg_toast.pg_toast_2619": index scans: 1   
>  +
> |   |   |   |  | 
> pages: 0 removed, 117 r
>  2018-05-24 21:16:39.909-06 | INFO  | 5b078017.7b99 | pryzbyj   | postgres | 
> vacuuming "pg_toast.pg_toast_2619"
>  2018-05-24 21:16:39.962-06 | INFO  | 5b078017.7b99 | pryzbyj   | postgres | 
> "pg_toast_2619": found 0 removable, 492 nonremovable row versions in 117 pages
>  2018-05-24 21:16:40.025-06 | LOG   | 5b078018.7b9b | pryzbyj   | postgres | 
> statement: VACUUM FULL VERBOSE pg_statistic
>

OOM with many sorts

2019-07-08 Thread Justin Pryzby
One of our instances was killed:

| Out of memory: Kill process 27670 (postmaster) score 381 or sacrifice child
| Killed process 27670 (postmaster) total-vm:13207796kB, anon-rss:5736kB, 
file-rss:0kB, shmem-rss:12421696kB

The VM has: 32GB RAM, shared_buffers=12GB, work_mem=128MB 
Running only pg11.3

Trying to diagnose the issue by running a query on a narrower range of dates,
it looks to me like that's due to many sort nodes per worker.

Part of the query plan looks like this:

 [...]
 ->  Finalize GroupAggregate (actual rows=1788 loops=1)
   Group Key: huawei_umts_ucell_20181217.sect_id, 
(to_char(huawei_umts_ucell_20181217.start_time, '-MM'::text))
   ->  Gather Merge (actual rows=56921 loops=1)
 Workers Planned: 2
 Workers Launched: 2
 ->  Sort (actual rows=18974 loops=3)
   Sort Key: huawei_umts_ucell_20181217.sect_id, 
(to_char(huawei_umts_ucell_20181217.start_time, '-MM'::text))
   Sort Method: quicksort  Memory: 36499kB
   Worker 0:  Sort Method: quicksort  Memory: 40275kB
   Worker 1:  Sort Method: quicksort  Memory: 40263kB
   ->  Parallel Append (actual rows=18974 loops=3)
 ->  Partial GroupAggregate (actual rows=1783 
loops=1)
   Group Key: 
huawei_umts_ucell_20181217.sect_id, 
(to_char(huawei_umts_ucell_20181217.start_time, '-MM'::text))
   ->  Sort (actual rows=42792 loops=1)
 Sort Key: 
huawei_umts_ucell_20181217.sect_id, 
(to_char(huawei_umts_ucell_20181217.start_time, '-MM'::text))
 Worker 0:  Sort Method: quicksort  
Memory: 44997kB
 ->  Parallel Seq Scan on 
huawei_umts_ucell_20181217 (actual rows=42792 loops=1)
   Filter: ((start_time >= 
'2018-12-01 00:00:00'::timestamp without time zone) AND (start_time < 
'2019-01-01 00:00:00'::timestamp without time zone))
 ->  Partial GroupAggregate (actual rows=1783 
loops=1)
   Group Key: 
huawei_umts_ucell_20181216.sect_id, 
(to_char(huawei_umts_ucell_20181216.start_time, '-MM'::text))
   ->  Sort (actual rows=42792 loops=1)
 Sort Key: 
huawei_umts_ucell_20181216.sect_id, 
(to_char(huawei_umts_ucell_20181216.start_time, '-MM'::text))
 Worker 1:  Sort Method: quicksort  
Memory: 44997kB
 ->  Parallel Seq Scan on 
huawei_umts_ucell_20181216 (actual rows=42792 loops=1)
   Filter: ((start_time >= 
'2018-12-01 00:00:00'::timestamp without time zone) AND (start_time < 
'2019-01-01 00:00:00'::timestamp without time zone))

 [...many partial GroupAggregate/Sort repeated here 
for ~200 child tables...]

This isn't the first time I've had to reduce work_mem on a parallel query to
avoid OOM, but it seems unreasonable to be concerned with 50MB sorts.

It looks like the results of each Sort node stay in RAM during processing of
additional sort nodes (is that required?)

I SET max_parallel_workers_per_gather=0 and dumped mem context:
...
PortalContext: 203948152 total in 25756 blocks; 176600 free (319 chunks); 
203771552 used:
...
Grand total: 1918118632 bytes in 30716 blocks; 192472208 free (3685 chunks); 
1725646424 used

The total shows ~1700MB but the largest context is only 200MB.
"Caller tuples" accounts for 1400MB.

awk '/Caller tuples/{s+=$3}END{print s/1024^3}' /tmp/mem-context
1.44043

I'm attaching full plan and mem context.

It doesn't seem to have changed under pg12b2, which is concerning since PG12
advertises that it gracefully supports "thousands" of partitions.  It seems to
me that would require tiny work_mem, which would be devastating to some
workloads.  This is not a contrived test case, it's one of our reports run
across ~10 months of history with weekly granularity.
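
For the record, the mitigation we fall back on is just per-session settings
before running the report - a sketch only, the exact numbers being whatever
keeps workers * sort-nodes * work_mem within RAM:

SET work_mem='32MB';
SET max_parallel_workers_per_gather=0;  -- or some value smaller than the default

which is unsatisfying, since it penalizes every node in the plan just to guard
against the many concurrent sorts.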

Thanks in advance for any advice.

Justin

QUERY PLAN  
  
--
 GroupAggregate (actual rows=277 loops=1)
   Group Key: sites.site_office, sites.site_location, 
(to_char(huawei_umts_ucell_20181217.start_time, '-MM'::text)), 
sites.site_name
   ->  Sort (actual rows=1783 loops=1)
 Sort Key: sites.site_office, sites.site_location, 

distinguish update from insert (on conflict)

2019-05-21 Thread Justin Pryzby
Is it still impossible to distinguish whether a row was inserted vs updated ?

The latest I can see is here:
https://wiki.postgresql.org/wiki/UPSERT#RETURNING_behavior

..but I'm hopeful that the 4 year old wiki page is out of date.
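
The only trick I know of relies on an implementation detail rather than anything
documented, so treat this as a sketch (t/id/val are placeholder names):

INSERT INTO t (id, val) VALUES (1, 'x')
ON CONFLICT (id) DO UPDATE SET val = EXCLUDED.val
RETURNING (xmax = 0) AS inserted;

Freshly inserted rows come back with xmax = 0, while rows that went through the
DO UPDATE path come back with a nonzero xmax.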

Justin




both force order of evaluation and hit index

2019-04-10 Thread Justin Pryzby
I wrote a script to import CSV query logs as described here:
https://www.postgresql.org/docs/current/runtime-config-logging.html#RUNTIME-CONFIG-LOGGING-CSVLOG

I also created indices for searching:
|CREATE INDEX ON postgres_log_2019_04_09_2300 ((regexp_replace(message, 
'^duration: ([.0-9]+) ms.*', '\1')::float)) WHERE message~'^duration:'

I'm having an issue where queries for long durations 1) need to specify "WHERE
message~'^duration'" to hit the index; and 2) need to *evaluate* that condition
before the next condition:
|regexp_replace(message, '^duration: ([.0-9]+) ms.*', '\1')::float))>

I've tried using CASE and OFFSET 0 and WITH, but it seems like any effort I make
to force "message~'^duration'" to be evaluated before the "duration>" cast also
causes a seq scan rather than an index scan on query duration.

I'm hesitating to use ORDER BY just to encourage index scan, since it doesn't
even guarantee that it's going to avoid the error.

This is doing index scan on other/"wrong" index:
|postgres=# explain SELECT date_trunc('minute',log_time)log_time, 
error_severity, session_id, user_name, database, left(message,99), 
left(query,99)
|FROM postgres_log_2019_04_10_1000 WHERE log_time>'2019-04-10 10:51' AND 
|CASE WHEN message~'^duration:' THEN regexp_replace(message, '^duration: 
([.0-9]+) ms.*', '\1')::float ELSE NULL END >  AND
|(application_name!='psql' OR user_name='postgres') ORDER BY 1 DESC;

| Sort  (cost=74908.17..74955.87 rows=19080 width=101)
|   Sort Key: (date_trunc('minute'::text, log_time)) DESC
|   ->  Bitmap Heap Scan on postgres_log_2019_04_10_1000  (cost=22.24..73551.61 
rows=19080 width=101)
| Recheck Cond: (log_time > '2019-04-10 10:51:00-04'::timestamp with 
time zone)
| Filter: (((application_name <> 'psql'::text) OR (user_name = 
'postgres'::text)) AND (CASE WHEN (message ~ '^duration:'::text) THEN 
(regexp_replace(message, '^duration: ([.0-9]+) ms.*'::text, 
'\1'::text))::double precision ELSE NULL::double precision END > ''::double 
precision))
| ->  Bitmap Index Scan on postgres_log_2019_04_10_1000_log_time_idx  
(cost=0.00..17.47 rows=58197 width=0)
|   Index Cond: (log_time > '2019-04-10 10:51:00-04'::timestamp 
with time zone)

This gives a nice plan, but doesn't avoid errors from casting lines which don't
match the duration regex to float.

postgres=# explain SELECT date_trunc('minute',log_time)log_time, dur, 
error_severity, session_id, user_name, database, left(message,99), 
left(query,99)
FROM (SELECT *,regexp_replace(message, '^duration: ([.0-9]+) ms.*', 
'\1')::float AS dur FROM postgres_log_2019_04_10_1000 WHERE 
log_time>'2019-04-10 10:50' AND message~'^duration:')x WHERE
dur> AND (application_name!='psql' OR user_name='postgres') ;

| Bitmap Heap Scan on postgres_log_2019_04_10_1000  (cost=5214.86..50141.68 
rows=19742 width=109)
|   Recheck Cond: ((log_time > '2019-04-10 10:50:00-04'::timestamp with time 
zone) AND ((regexp_replace(message, '^duration: ([.0-9]+) ms.*'::text, 
'\1'::text))::double precision > ''::double precision) AND (message ~ 
'^duration:'::text))
|   Filter: ((application_name <> 'psql'::text) OR (user_name = 
'postgres'::text))
|   ->  BitmapAnd  (cost=5214.86..5214.86 rows=19819 width=0)
| ->  Bitmap Index Scan on postgres_log_2019_04_10_1000_log_time_idx  
(cost=0.00..17.50 rows=59462 width=0)
|   Index Cond: (log_time > '2019-04-10 10:50:00-04'::timestamp 
with time zone)
| ->  Bitmap Index Scan on postgres_log_2019_04_10_1000_duration_idx  
(cost=0.00..5187.23 rows=238241 width=0)
|   Index Cond: ((regexp_replace(message, '^duration: ([.0-9]+) 
ms.*'::text, '\1'::text))::double precision > ''::double precision)
|(8 rows)

|ERROR:  invalid input syntax for type double precision: "temporary file: path 
"pg_tblspc/3292386340/PG_11_201809051/pgsql_tmp/pgsql_tmp16493.0", size 
603619328"

If I write it as a CTE or an OFFSET 0 subquery, that doesn't allow scanning the
"duration" index, and it scans the time index instead:

|postgres=# explain analyze WITH x AS (SELECT *,regexp_replace(message, 
'^duration: ([.0-9]+) ms.*', '\1')::float AS dur FROM 
postgres_log_2019_04_10_1000 WHERE log_time>'2019-04-10 10:50' AND 
message~'^duration:') SELECT date_trunc('minute',log_time)log_time, dur, 
error_severity, session_id, user_name, database, left(message,99), 
left(query,99)
FROM x WHERE dur> AND (application_name!='psql' OR user_name='postgres') ;

|postgres=# explain analyze SELECT date_trunc('minute',log_time)log_time, dur, 
error_severity, session_id, user_name, database, left(message,99), 
left(query,99)
FROM (SELECT *,regexp_replace(message, '^duration: ([.0-9]+) ms.*', 
'\1')::float AS dur FROM postgres_log_2019_04_10_1000 WHERE 
log_time>'2019-04-10 10:50' AND message~'^duration:' OFFSET 0)x WHERE
dur> AND (application_name!='psql' OR user_name='postgres') ;
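
One avenue I haven't tried here (a sketch only; the function name is made up, and
the expression index would have to be rebuilt on the new expression to stay
usable): wrap the cast in an exception-safe function so the evaluation order
stops mattering:

CREATE FUNCTION duration_ms(msg text) RETURNS float8 AS $$
BEGIN
    -- extract the leading duration; lines that don't parse as a float become NULL
    RETURN (regexp_replace(msg, '^duration: ([.0-9]+) ms.*', '\1'))::float8;
EXCEPTION WHEN invalid_text_representation THEN
    RETURN NULL;
END $$ LANGUAGE plpgsql IMMUTABLE;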

Thanks in advance for any advice.

Justin




Re: Question about pg_upgrade from 9.2 to X.X

2019-03-07 Thread Justin Pryzby
On Wed, Mar 06, 2019 at 09:44:16PM -0800, Perumal Raj wrote:
> Any pointers for pg_repack schema creation ?

With recent postgres, you should use just: "CREATE EXTENSION pg_repack", which
does all that for you.

> Will there be any impact in the future , Since i used --link option ?

You probably have an old DB directory lying around which is (at least
partially) hardlinked.  You should remove it .. but be careful to remove the
correct dir.  My scripts always rename the old dir before running pg_upgrade,
so it's less scary to rm -fr it later.
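
In shell terms, roughly (a sketch; the paths and versions are placeholders and
depend on your packaging):

# rename the old data dir up front and point pg_upgrade at the renamed copy
mv /var/lib/pgsql/9.2/data /var/lib/pgsql/9.2/data.old
pg_upgrade --link -b /usr/pgsql-9.2/bin -B /usr/pgsql-11/bin \
    -d /var/lib/pgsql/9.2/data.old -D /var/lib/pgsql/11/data
# ...verify the new cluster...
rm -rf /var/lib/pgsql/9.2/data.old   # only once you're sure you won't roll back

That way a later rm can't be pointed at the live data directory by mistake.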

Justin



Re: Question about pg_upgrade from 9.2 to X.X

2019-03-05 Thread Justin Pryzby
On Tue, Mar 05, 2019 at 08:09:12AM -0800, Perumal Raj wrote:
> Thanks Sergei/Justin for the continues update.
> 
> So the reorg schema might have been created as part of some scripts prior to
> version 9.2?

I'm guessing they were probably created in 9.2.

> These are functions in the DB, not the extension. However, these functions
> will not run since the associated libraries no longer exist on the system (9.2),
> and I hope there is no impact to the system.

I guess someone installed pgreorg, ran its scripts to install its functions
into the DB, and then removed pgreorg without removing its scripts.

> One Question need your address,
> 
> Prior to the 9.2-to-9.6 upgrade, I tried a 9.2-to-10.7 upgrade and it failed
> with a similar error (you can refer to the beginning of the post).
> 
> > could not load library "$libdir/hstore": ERROR:  could not access file 
> > "$libdir/hstore": No such file or directory
> > could not load library "$libdir/adminpack": ERROR:  could not access file 
> > "$libdir/adminpack": No such file or directory
> > could not load library "$libdir/uuid-ossp": ERROR:  could not access file 
> > "$libdir/uuid-ossp": No such file or directory
> 
> These extensions seem to be standard. What is the use of these functions, and
> do we have any alternative or enhanced object in the higher version if I drop
> them in 9.2 and continue the upgrade to 10.7?

See Sergei's response:
https://www.postgresql.org/message-id/7164691551378448%40myt3-1179f584969c.qloud-c.yandex.net

You probably want to install this package for the new version (9.6 or 10 or
11).

[pryzbyj@TS-DB ~]$ rpm -ql postgresql11-contrib |grep -E 
'(uuid-ossp|adminpack|hstore)\.control'
/usr/pgsql-11/share/extension/adminpack.control
/usr/pgsql-11/share/extension/hstore.control
/usr/pgsql-11/share/extension/uuid-ossp.control
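
i.e. something like the following, assuming the PGDG repo is already configured
and you end up on 11:

yum install postgresql11-contrib

so that pg_upgrade (or a restore of the dump) can find the hstore/adminpack/
uuid-ossp libraries it's complaining about.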

Justin



Re: Question about pg_upgrade from 9.2 to X.X

2019-03-04 Thread Justin Pryzby
On Mon, Mar 04, 2019 at 02:21:11PM -0800, Perumal Raj wrote:
> Does it mean that these functions are default and came with 9.2 ?
> I am wondering how these functions were created in the DB, as the
> library ($libdir/pg_reorg) does not exist on the system

I don't think it's default.
It was probably installed by running some SQL script.

It tentatively sounds safe to me to drop, but you should take a backup and
inspect and double check your pg_dump output and output of "begin; drop schema
pgreorg cascade".

Justin



Re: Question about pg_upgrade from 9.2 to X.X

2019-03-04 Thread Justin Pryzby
On Mon, Mar 04, 2019 at 01:37:30PM -0800, Perumal Raj wrote:
> I could see bunch of functions under reorg schema.

Those functions are the ones preventing you from upgrading.
You should drop schema pg_reorg cascade.
You can run it in a transaction first to see what it will drop.
But after the upgrade, you can CREATE EXTENSION pg_repack, which is a fork of
pg_reorg, which is itself no longer maintained.
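
A sketch of the transactional dry run (assuming the schema shows up as "reorg"
in \dn - use whatever name actually appears there):

BEGIN;
DROP SCHEMA reorg CASCADE;   -- the NOTICE/DETAIL lists everything that would be dropped
ROLLBACK;                    -- switch to COMMIT once you're happy with that list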

Justin



Re: Question about pg_upgrade from 9.2 to X.X

2019-03-03 Thread Justin Pryzby
Moving to -general list (-hackers is for development topics like proposed
patches and patch reviews and beta testing and crash reports).

On Thu, Feb 28, 2019 at 10:13:58AM -0800, Perumal Raj wrote:
> could not load library "$libdir/pg_reorg":
> ERROR:  could not access file "$libdir/pg_reorg": No such file or directory

As Sergei said, you can run pg_dump -s and look for references to reorg, and
drop them.

Or, you could try this:
CREATE EXTENSION pg_reorg FROM unpackaged;

Or maybe this:
CREATE EXTENSION pg_repack FROM unpackaged;

If that works, you can DROP EXTENSION pg_repack;

Otherwise, I think you can maybe do something like:
DROP SCHEMA pg_repack CASCADE; -- or,
DROP SCHEMA pg_reorg CASCADE;

Please send output of: \dn



Re: postgresql v11.1 Segmentation fault: signal 11: by running SELECT... JIT Issue?

2019-02-12 Thread Justin Pryzby
On Mon, Nov 26, 2018 at 07:00:35PM -0800, Andres Freund wrote:
> The fix is easy enough, just adding a
> v_hoff = LLVMBuildZExt(b, v_hoff, LLVMInt32Type(), "");
> fixes the issue for me.

On Tue, Jan 29, 2019 at 12:38:38AM -0800, pabloa98 wrote:
> And perhaps should I modify this too?
> If that is the case, I am not sure what kind of modification we should do.

Andres committed the fix in November, and it's included in postgres 11.2, which
is scheduled to be released Thursday.  So we'll both be able to re-enable JIT
on our wide tables again.
https://git.postgresql.org/gitweb/?p=postgresql.git;a=commitdiff;h=b23852766
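
Re-enabling it is just flipping the GUC back once on 11.2 - a sketch, assuming
it was disabled globally rather than per-role or per-database:

ALTER SYSTEM SET jit = on;
SELECT pg_reload_conf();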

Justin



Re: query logging of prepared statements

2019-02-09 Thread Justin Pryzby
On Fri, Feb 08, 2019 at 07:29:53AM -0600, Justin Pryzby wrote:
> A couple months ago, I implemented prepared statements for PyGreSQL.  While
> updating our application in advance of their release with that feature, I
> noticed that our query logs were several times larger.

Previously sent to -general (and quoted fully below), resending to -hackers
with patch.
https://www.postgresql.org/message-id/20190208132953.GF29720%40telsasoft.com

I propose that the prepared statement associated with an EXECUTE should only be
included in log "DETAIL" when log_error_verbosity=VERBOSE, for both SQL EXECUTE
and PQexecPrepared.  I'd like to be able to continue logging DETAIL (including
bind parameters), so I want to avoid setting "TERSE" just to avoid redundant
messages.

With the attached patch, I'm not sure if !*portal_name && !portal->prepStmtName
would catch cases other than PQexecParams (?)

Compare an unpatched server, a patched server, and a patched server with
log_error_verbosity=verbose.

|$ psql postgres -xtc "SET log_error_verbosity=default;SET log_statement='all'; 
SET client_min_messages=log" -c "PREPARE q AS SELECT 2" -c "EXECUTE q"
|SET
|LOG:  statement: PREPARE q AS SELECT 2
|PREPARE
|LOG:  statement: EXECUTE q
|DETAIL:  prepare: PREPARE q AS SELECT 2
|?column? | 2

|$ PGHOST=/tmp PGPORT=5678 psql postgres -xtc "SET 
log_error_verbosity=default;SET log_statement='all'; SET 
client_min_messages=log" -c "PREPARE q AS SELECT 2" -c "EXECUTE q"
|SET
|LOG:  statement: PREPARE q AS SELECT 2
|PREPARE
|LOG:  statement: EXECUTE q
|?column? | 2

|$ PGHOST=/tmp PGPORT=5678 psql postgres -xtc "SET 
log_error_verbosity=verbose;SET log_statement='all'; SET 
client_min_messages=log" -c "PREPARE q AS SELECT 2" -c "EXECUTE q"
|SET
|LOG:  statement: PREPARE q AS SELECT 2
|PREPARE
|LOG:  statement: EXECUTE q
|DETAIL:  prepare: PREPARE q AS SELECT 2
|?column? | 2

For PQexecPrepared library call:

|$ xPGPORT=5678 xPGHOST=/tmp PYTHONPATH=../PyGreSQL/build/lib.linux-x86_64-2.7/ 
python2.7 -c "import pg; d=pg.DB('postgres'); d.query('SET 
client_min_messages=log; SET log_error_verbosity=default; SET 
log_statement=\"all\"'); d.query('SELECT 1; PREPARE q AS SELECT \$1'); 
d.query_prepared('q',[1]); d.query_formatted('SELECT %s', [2])"
|LOG:  statement: SELECT 1; PREPARE q AS SELECT $1
|LOG:  execute q: SELECT 1; PREPARE q AS SELECT $1
|DETAIL:  parameters: $1 = '1'
|LOG:  execute : SELECT $1
|DETAIL:  parameters: $1 = '2'

|$ PGPORT=5678 PGHOST=/tmp PYTHONPATH=../PyGreSQL/build/lib.linux-x86_64-2.7/ 
python2.7 -c "import pg; d=pg.DB('postgres'); d.query('SET 
client_min_messages=log; SET log_error_verbosity=default; SET 
log_statement=\"all\"'); d.query('SELECT 1; PREPARE q AS SELECT \$1'); 
d.query_prepared('q',[1]); d.query_formatted('SELECT %s', [2])"
|LOG:  statement: SELECT 1; PREPARE q AS SELECT $1
|LOG:  execute q
|DETAIL:  parameters: $1 = '1'
|LOG:  execute : SELECT $1
|DETAIL:  parameters: $1 = '2'

|$ PGPORT=5678 PGHOST=/tmp PYTHONPATH=../PyGreSQL/build/lib.linux-x86_64-2.7/ 
python2.7 -c "import pg; d=pg.DB('postgres'); d.query('SET 
client_min_messages=log; SET log_error_verbosity=verbose; SET 
log_statement=\"all\"'); d.query('SELECT 1; PREPARE q AS SELECT \$1'); 
d.query_prepared('q',[1]); d.query_formatted('SELECT %s', [2])"
|LOG:  statement: SELECT 1; PREPARE q AS SELECT $1
|LOG:  execute q: SELECT 1; PREPARE q AS SELECT $1
|DETAIL:  parameters: $1 = '1'
|LOG:  execute : SELECT $1
|DETAIL:  parameters: $1 = '2'

Also, I noticed that if the statement was prepared using SQL PREPARE (rather
than PQprepare), and if it used a simple query with multiple commands, they're
all included in the log, like this when executed with PQexecPrepared:
|LOG:  execute q: SET log_error_verbosity=verbose; SET client_min_messages=log; 
PREPARE q AS SELECT $1

And looks like this for SQL EXECUTE:
|[pryzbyj@telsasoft-db postgresql]$ psql postgres -txc "SET 
client_min_messages=log;SELECT 1;PREPARE q AS SELECT 2" -c "EXECUTE q"
|PREPARE
|LOG:  statement: EXECUTE q
|DETAIL:  prepare: SET client_min_messages=log;SELECT 1;PREPARE q AS SELECT 2
|?column? | 2

On Fri, Feb 08, 2019 at 07:29:53AM -0600, Justin Pryzby wrote:
> A couple months ago, I implemented prepared statements for PyGreSQL.  While
> updating our application in advance of their release with that feature, I
> noticed that our query logs were several times larger.
> 
> With non-prepared statements, we logged to CSV:
> |message| SELECT 1
> 
> With SQL EXECUTE, we log:
> |message| statement: EXECUTE sqlex(2);
> |detail | prepare: prepare sqlex AS SELECT $1;
> 
> With PQexecPrepared, we would log:
> |message| execute qq: PREPARE qq AS SELECT $1
> |detail |

Re: Server goes to Recovery Mode when run a SQL

2019-02-09 Thread Justin Pryzby
Hi,

On Fri, Feb 08, 2019 at 02:11:33PM -0700, PegoraroF10 wrote:
> *Well, now we have two queries which stop our postgres server completely.
> That problem occurs on versions 10.6 and 11.1.
> On both servers the problem is the same.

> Linux logs of old crash are:*
> Feb  1 18:39:53 fx-cloudserver kernel: [  502.405788] show_signal_msg: 5
> callbacks suppressedFeb  1 18:39:53 fx-cloudserver kernel: [  502.405791]
> postgres[10195]: segfault at 24 ip 555dc6a71cb0 sp 7ffc5f91db38
> error 4 in postgres[555dc69b4000+6db000]

"segfault" seems to mean you hit a bug, which we'll want more information to
diagnose.  Could you install debugging symbols ?  Ubuntu calls their package
postgresql-10-dbg or similar.  And start server with coredumps enabled, using
pg_ctl -c -D /var/lib/postgresql/10/main (or similar).  Then trigger the query
and hope to find a core dump in the data directory.  Or possibly it'll be
processed into /var/crash by apport daemon, depending if that's running and
enabled (check /proc/sys/kernel/core_pattern).

https://wiki.postgresql.org/wiki/Getting_a_stack_trace_of_a_running_PostgreSQL_backend_on_Linux/BSD#Getting_a_trace_from_a_randomly_crashing_backend
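
Once the -dbg package is installed and a core file shows up, getting a trace is
roughly this (a sketch; the binary and core paths are guesses based on Ubuntu's
packaging, adjust as needed):

gdb /usr/lib/postgresql/10/bin/postgres /var/lib/postgresql/10/main/core
(gdb) bt full

and the "bt full" output is the interesting part to post here.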

> *Linux Log of new crash, which takes several minutes to stop:*
> Feb  8 15:06:40 fxReplicationServer kernel: [1363901.643121] postgres
> invoked oom-killer: gfp_mask=0x24280ca, order=0, oom_score_adj=0Feb  8
> fxReplicationServer kernel: [1363901.643368] Killed process 9399 (postgres)
> total-vm:16518496kB, anon-rss:11997448kB, file-rss:38096kBFeb  8 17:21:16
> fxReplicationServer kernel: [1371977.845728] postgres[10321]: segfault at 10
> ip 5567a6069752 sp 7ffed70be970 error 4 in

In this case, you ran out of RAM, as you noticed.  You should make sure ANALYZE
statistics are up to date (including manually ANALYZEing any parent tables).

On Sun, Feb 03, 2019 at 09:05:42AM -0700, PegoraroF10 wrote:
> I'm using Postgres 10 on Ubuntu in a Google VM (8 cores, 32GB RAM, 250GB SSD)
> and the DB has 70GB

What is work_mem setting ?

You could try running the query with lower work_mem, or check EXPLAIN ANALYZE
and try to resolve any bad rowcount estimates.

> *This query runs for approximately 5 minutes. See the link above with images and
> logs and you'll see how memory grows. Memory use starts at 8GB and grows
> until it is all used. When all memory is in use then it starts to swap. When
> all swap is allocated then it gets the "out of memory" and stops
> completely.

> on: https://drive.google.com/open?id=18zIvkV3ew4aZ1_cxI-EmIPVql7ydvEwi*

It says "Access Denied", perhaps you could send a link to
https://explain.depesz.com/ ?

Justin



query logging of prepared statements

2019-02-08 Thread Justin Pryzby
A couple months ago, I implemented prepared statements for PyGreSQL.  While
updating our application in advance of their release with that feature, I
noticed that our query logs were several times larger.

With non-prepared statements, we logged to CSV:
|message| SELECT 1

With SQL EXECUTE, we log:
|message| statement: EXECUTE sqlex(2);
|detail | prepare: prepare sqlex AS SELECT $1;

With PQexecPrepared, we would log:
|message| execute qq: PREPARE qq AS SELECT $1
|detail | parameters: $1 = '3'

For comparison, with PQexecParams, the logs I see look like this (apparently
the "unnamed" prepared statement is used behind the scenes):
|message| execute : SELECT [...]

It's not clear to me why it'd be desirable for the previous PREPARE to be
additionally logged during every execution, instead of just its name (in
"message") and params (in "detail").  (Actually, I had to triple check that it
wasn't somehow executing a prepared statement which itself created a prepared
statement...)

For us, the performance benefit is to minimize the overhead (mostly in pygres)
of many INSERTs into append-only tables.  It's not great that a feature
intended to improve performance is causing 2x more log volume to be written.

Also, it seems odd that for SQL EXECUTE, the PREPARE is shown in "detail", but
for the library call, it's shown in "message".

I found:
|commit bc24d5b97673c365f19be21f83acca3c184cf1a7
|Author: Bruce Momjian 
|Date:   Tue Aug 29 02:11:30 2006 +
|
|Now bind displays prepare as detail, and execute displays prepare and
|optionally bind.  I re-added the "statement:" label so people will
|understand why the line is being printed (it is log_*statement
|behavior).
|
|Use single quotes for bind values, instead of double quotes, and double
|literal single quotes in bind values (and document that).  I also made
|use of the DETAIL line to have much cleaner output.

and:

|commit c8961bf1ce0b51019db31c5572dac18b664e02f1
|Author: Bruce Momjian 
|Date:   Fri Aug 4 18:53:46 2006 +
|
|Improve logging of protocol-level prepared statements.

Justin



pg_stat_activity.query_start in the future?

2018-08-17 Thread Justin Pryzby
I'm buried and not having any good ideas how to diagnose this or what else to
send, so here it is.  Feel free to tell me why I'm the one who's confused..

postgres=# SELECT pid, now(), query_start, state, query FROM pg_stat_activity 
WHERE pid=27757;
  pid  |  now  |  query_start  | state  
|query
---+---+---++-
 27757 | 2018-08-17 11:10:16.568429-04 | 2018-08-17 10:17:52.814282-04 | active 
| autovacuum: VACUUM ANALYZE public.eric_enodeb_cell_20180815
(1 row)

postgres=# \! ps -O lstart 27757
  PID  STARTED S TTY  TIME COMMAND
27757 Fri Aug 17 08:53:20 2018 S ?00:11:56 postgres: autovacuum worker 
process   xton
postgres=# \! date
Fri Aug 17 11:10:58 EDT 2018

Justin



Re: ALTER TABLE SET (toast.asdf) is not rejected ... if a table has no toast

2018-06-15 Thread Justin Pryzby
On Mon, Jun 11, 2018 at 11:47:59AM -0400, Alvaro Herrera wrote:
> On 2018-Jun-11, Justin Pryzby wrote:
> 
> > I noticed that this is accepted:
> > 
> > postgres=# ALTER TABLE t SET (toast.asdf=128);
> > ALTER TABLE
> > 
> > I thought since "toast" was a core namespace, it would've been rejected?
> > 
> > I recall having read a discussion about verifying these ... I wasn't able
> > to find what I was thinking of, but found this one.
> > https://www.postgresql.org/message-id/flat/20090114144332.GF24156%40alvh.no-ip.org

> 
> Uh.  ISTM that this was lost during the 9.4 cycle, because that command *is*
> rejected in 9.3.
> alvherre=# ALTER TABLE t SET (toast.asdf=128);
> ERROR:  unrecognized parameter "asdf"

I'm assuming your "t" has a toast table.  I think this is properly rejected
when toast exists.

It looks like we were maybe both (partially?) fooled by behavior that's been
discussed multiple times over the years: SET (toast.*) is ignored for tables
which don't currently have any TOAST.
https://www.postgresql.org/message-id/flat/20090211194311.GP8924%40alvh.no-ip.org

Maybe all's well, but I suggest there's a question which hasn't previously
been disambiguated and raised: if a table has no toast, should SET
(toast.JUNK) be ignored, or rejected ?
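
For anyone following along, a minimal demonstration of the difference
(placeholder table names):

CREATE TABLE has_toast (t text);  -- gets a toast table
CREATE TABLE no_toast (i int);    -- has no toast table
ALTER TABLE has_toast SET (toast.asdf=128);  -- ERROR:  unrecognized parameter "asdf"
ALTER TABLE no_toast  SET (toast.asdf=128);  -- accepted, silently ignored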

BTW maybe I was originally thinking of this:
https://www.postgresql.org/message-id/flat/c4d71df2-9e0e-3912-dc81-9a72e080c238%40lab.ntt.co.jp#c4d71df2-9e0e-3912-dc81-9a72e080c...@lab.ntt.co.jp

Justin



ALTER TABLE SET (toast.asdf) is not rejected

2018-06-11 Thread Justin Pryzby
I noticed that this is accepted:

postgres=# ALTER TABLE t SET (toast.asdf=128);
ALTER TABLE

I thought since "toast" was a core namespace, it would've been rejected?

I recall having read a discussion about verifying these ... I wasn't able
to find what I was thinking of, but found this one.
https://www.postgresql.org/message-id/flat/20090114144332.GF24156%40alvh.no-ip.org

Justin



Re: notes from transition to relkind='p'

2018-06-04 Thread Justin Pryzby
On Sun, Jun 03, 2018 at 12:13:49PM -0700, Adrian Klaver wrote:
> On 06/01/2018 03:14 PM, Justin Pryzby wrote:
> >Before I forget any more, this is a brain of issues/considerations/concerns

Should have said brain DUMP

Keep in mind, I've phrased these as notes, maybe useful to someone else, maybe
not an issue to most people, and maybe not possible or worth changing behavior
for, since these are already in last year's published release.

> >with our (partial) transition to partitioned tables over the last ~9 months. 
> > I
> >believe these are all documented behaviors, but could be seen by users as a
> >gratuitous/unexpected change or rough edge and the differences could perhaps
> >be mitigated.  I realize there's maybe no good time or way to change most of
> >these, but maybe the list will be helpful to somebody in avoiding unexpected
> >transitional issues.
> 
> It is not clear from above what you are transitioning from.

Transitioning from inheritance-based partitioning with empty relkind='r'
parents to PG10 declarative partitioning.

> >  . ANALYZE relkind_p also updates stats of child (whereas ANALYZE relkind_r
> >only updates stats for parent); it's unclear if there's any reason why it
> >wasn't always done this way(?).  I end up having to conditionize 
> > processing
> >based on relkind. 3c3bb99330aa9b4c2f6258bfa0265d806bf365c3
> 
> Because there is a clear partition dependency in the declarative
> partitioning scheme. In the INHERIT scheme(which I assume is what you are
> talking about) the partitioning is optional.

I see your point - but AFAICT there's no reason why ANALYZE relkind_r couldn't
have always acted "recursively" on any children, nor any reason it couldn't
conceivably have been changed in v10, for consistency, if that was determined
to outweigh the disruptiveness/inconvenience of changing an existing behavior.

> >  . The docs say: if detaching/re-attach a partition, should first ADD CHECK 
> > to
> >avoid a slow ATTACH operation.  Perhaps DETACHing a partition could
> >implicitly CREATE a constraint which is usable when reATTACHing?
> 
> I personally would not like that. If I wanted to maintain the partition
> constraint I would keep the table in the partition. If I was DETACHing it
> then it would be to get rid of it or have it exist as a stand alone table.
> If I where to keep it deciding what constraints to maintain should be up to
> me.

Noted; for us, the reasons we uninherit/detach (and then inherit/attach) are:

 - to avoid massive space use and a long, disruptive table lock while executing
   a type-promoting ALTER.
 - when ALTERing to a new tablespace, to avoid locking the parent table, which is
   more disruptive than losing visibility of a partition's tuples;

> >  . relkind_p has no entry in pg_stat_user_tables (last_analyze, etc).  Maybe
> >the view could do the needed CASE ... (SELECT min() FROM pg_inherits 
> > JOIN psut)
> >  . ALTER TABLE ATTACH requires specifying bounds: Maybe it sounds naive to
> >suggest one would want to avoid that; but consider: we ended up adding 
> > both
> >shell and python logic to parse the table name to allow detaching and
> >reattaching partitions.  I think it'd be a nice if the bounds were 
> > inferred
> >if there was a single constraint on the partition key.
> 
> The above I am not sure how you envision that working, especially the
> inferring the key part. Having the program guess at what I want a partition
> to be constrained by is something I would need fully explained to me.

I mean the partition bounds of a partition of a relkind='p' parent could (perhaps)
be changed into a CHECK constraint at DETACH time, avoiding a full table scan and
a long lock if it's later re-ATTACHED.  If implemented, it could be requested
rather than strictly implied..

ALTER TABLE p DETACH r ADD CONSTRAINT c;
...
ALTER TABLE p ATTACH r FOR VALUES INFERRED BY CONSTRAINT c;
-- must be a check constraint, must be a "simple" list or range with a single
column, etc.  Maybe not feasible for hash partitioning?
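
For contrast, the workflow the docs suggest today, as a sketch with placeholder
names and a single-column range key:

ALTER TABLE p DETACH PARTITION r;
-- ...do the disruptive work on r...
ALTER TABLE r ADD CONSTRAINT r_bounds CHECK (k >= 10 AND k < 20);
ALTER TABLE p ATTACH PARTITION r FOR VALUES FROM (10) TO (20);  -- the CHECK lets ATTACH skip the validation scan
ALTER TABLE r DROP CONSTRAINT r_bounds;  -- afterwards, if desired

The suggestion above is essentially to let the bounds in the ATTACH be derived
from (or recorded as) such a constraint instead of being repeated by hand.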

Justin



notes from transition to relkind='p'

2018-06-01 Thread Justin Pryzby
Before I forget any more, this is a brain of issues/considerations/concerns
with our (partial) transition to partitioned tables over the last ~9 months.  I
believe these are all documented behaviors, but could be seen by users as a
gratuitous/unexpected change or rough edge and the differences could perhaps
be mitigated.  I realize there's maybe no good time or way to change most of
these, but maybe the list will be helpful to somebody in avoiding unexpected
transitional issues.

 . DROP TABLE on a parent no longer fails without CASCADE (dependencies of
   relkind='p' are deptype='a' "soft" dependencies)
   8b4d582d279d784616c228be58af1e39aa430402
 . ANALYZE relkind_p also updates stats of child (whereas ANALYZE relkind_r
   only updates stats for parent); it's unclear if there's any reason why it
   wasn't always done this way(?).  I end up having to conditionize processing
   based on relkind. 3c3bb99330aa9b4c2f6258bfa0265d806bf365c3
 . The docs say: if detaching/re-attach a partition, should first ADD CHECK to
   avoid a slow ATTACH operation.  Perhaps DETACHing a partition could
   implicitly CREATE a constraint which is usable when reATTACHing?
 . relkind_p has no entry in pg_stat_user_tables (last_analyze, etc).  Maybe
   the view could do the needed CASE ... (SELECT min() FROM pg_inherits JOIN 
psut)
 . ALTER TABLE ATTACH requires specifying bounds: Maybe it sounds naive to
   suggest one would want to avoid that; but consider: we ended up adding both
   shell and python logic to parse the table name to allow detaching and
   reattaching partitions.  I think it'd be a nice if the bounds were inferred
   if there was a single constraint on the partition key.
 . ALTER TABLE ATTACH has reversed order of child vs parent relative to
   NO/INHERIT.
 . And actually, having both ALTER TABLE DE/TACH vs NO/INHERIT is itself messy:
   we ended up having branches (both shell and python) to handle both cases (at
   least for a transitional period, but probably we'll need to continue
   handling both into the indeterminate future).

Cheers,
Justin



\d t: ERROR: XX000: cache lookup failed for relation

2018-05-26 Thread Justin Pryzby
Is that considered an actionable problem?

Encountered while trying to reproduce the vacuum full pg_statistic/toast_2619
bug; while running a loop around VAC FULL and more in another session:
[1]-  Running { time sh -ec 'while :; do psql --port 5678 
postgres -qc "VACUUM FULL pg_toast.pg_toast_2619"; psql --port 5678 postgres 
-qc "VACUUM FULL pg_statistic"; done'; date; } &
[2]+  Running time while :; do
psql postgres --port 5678 -c "INSERT INTO t SELECT i FROM 
generate_series(1,99) i"; sleep 1; for a in `seq 999`;
do  
psql postgres --port 5678 -c "ALTER TABLE t ALTER i TYPE int USING 
i::int"; sleep 1; psql postgres --port 5678 -c "ALTER TABLE t ALTER i TYPE 
bigint"; sleep 1;
done; psql postgres --port 5678 -c "TRUNCATE t"; sleep 1;
done &

[pryzbyj@kgfs66telsadb ~]$ psql --port 5678 postgres -x
psql (11beta1)
Type "help" for help.
...
postgres=# \set VERBOSITY verbose 
postgres=# \d t
ERROR:  XX000: cache lookup failed for relation 8096742
LOCATION:  flatten_reloptions, ruleutils.c:11065

Justin



Re: postgres=# VACUUM FULL pg_statistic => ERROR: missing chunk number 0 for toast value .. in pg_toast_2619

2018-05-25 Thread Justin Pryzby
On Sat, May 19, 2018 at 02:39:26PM -0400, Tom Lane wrote:
> Justin Pryzby <pry...@telsasoft.com> writes:
> > [pryzbyj@database ~]$ while :; do for db in `psql postgres -Atc "SELECT 
> > datname FROM pg_database WHERE datallowconn"`; do for t in pg_statistic 
> > pg_attrdef pg_constraint; do echo "$db.$t..."; 
> > PGOPTIONS=-cstatement_timeout='9s' psql $db -qc "VACUUM FULL $t"; done; 
> > done; done
> 
> > ...
> > postgres.pg_statistic...
> > postgres.pg_attrdef...
> > postgres.pg_constraint...
> > template1.pg_statistic...
> > template1.pg_attrdef...
> > template1.pg_constraint...
> > ts.pg_statistic...
> > ERROR:  canceling statement due to statement timeout
> > ts.pg_attrdef...
> > ts.pg_constraint...
> > postgres.pg_statistic...
> > ERROR:  missing chunk number 0 for toast value 3372855171 in pg_toast_2619
> 
> Hm, so was the timeout error happening every time through on that table,
> or just occasionally, or did you provoke it somehow?  I'm wondering how
> your 9s timeout relates to the expected completion time.

Actually statement_timeout isn't essential for this (maybe it helps to trigger
it more often - not sure).

Could you try:
time sh -ec 'while :; do time psql postgres -c "VACUUM FULL VERBOSE 
pg_toast.pg_toast_2619"; psql postgres -c "VACUUM FULL VERBOSE pg_statistic"; 
done'; date

Three servers experienced error within 30min, but one server didn't fail until
12h later, and a handful others still haven't failed..

Does this help at all ?
 2018-05-24 21:57:49.98-03  | 5b075f8d.1ad1 | LOG| pryzbyj   | 
postgres | statement: VACUUM FULL VERBOSE pg_toast.pg_toast_2619
 2018-05-24 21:57:50.067-03 | 5b075f8d.1ad1 | INFO   | pryzbyj   | 
postgres | vacuuming "pg_toast.pg_toast_2619"
 2018-05-24 21:57:50.09-03  | 5b075f8d.1ad1 | INFO   | pryzbyj   | 
postgres | "pg_toast_2619": found 0 removable, 408 nonremovable row versions in 
99 pages
 2018-05-24 21:57:50.12-03  | 5b075f8e.1ada | LOG| pryzbyj   | 
postgres | statement: VACUUM FULL VERBOSE pg_statistic
 2018-05-24 21:57:50.129-03 | 5b075f8e.1ada | INFO   | pryzbyj   | 
postgres | vacuuming "pg_catalog.pg_statistic"
 2018-05-24 21:57:50.185-03 | 5b075f8e.1ada | ERROR  | pryzbyj   | 
postgres | missing chunk number 0 for toast value 3382957233 in pg_toast_2619

Same thing; this server has autovacuum logging, although it's not clear to me
if that's an essential component of the problem, either:
 2018-05-24 21:16:39.856-06 | LOG   | 5b078017.7b99 | pryzbyj   | postgres | 
statement: VACUUM FULL VERBOSE pg_toast.pg_toast_2619
 2018-05-24 21:16:39.876-06 | LOG   | 5b078010.7968 |   |  | 
automatic vacuum of table "postgres.pg_toast.pg_toast_2619": index scans: 1 
   +
|   |   |   |  | 
pages: 0 removed, 117 r
 2018-05-24 21:16:39.909-06 | INFO  | 5b078017.7b99 | pryzbyj   | postgres | 
vacuuming "pg_toast.pg_toast_2619"
 2018-05-24 21:16:39.962-06 | INFO  | 5b078017.7b99 | pryzbyj   | postgres | 
"pg_toast_2619": found 0 removable, 492 nonremovable row versions in 117 pages
 2018-05-24 21:16:40.025-06 | LOG   | 5b078018.7b9b | pryzbyj   | postgres | 
statement: VACUUM FULL VERBOSE pg_statistic
 2018-05-24 21:16:40.064-06 | INFO  | 5b078018.7b9b | pryzbyj   | postgres | 
vacuuming "pg_catalog.pg_statistic"
 2018-05-24 21:16:40.145-06 | ERROR | 5b078018.7b9b | pryzbyj   | postgres | 
missing chunk number 0 for toast value 765874692 in pg_toast_2619

Or this one?

postgres=# SELECT log_time, database, user_name, session_id, left(message,999) 
FROM postgres_log WHERE (log_time>='2018-05-24 19:56' AND log_time<'2018-05-24 
19:58') AND (database='postgres' OR database IS NULL OR user_name IS NULL OR 
user_name='pryzbyj') AND message NOT LIKE 'statement:%' ORDER BY 1;

log_time   | 2018-05-24 19:56:35.396-04
database   | 
user_name  | 
session_id | 5b075131.3ec0
left   | skipping vacuum of "pg_toast_2619" --- lock not available

...

log_time   | 2018-05-24 19:57:35.78-04
database   | 
user_name  | 
session_id | 5b07516d.445e
left   | automatic vacuum of table "postgres.pg_toast.pg_toast_2619": index 
scans: 1
   : pages: 0 removed, 85 remain, 0 skipped due to pins, 0 skipped 
frozen
   : tuples: 1 removed, 348 remain, 0 are dead but not yet removable, 
oldest xmin: 63803106
   : buffer usage: 179 hits, 4 misses, 87 dirtied
   : avg read rate: 1.450 MB/s, avg write rate: 31.531 MB/s
   : system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.02 s

log_time   | 2018-05-24 19:57:35.879-04
database   | postgres
user_name  | pryzbyj
session_id | 5b07516f.447f
left   | missing chunk number 0 for toast value 624341680 in pg_toast_2619

log_time   | 2018-05-24 19:57:44.332-04
database   |
user_name  |
session_id | 5af9fda3.70d5
left   | checkpoint starting: time

Justin



Re: postgres=# VACUUM FULL pg_statistic => ERROR: missing chunk number 0 for toast value .. in pg_toast_2619

2018-05-19 Thread Justin Pryzby
On Sat, May 19, 2018 at 11:08:23AM -0400, Tom Lane wrote:
> Justin Pryzby <pry...@telsasoft.com> writes:
> > I'll defer fixing this for a while in case someone wants me to save a copy of
> > the relation/toast/index.  From last time, I recall this just needs the 
> > right
> > combination of REINDEX/VACUUM/ANALYZE, and the only complication was me
> > needing to realize the right combination of affected DB(s).
> 
> If you could come up with such a sequence that causes the problem
> reproducibly, that would be of huge interest, and probably lead to
> a fix promptly.  But I don't think that we can do much by looking
> at the post-mortem state --- once the toast rows are gone, they're
> gone, especially if the table's been vacuumed since.

This is unlikely to allow reproducing it, but for the sake of completeness here's
a fuller log.  I'll try to trigger it on another DB.

postgres=# SELECT log_time, database, session_id, left(message,99) FROM 
postgres_log WHERE log_time BETWEEN '2018-05-19 07:49:01' AND '2018-05-19 
07:50' AND (database IS NULL OR database='postgres') ORDER BY 1 ;
 2018-05-19 07:49:02.232-06 |  | 5afbc238.382f | checkpoint complete: 
wrote 32175 buffers (6.1%); 0 WAL file(s) added, 0 removed, 8 recycled; write=
 2018-05-19 07:49:02.261-06 | postgres | 5b002b4e.65f2 | statement: SHOW 
server_version
 2018-05-19 07:49:02.278-06 | postgres | 5b002b4e.65f7 | statement: SELECT 
pg_get_indexdef('jrn_postgres_log_log_time_idx'::regclass)
 2018-05-19 07:49:02.29-06  | postgres | 5b002b4e.65f9 | statement: SELECT 1 
FROM information_schema.tables WHERE table_name='postgres_log' LIMIT 1
 2018-05-19 07:49:02.311-06 | postgres | 5b002b4e.65fb | statement: SELECT 1 
FROM pg_class WHERE relname='jrn_postgres_log'
 2018-05-19 07:49:02.324-06 | postgres | 5b002b4e.65fd | statement: SELECT 1 
FROM pg_class WHERE relname='jrn_postgres_log_unique_idx'
 2018-05-19 07:49:02.338-06 | postgres | 5b002b4e.65ff | statement: SELECT 1 
FROM pg_class WHERE relname='jrn_postgres_log_log_time_idx'
 2018-05-19 07:49:02.353-06 | postgres | 5b002b4e.6601 | statement: SELECT 1 
FROM pg_class WHERE relname='jrn_postgres_log_error_severity_idx'
 2018-05-19 07:49:02.37-06  | postgres | 5b002b4e.6603 | statement: SELECT 1 
FROM pg_class WHERE relname='jrn_postgres_log_message_system_idx'
 2018-05-19 07:49:02.39-06  | postgres | 5b002b4e.6605 | statement: SELECT 1 
FROM pg_class WHERE relname='jrn_postgres_log_error_message_idx'
 2018-05-19 07:49:02.405-06 | postgres | 5b002b4e.6607 | statement: SELECT 1 
FROM pg_class WHERE relname='jrn_postgres_log_duration_idx'
 2018-05-19 07:49:02.422-06 | postgres | 5b002b4e.6609 | statement: SELECT 1 
FROM pg_class WHERE relname='jrn_postgres_log_quotedquoted_idx'
 2018-05-19 07:49:02.464-06 | postgres | 5b002b4e.6619 | statement: SELECT 1 
FROM pg_class WHERE relname='postgres_log_2018_05_19_0700'
 2018-05-19 07:49:02.482-06 | postgres | 5b002b4e.661c | statement: COPY 
postgres_log_2018_05_19_0700 FROM 
'/var/log/postgresql/postgresql-2018-05-19_074617
 2018-05-19 07:49:04.711-06 | postgres | 5b002b50.6627 | statement: SELECT 1 
FROM pg_class WHERE relname='postgres_log_2018_05_19_0700'
 2018-05-19 07:49:04.724-06 | postgres | 5b002b50.662a | statement: COPY 
postgres_log_2018_05_19_0700 FROM 
'/var/log/postgresql/postgresql-2018-05-19_074643
 2018-05-19 07:49:06.803-06 | postgres | 5b002b52.6637 | statement: SELECT 
pg_get_indexdef('jrn_postgres_log_duration_idx'::regclass)
 2018-05-19 07:49:06.837-06 | postgres | 5b002b52.6639 | statement: SELECT 
inhrelid::regclass::text FROM pg_inherits i LEFT JOIN pg_constraint c ON 
i.inhrel
 2018-05-19 07:49:06.867-06 | postgres | 5b002b52.663b | statement: SELECT 
inhrelid::regclass::text FROM pg_inherits WHERE 
inhparent='postgres_log'::regclas
 2018-05-19 07:49:06.918-06 | postgres | 5b002b52.6641 | statement: SELECT 
log_time<now()-'25 hours'::interval FROM postgres_log_2018_05_18_0700 LIMIT 1
 2018-05-19 07:49:14.126-06 | postgres | 5b002b5a.66c9 | statement: SELECT 
DISTINCT ON (session_id) log_time, session_id, 
replace(regexp_replace(detail,'^(.
 2018-05-19 07:49:32.264-06 |  | 5afbc238.382f | checkpoint starting: 
time
 2018-05-19 07:49:33.972-06 |  | 5b002b59.66c1 | automatic analyze of 
table "ts.public.cdrs_huawei_sgwrecord_2018_05_19" system usage: CPU: user: 6.
 2018-05-19 07:49:38.192-06 | postgres | 5b002b72.69d5 | statement: SELECT 
starelid::regclass, attname FROM pg_statistic s JOIN pg_attribute a 
 +
|  |   | ON 
a.attrel
 2018-05-19 07:49:38.232-06 | postgres | 5b002b72.69d8 | statement: DELETE FROM 
pg_statistic s USING pg_attribute a WHERE   +
|  |   | 
a.attrelid=s.starelid AND a.attn
 2018-05-19 07:49:38.266-0

postgres=# VACUUM FULL pg_statistic => ERROR: missing chunk number 0 for toast value .. in pg_toast_2619

2018-05-19 Thread Justin Pryzby
I recall seeing various discussions hoping that it'd been finally fixed - Just
wanted to report that this has happened now under postgres 10.4.

It looks like this is not related to: 0408e1ed599b06d9bca2927a50a4be52c9e74bb9
which is for "unexpected chunk number" (?)

Note that this is on the postgres database, which I think is where I saw it on
one of our internal VMs in the past (although my memory indicates that may
have affected multiple DBs).  In the immediate case, this is a customer's centos6
VM running under qemu/KVM: the same configuration as our internal VM which had
this issue (I just found a ticket dated 2016-10-06).

In case it helps:
 - the postgres database has a few things in it, primarily imported CSV logs.
   On this particular server, there's actually a 150GB table with old CSV logs
   from a script I fixed recently to avoid saving more lines than intended
   (something like: for each session_id, every session_line following an
   error_severity!='LOG')
 - I also have copies of pg_stat_bgwriter, pg_settings, and an aggregated copy
   of pg_buffercache here.
 - nagios: some scripts loop around all DBs; some maybe connect directly to
   postgres (for example, to list DBs).  However, I don't think check_postgres
   connects to the postgres DB.

I'll defer fixing this for a while in case someone wants me to save a copy of
the relation/toast/index.  From last time, I recall this just needs the right
combination of REINDEX/VACUUM/ANALYZE, and the only complication was
realizing the right combination of affected DB(s).
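
From memory, the combination was something along these lines (a sketch only,
not yet run against this instance; pg_toast_2619 should be pg_statistic's
toast table, 2619 being pg_statistic's OID):

REINDEX TABLE pg_toast.pg_toast_2619;
REINDEX TABLE pg_statistic;
VACUUM pg_statistic;
-- and if any pg_statistic rows have to be deleted, re-ANALYZE the affected
-- tables afterwards to repopulate them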

Thanks,
Justin



Re: relkind='p' has no pg_stat_user_tables

2018-05-03 Thread Justin Pryzby
On Thu, May 03, 2018 at 11:15:19AM -0700, Adrian Klaver wrote:
> On 05/03/2018 10:38 AM, Justin Pryzby wrote:
> >On Thu, May 03, 2018 at 09:31:12AM -0700, Adrian Klaver wrote:
> >>On 05/03/2018 09:20 AM, Alvaro Herrera wrote:
> >>>>https://www.postgresql.org/docs/10/static/sql-createtable.html
> >>>>
> >>>>"A partitioned table is divided into sub-tables (called partitions), which
> >>>>are created using separate CREATE TABLE commands. The partitioned table is
> >>>>itself empty. A data row inserted into the table is routed to a partition
> >>>>based on the value of columns or expressions in the partition key. ... "
> >>>
> >>>Yeah, but I think Justin has a valid question from the POV of the user:
> >>>how can we figure out if we need to re-run analyze on a partitioned
> >>>table, if the time of last analyze is not stored anywhere?
> >>
> >>I agree. The only thing I can think of is, that knowing :
> >>
> >>ANALYZE VERBOSE t;
> >>
> >>walks the inheritance tree, look at the pg_stat_user_tables for one of the
> >>children for the last time analyzed.
> >
> >I think I can make this work for my purposes:
> >
> >SELECT MIN(GREATEST(last_analyze,last_autoanalyze))
> >FROM pg_stat_user_tables psut
> >JOIN pg_inherits i
> >ON i.inhrelid=psut.relid
> >WHERE i.inhparent=...
> >
> >I was about to say that it's perhaps more correct for relkind='r' parents, 
> >too.
> >
> >But actually, it looks like for relkind='p', ANALYZE populates stats on child
> >tables in addition to the parent.  For relkind='r', the behavior (introduced 
> >in
> >PG9.0 as I recall) is that ANALYZE on parent creates stats only for parent
> >(both "inherited" stats including children, and "ONLY" stats for the
> >potentially-nonempty parent).
> >
> >I guess the ability to update child tables' stats is a nice feature, but I'm
> >surprised.  I wonder if that was a deliberate/documented change?
> 
> I was with you until I got to the above. You seem to be comparing apples and
> oranges unless I am missing something.

Yes, I was surprised by the difference between ANALYZE on relkind='p'
and relkind='r' parents.

But I see that's a documented behavior I'd missed until now:

https://www.postgresql.org/docs/current/static/sql-analyze.html
|If the specified table is a partitioned table, both the inheritance statistics
|of the partitioned table as a whole and statistics of the individual partitions
|are updated.

Thanks,
Justin



Re: relkind='p' has no pg_stat_user_tables

2018-05-03 Thread Justin Pryzby
On Thu, May 03, 2018 at 09:31:12AM -0700, Adrian Klaver wrote:
> On 05/03/2018 09:20 AM, Alvaro Herrera wrote:
> >>https://www.postgresql.org/docs/10/static/sql-createtable.html
> >>
> >>"A partitioned table is divided into sub-tables (called partitions), which
> >>are created using separate CREATE TABLE commands. The partitioned table is
> >>itself empty. A data row inserted into the table is routed to a partition
> >>based on the value of columns or expressions in the partition key. ... "
> >
> >Yeah, but I think Justin has a valid question from the POV of the user:
> >how can we figure out if we need to re-run analyze on a partitioned
> >table, if the time of last analyze is not stored anywhere?
> 
> I agree. The only thing I can think of is, that knowing :
> 
> ANALYZE VERBOSE t;
> 
> walks the inheritance tree, look at the pg_stat_user_tables for one of the
> children for the last time analyzed.

I think I can make this work for my purposes:

SELECT MIN(GREATEST(last_analyze,last_autoanalyze))
FROM pg_stat_user_tables psut
JOIN pg_inherits i
ON i.inhrelid=psut.relid
WHERE i.inhparent=...

I was about to say that it's perhaps more correct for relkind='r' parents, too.

But actually, it looks like for relkind='p', ANALYZE populates stats on child
tables in addition to the parent.  For relkind='r', the behavior (introduced in
PG9.0 as I recall) is that ANALYZE on parent creates stats only for parent
(both "inherited" stats including children, and "ONLY" stats for the
potentially-nonempty parent).
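
A quick toy test of what I mean (made-up table names):

CREATE TABLE pt (i int) PARTITION BY RANGE (i);
CREATE TABLE pt1 PARTITION OF pt FOR VALUES FROM (1) TO (100);
INSERT INTO pt SELECT generate_series(1, 99);
ANALYZE pt;
-- pg_stats now has inherited rows for pt *and* plain rows for pt1,
-- whereas ANALYZE of a relkind='r' inheritance parent would only have
-- touched the parent itself:
SELECT tablename, inherited, count(*)
FROM pg_stats WHERE tablename IN ('pt', 'pt1')
GROUP BY 1, 2;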

I guess the ability to update child tables' stats is a nice feature, but I'm
surprised.  I wonder if that was a deliberate/documented change?

Justin



relkind='p' has no pg_stat_user_tables

2018-05-03 Thread Justin Pryzby
I (finally) realized that my script for ANALYZEing parents of table hierarchies
every month or so was looping around the same parent tables every night, due to
there being no stats for the date of the last analysis.

I guess that's deliberate/known and maybe related to relkind='p' having no
relfilenode.

Is there any good workaround other than making stampfiles or making my own
"last analyzed" table?

Thanks,
Justin



Re: circular wait not triggering deadlock ?

2018-03-08 Thread Justin Pryzby
On Thu, Mar 08, 2018 at 03:05:36PM -0500, Tom Lane wrote:
> Justin Pryzby <pry...@telsasoft.com> writes:
> > On Thu, Mar 08, 2018 at 01:57:06PM -0500, Tom Lane wrote:
> >> PID 20488 is evidently waiting for PID 6471 to finish its transaction.
> >> What's that one doing?
> 
> > Um, I thought I had kept track of both pids, but apparently not..
> > query| SELECT pg_export_snapshot();
> 
> pg_export_snapshot doesn't block for other transactions, though.
> Further down, you have output that confirms that:
> 
> >   6471 | idle in transaction | psql  | SELECT pg_export_snapshot();
> 
> That SELECT is the most recently completed command, not the current one.
> So now the question is what the connected application is waiting for
> before committing.

I believe it does an idle loop around sleep(1), until all the pg_dump
--snapshot children are done.
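
Roughly this pattern, I mean (a sketch of the idea, not the actual script):

BEGIN ISOLATION LEVEL REPEATABLE READ;
SELECT pg_export_snapshot();   -- returns a snapshot ID string
-- the wrapper hands that ID to several "pg_dump --snapshot=<id> ..." children,
-- sleep(1)s in a loop until they have all exited, and only then does:
COMMIT;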

Justin



Re: circular wait not triggering deadlock ?

2018-03-08 Thread Justin Pryzby
On Thu, Mar 08, 2018 at 01:57:06PM -0500, Tom Lane wrote:
> Justin Pryzby <pry...@telsasoft.com> writes:
> > Running pg10.2, I have a handful of maintenance jobs run in the middle of 
> > the
> > night, which appear to have gotten stuck waiting on each other..
> 
> > ts=# SELECT granted, pid, mode, virtualtransaction, virtualxid FROM 
> > pg_locks WHERE virtualxid='22/4317099' ORDER BY 1,2,3;
> >  granted |  pid  | mode  | virtualtransaction | virtualxid 
> > -+---+---++
> >  f   | 20488 | ShareLock | 19/28401734| 22/4317099
> >  t   |  6471 | ExclusiveLock | 22/4317099 | 22/4317099
> 
> PID 20488 is evidently waiting for PID 6471 to finish its transaction.
> What's that one doing?

Um, I thought I had kept track of both pids, but apparently not..

query| SELECT pg_export_snapshot();

So that explains that.
I already killed one proc, but..

ts=# SELECT granted, relation::regclass, pid, mode, virtualtransaction, 
virtualxid FROM pg_locks WHERE pid=17248 ORDER BY 1;
 granted |         relation         |  pid  |      mode       | virtualtransaction | virtualxid
---------+--------------------------+-------+-----------------+--------------------+------------
 f       | eric_enodeb_cell_metrics | 17248 | AccessShareLock | 27/1755026         |

ts=# SELECT granted, relation::regclass, pid, mode, virtualtransaction, 
virtualxid FROM pg_locks WHERE relation='eric_enodeb_cell_metrics'::regclass 
ORDER BY 1; --pid=17248  virtualxid='22/4317099' ORDER BY 1,2,3;
 granted |         relation         |  pid  |           mode           | virtualtransaction | virtualxid
---------+--------------------------+-------+--------------------------+--------------------+------------
 f       | eric_enodeb_cell_metrics | 22961 | AccessShareLock          | 31/1337307         |
 f       | eric_enodeb_cell_metrics | 17248 | AccessShareLock          | 27/1755026         |
 f       | eric_enodeb_cell_metrics | 28357 | AccessShareLock          | 28/1118276         |
 f       | eric_enodeb_cell_metrics | 21846 | AccessExclusiveLock      | 29/830016          |
 f       | eric_enodeb_cell_metrics | 16592 | AccessExclusiveLock      | 18/38156962        |
 f       | eric_enodeb_cell_metrics | 18941 | AccessShareLock          | 32/838769          |
 t       | eric_enodeb_cell_metrics |   530 | ShareUpdateExclusiveLock | 16/38159763        |
(7 rows)

ts=# SELECT pid, state, left(query,199) FROM pg_stat_activity WHERE pid 
IN(530,17248);
   530 | active | ALTER TABLE eric_enodeb_cell_metrics* ALTER start_time SET STATISTICS 400
 17248 | active | SELECT c.tableoid, c.oid, c.relname, (SELECT pg_catalog.array_agg(acl ORDER BY row_n) FROM (SELECT acl, row_n FROM pg_catalog.unnest(coalesce(c.relacl,pg_catalog.acldefault(CASE WHEN c.relkind = 'S' 

ts=# SELECT granted, pid, mode, virtualtransaction, virtualxid FROM pg_locks 
WHERE relation='eric_enodeb_cell_20180304'::regclass ;
 granted |  pid  |           mode           | virtualtransaction | virtualxid
---------+-------+--------------------------+--------------------+------------
 t       | 20488 | ShareUpdateExclusiveLock | 19/28401734        |
 f       |   530 | ShareUpdateExclusiveLock | 16/38159763        |

ts=# SELECT pid, state, left(query,199) FROM pg_stat_activity WHERE pid=20488;
  pid  | state  |                                                                 left
-------+--------+---------------------------------------------------------------------------------------------------------------------------------
 20488 | active | CREATE INDEX CONCURRENTLY index_1113966210 ON eric_enodeb_cell_20180304 USING btree (site_id) WITH (fillfactor='100') TABLESPACE oldindex

ts=# SELECT granted, pid, relation::regclass, locktype, mode  virtualxid, 
virtualxid, virtualtransaction FROM pg_locks WHERE pid=20488 ORDER BY 1;
 granted |  pid  | relation | locktype   | virtualxid | virtualxid | virtualtransaction
---------+-------+----------+------------+------------+------------+--------------------
 f       | 20488 |          | virtualxid | ShareLock  | 22/4317099 | 19/28401734

ts=# SELECT granted, pid, relation::regclass, locktype, mode  virtualxid, 
virtualtransaction FROM pg_locks WHERE virtualxid='22/4317099' ORDER BY 1;
 granted |  pid  | relation | locktype   | virtualxid    | virtualtransaction
---------+-------+----------+------------+---------------+--------------------
 f       | 20488 |          | virtualxid | ShareLock     | 19/28401734
 t       |  6471 |          | virtualxid | ExclusiveLock | 22/4317099
(2 rows)

So...I gather ALTER SET STATS is waiting on pg_dump which is waiti

Re: backends stuck in "startup"

2017-11-22 Thread Justin Pryzby
On Wed, Nov 22, 2017 at 07:43:50PM -0500, Tom Lane wrote:
> Justin Pryzby <pry...@telsasoft.com> writes:
> > For starters, I found that PID 27427 has:
> 
> > (gdb) p proc->lwWaiting
> > $1 = 0 '\000'
> > (gdb) p proc->lwWaitMode
> > $2 = 1 '\001'
> 
> To confirm, this is LWLockAcquire's "proc", equal to MyProc?
> If so, and if LWLockAcquire is blocked at PGSemaphoreLock,
> that sure seems like a smoking gun.

Right:

(gdb) bt
#0  0x003f6ee0d930 in ?? ()
#1  0x7f19f3400d38 in ?? ()
#2  0x006a4440 in PGSemaphoreLock (sema=0x7f19f3400d38) at pg_sema.c:310
#3  0x00714495 in LWLockAcquire (lock=0x7f19f3404f80, mode=LW_SHARED) 
at lwlock.c:1233
#4  0x006f75e5 in BufferAlloc (smgr=0x1f3c368, relpersistence=112 'p', 
forkNum=MAIN_FORKNUM, blockNum=1, mode=RBM_NORMAL, strategy=0x0, 
hit=0x7fff6ef1c5af "") at bufmgr.c:1012
#5  ReadBuffer_common (smgr=0x1f3c368, relpersistence=112 'p', 
forkNum=MAIN_FORKNUM, blockNum=1, mode=RBM_NORMAL, strategy=0x0, 
hit=0x7fff6ef1c5af "") at bufmgr.c:745
[...]

(gdb) up
#1  0x7f19f3400d38 in ?? ()
(gdb) up
#2  0x006a4440 in PGSemaphoreLock (sema=0x7f19f3400d38) at pg_sema.c:310
310 errStatus = sem_wait(PG_SEM_REF(sema));
(gdb) up
#3  0x00714495 in LWLockAcquire (lock=0x7f19f3404f80, mode=LW_SHARED) 
at lwlock.c:1233
1233PGSemaphoreLock(proc->sem);
(gdb) p proc->lwWaiting
$1 = 0 '\000'
(gdb) p proc->lwWaitMode
$2 = 1 '\001'
(gdb) p proc
$3 = (PGPROC *) 0x7f1a77dba500
(gdb) p MyProc
$4 = (PGPROC *) 0x7f1a77dba500
(gdb) p MyProc==proc
$5 = 1

I suppose one needs to show that this backend was stuck and not "just
happened to start 1ms before gdb connected", which I actually have:
postgres 27427 17665  0 16:57 ?00:00:00 postgres: indigo indigo 207.241.169.248(37226) startup
...
postgres 30298 17665  0 17:01 ?00:00:00 postgres: main main 207.241.169.100(43088) startup

..so that process was in "startup" 3-4 minutes after being forked.

> My hypothesis about a missed memory barrier would imply that there's (at
> least) one process that's waiting but is not in the lock's wait queue and

Do I have to also check the wait queue to verify?  Give a hint/pointer please?

> > Would you suggest how I can maximize the likelihood/speed of triggering
> > that?
> > Five years ago, with a report of similar symptoms, you said "You need to 
> > hack
> > pgbench to suppress the single initialization connection it normally likes 
> > to
> > make, else the test degenerates to the one-incoming-connection case"
> > https://www.postgresql.org/message-id/8896.1337998337%40sss.pgh.pa.us
> 
> I don't think that case was related at all.
> 
> My theory suggests that any contended use of an LWLock is at risk,
> in which case just running pgbench with about as many sessions as
> you have in the live server ought to be able to trigger it.  However,
> that doesn't really account for your having observed the problem
> only during session startup,

Remember, this issue breaks existing sessions, too.

After it'd happened a couple times, I made sure to leave a session open to
allow collecting diagnostics (and try to un-stick it), but when it recurred,
I was unable to even SELECT * FROM pg_locks.
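
For next time: if a session is still responsive, maybe something like this
would at least show which LWLock everything is piled up on (it could well hang
for the same reason the pg_locks query did, but it's cheap to try):

SELECT pid, backend_start, state, wait_event_type, wait_event, left(query, 60)
FROM pg_stat_activity
WHERE wait_event_type = 'LWLock';
-- e.g. wait_event = 'buffer_mapping' would fit the BufferAlloc backtraces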

Am I right this won't help for lwlocks? ALTER SYSTEM SET log_lock_waits=yes

Justin



Re: backends stuck in "startup"

2017-11-22 Thread Justin Pryzby
On Wed, Nov 22, 2017 at 01:27:12PM -0500, Tom Lane wrote:
> Justin Pryzby <pry...@telsasoft.com> writes:
> > On Tue, Nov 21, 2017 at 03:40:27PM -0800, Andres Freund wrote:
> >> Could you try stracing next time?
> 
> > I straced all the "startup" PIDs, which were all in futex, without 
> > exception:
> 
> If you've got debug symbols installed, could you investigate the states
> of the LWLocks the processes are stuck on?
> 
> My hypothesis about a missed memory barrier would imply that there's (at
> least) one process that's waiting but is not in the lock's wait queue and
> has MyProc->lwWaiting == false, while the rest are in the wait queue and
> have MyProc->lwWaiting == true.  Actually chasing through the list
> pointers would be slightly tedious, but checking MyProc->lwWaiting,
> and maybe MyProc->lwWaitMode, in each process shouldn't be too hard.

> Also verify that they're all waiting for the same LWLock (by address).

I believe my ~40 cores are actually for backends from two separate instances of
this issue on the VM, as evidenced by different argv pointers.

And for each instance, I have cores for only a fraction of the backends
(max_connections=400).

For starters, I found that PID 27427 has:

(gdb) p proc->lwWaiting
$1 = 0 '\000'
(gdb) p proc->lwWaitMode
$2 = 1 '\001'

..where all the others have lwWaiting=1

For #27427:
(gdb) p *lock
$27 = {tranche = 59, state = {value = 1627389952}, waiters = {head = 147, tail = 308}}

(gdb) info locals
mustwait = 1 '\001'
proc = 0x7f1a77dba500
result = 1 '\001'
extraWaits = 0
__func__ = "LWLockAcquire"

And at this point I have to ask for help with how to finish traversing these
structures.  I could upload cores for someone (I don't think there's anything
too private) but so far I have 16GB of gz-compressed cores.

Note: I've locally compiled PG 10.1 with PREFERRED_SEMAPHORES=SYSV to keep the
service up (and to the degree that that serves to verify it avoids the issue,
great).

But I could start an instance running pgbench to try to trigger it on this VM,
with smaller shared_buffers and fewer backends/clients, to allow full cores of
every backend (I don't think I'll be able to dump all 400 cores, each up to 2GB,
from the production instance).

Would you suggest how I can maximize the likelihood/speed of triggering that?
Five years ago, with a report of similar symptoms, you said "You need to hack
pgbench to suppress the single initialization connection it normally likes to
make, else the test degenerates to the one-incoming-connection case"
https://www.postgresql.org/message-id/8896.1337998337%40sss.pgh.pa.us

..but, pgbench --connect seems to do what's needed(?)  (I see that dates back
to 2001, having been added at ba708ea3).

(I don't know that there's any suggestion or reason to believe the bug is
specific to the connection/startup phase, or that it's necessary or sufficient
to hit the bug, but it's at least known to be impacted and is all I have to go
on for now).

Justin



Re: backends stuck in "startup"

2017-11-22 Thread Justin Pryzby
On Tue, Nov 21, 2017 at 03:40:27PM -0800, Andres Freund wrote:
> Hi,
> 
> On 2017-11-21 17:09:26 -0600, Justin Pryzby wrote:
> > I'm sorry to report this previously reported problem is happening again,
> > starting shortly after pg_upgrading a customer to PG10.1 from 9.6.5.
> > 
> > As $subject: backends are stuck in startup for minutes at a time.  I didn't
> > strace this time, but I believe last time I saw one was waiting in a futex.
> 
> >  - I'm planning do disable sync seqscan for the night (although as I 
> > understand
> >that may not matter (?))
> 
> I doubt it matters here.

The system still got wedged/stuck after a few hours even with:

 - synchronize_seqscans=off;
 - use of pg_buffercache disabled (but the extension not actually dropped); and
 - our maintenance (vacuum and reindex) scripts disabled (these normally only
   run at night, so at least the REINDEX script wasn't involved in several
   earlier incidents of this issue; but I was still running the vacuum/ANALYZE
   script post-upgrade).

Justin



Re: backends stuck in "startup"

2017-11-21 Thread Justin Pryzby
On Tue, Nov 21, 2017 at 03:40:27PM -0800, Andres Freund wrote:
> Hi,
> 
> On 2017-11-21 17:09:26 -0600, Justin Pryzby wrote:
> > I'm sorry to report this previously reported problem is happening again,
> > starting shortly after pg_upgrading a customer to PG10.1 from 9.6.5.
> > 
> > As $subject: backends are stuck in startup for minutes at a time.  I didn't
> > strace this time, but I believe last time I saw one was waiting in a futex.
> 
> Could you try stracing next time?

I straced all the "startup" PIDs, which were all in futex, without exception:

[pryzbyj@TS-DB ~]$ sudo strace `ps h --sort start_time -o pid,args -u postgres |awk '/startup$/{print "-p"$1}'` 2>&1 |tee -a /tmp/strace-pg-backends

Process 3611 attached
Process 6081 attached
[pid  6081] futex(0x7ff4314000f8, FUTEX_WAIT, 0, NULL 
[pid  3611] futex(0x7ff431400138, FUTEX_WAIT, 0, NULL 
[pid  3354] futex(0x7ff431400258, FUTEX_WAIT, 0, NULL 
[pid  3117] futex(0x7ff4314002b8, FUTEX_WAIT, 0, NULL 
[pid  2948] futex(0x7ff431400378, FUTEX_WAIT, 0, NULL 
[pid  2925] futex(0x7ff4314003d8, FUTEX_WAIT, 0, NULL 
[pid  2789] futex(0x7ff431400438, FUTEX_WAIT, 0, NULL 
[pid  2752] futex(0x7ff4314004b8, FUTEX_WAIT, 0, NULL 
[pid  2663] futex(0x7ff431400618, FUTEX_WAIT, 0, NULL 
[pid  2661] futex(0x7ff431400638, FUTEX_WAIT, 0, NULL 
[pid  2624] futex(0x7ff431400678, FUTEX_WAIT, 0, NULL 
[pid  2576] futex(0x7ff4314006f8, FUTEX_WAIT, 0, NULL 
[pid   365] futex(0x7ff431400718, FUTEX_WAIT, 0, NULL 
[pid   326] futex(0x7ff431400798, FUTEX_WAIT, 0, NULL 
[pid 32704] futex(0x7ff431400898, FUTEX_WAIT, 0, NULL 
[pid 32309] futex(0x7ff431400938, FUTEX_WAIT, 0, NULL 
[pid 32292] futex(0x7ff431400a18, FUTEX_WAIT, 0, NULL 
[pid 32170] futex(0x7ff431400ad8, FUTEX_WAIT, 0, NULL 
[pid 32134] futex(0x7ff431400b58, FUTEX_WAIT, 0, NULL 
[pid 31987] futex(0x7ff431400bf8, FUTEX_WAIT, 0, NULL 
[pid 31962] futex(0x7ff431400c58, FUTEX_WAIT, 0, NULL 
[pid 31817] futex(0x7ff431400d38, FUTEX_WAIT, 0, NULL 
[pid 31807] futex(0x7ff431400d98, FUTEX_WAIT, 0, NULL 
[pid 29664] futex(0x7ff431400e98, FUTEX_WAIT, 0, NULL 
[pid 29462] futex(0x7ff431400ef8, FUTEX_WAIT, 0, NULL 
[pid 29426] futex(0x7ff431400f38, FUTEX_WAIT, 0, NULL 
[pid 29258] futex(0x7ff431401058, FUTEX_WAIT, 0, NULL 
[pid 29227] futex(0x7ff431402f58, FUTEX_WAIT, 0, NULL 
[pid 29107] futex(0x7ff431401138, FUTEX_WAIT, 0, NULL 
[pid 29071] futex(0x7ff431401198, FUTEX_WAIT, 0, NULL 
[pid 28942] futex(0x7ff431401258, FUTEX_WAIT, 0, NULL 
[pid 28924] futex(0x7ff431401298, FUTEX_WAIT, 0, NULL 
[pid 26942] futex(0x7ff431401418, FUTEX_WAIT, 0, NULL 
[pid 25278] futex(0x7ff431401438, FUTEX_WAIT, 0, NULL 
[pid 24909] futex(0x7ff431401598, FUTEX_WAIT, 0, NULL 
[pid 24908] futex(0x7ff4314015b8, FUTEX_WAIT, 0, NULL 
[pid 24852] futex(0x7ff4314015f8, FUTEX_WAIT, 0, NULL 
[pid 24851] futex(0x7ff431401618, FUTEX_WAIT, 0, NULL 
[pid 24585] futex(0x7ff4314016f8, FUTEX_WAIT, 0, NULL 
[pid 24584] futex(0x7ff431401718, FUTEX_WAIT, 0, NULL 
[pid 24478] futex(0x7ff431401918, FUTEX_WAIT, 0, NULL 
[pid 24476] futex(0x7ff431401958, FUTEX_WAIT, 0, NULL 
[pid 24470] futex(0x7ff4314019d8, FUTEX_WAIT, 0, NULL 
[pid 24464] futex(0x7ff431401ad8, FUTEX_WAIT, 0, NULL 
[pid 24458] futex(0x7ff431401bb8, FUTEX_WAIT, 0, NULL 
[pid 24456] futex(0x7ff431401b98, FUTEX_WAIT, 0, NULL 
[pid 24450] futex(0x7ff431401c58, FUTEX_WAIT, 0, NULL 
[pid 24448] futex(0x7ff431401c98, FUTEX_WAIT, 0, NULL 
[pid 24439] futex(0x7ff431401cf8, FUTEX_WAIT, 0, NULL 
[pid 24323] futex(0x7ff431401e38, FUTEX_WAIT, 0, NULL 
[pid 24322] futex(0x7ff431401e58, FUTEX_WAIT, 0, NULL 
[pid 23057] futex(0x7ff431401fd8, FUTEX_WAIT, 0, NULL 
[pid 23055] futex(0x7ff431402038, FUTEX_WAIT, 0, NULL 
[pid 21826] futex(0x7ff4314021f8, FUTEX_WAIT, 0, NULL 
[pid 21824] futex(0x7ff431402238, FUTEX_WAIT, 0, NULL 
[pid 21794] futex(0x7ff431402298, FUTEX_WAIT, 0, NULL 
[pid 21792] futex(0x7ff431402638, FUTEX_WAIT, 0, NULL 
[pid 21791] futex(0x7ff431402af8, FUTEX_WAIT, 0, NULL 
[pid 21790] futex(0x7ff431402c98, FUTEX_WAIT, 0, NULL 
[pid 21786] futex(0x7ff431402478, FUTEX_WAIT, 0, NULL 
[pid 21785] futex(0x7ff431402758, FUTEX_WAIT, 0, NULL 
[pid 21784] futex(0x7ff431403098, FUTEX_WAIT, 0, NULL 
[pid 21768] futex(0x7ff431402718, FUTEX_WAIT, 0, NULL 
[pid 21662] futex(0x7ff431402338, FUTEX_WAIT, 0, NULL 
[pid 21648] futex(0x7ff431402c38, FUTEX_WAIT, 0, NULL 
[pid 21630] futex(0x7ff4314024d8, FUTEX_WAIT, 0, NULL 
[pid 21591] futex(0x7ff431402278, FUTEX_WAIT, 0, NULL 
[pid 21590] futex(0x7ff431402bf8, FUTEX_WAIT, 0, NULL

Also:

[pryzbyj@TS-DB ~]$ date;ps -O lstart,wchan=99 -u postgres --sort start_time |grep startup$
Wed Nov 22 00:47:41 EST 2017
 1990 Wed Nov 22 00:36:44 2017 futex_wait_queue_meS ?00:00:00 postgres: xx xx xx.247(53934) startup
 2007 Wed Nov 22 00:36:44 2017 futex_wait_queue_meS ?00:00:00 postgres: xx xx xx.247(53945) startup
 2014 Wed Nov 22 00:36:44 2017 futex_wait_queue_meS ?