Re: PG12 autovac issues

2020-03-23 Thread Andres Freund
Hi,

On 2020-03-24 14:26:06 +0900, Michael Paquier wrote:
> > Could you share what the config of the server was?
> 
> Nothing really fancy:
> - autovacuum_vacuum_cost_delay to 2ms (default of v12, but we used it
> in v11 as well).
> - autovacuum_naptime = 15s
> - autovacuum_max_workers = 6
> - log_autovacuum_min_duration = 0

Oh, so you're also involved in this? I'm starting to get a bit confused
as to who is reporting what.


> > Did you see any errors / fatals around the time autovacuum stopped
> > working?
> 
> Before going rogue (we are not sure if autovacuum didn't launch any
> workers or if the workers were spawned and exited early as we did not
> capture any worker information in pg_stat_activity), we saw a bunch of
> aggressive wraparound jobs.  Even after that, we have traces in the
> logs of one autovacuum analyze happening at equal interval of time (17
> minutes) on one single table, which is...  Er...  uncommon to say the
> least.

Well, there's no logging of autovacuum launchers that don't do anything
due to the "skipping redundant" logic, with normal log level. If somehow
the horizon logic of autovacuum workers gets out of whack with what
vacuumlazy.c does, then you'd not get any vacuums. But a usage level
triggered analyze could still happen on such a table, I think.

While looking at this issue I found a few problems, btw. That seems more
like a -hackers discussion, so I started:
https://postgr.es/m/20200323235036.6pje6usrjjx22zv3%40alap3.anarazel.de

I think I might just have figured out another one...

Greetings,

Andres Freund




Re: PG12 autovac issues

2020-03-23 Thread Michael Paquier
On Mon, Mar 23, 2020 at 01:00:51PM -0700, Andres Freund wrote:
> On 2020-03-23 20:47:25 +0100, Julien Rouhaud wrote:
>>> - relfrozenxid, age(relfrozenxid) for the oldest table in the oldest 
>>> database
>>>   SELECT oid::regclass, age(relfrozenxid), relfrozenxid FROM pg_class WHERE 
>>> relfrozenxid <> 0 ORDER BY age(relfrozenxid) DESC LIMIT 1;
>> 
>> The vm has been trashed since, and I don't have that level of detail 
>> available
>> in the gathered stats unfortunately (and the available information I have is 
>> a
>> little bit messy, sorry for that).
>> 
>>> - Oldest backend xmin
>>>   SELECT pid, backend_xmin, age(backend_xmin) FROM pg_stat_activity WHERE 
>>> backend_xmin <> 0 ORDER BY age(backend_xmin) DESC LIMIT 3;
>>> - oldest replication xmin:
>>>   SELECT pid, backend_xmin, age(backend_xmin) FROM pg_stat_replication 
>>> WHERE backend_xmin <> 0 ORDER BY age(backend_xmin) DESC LIMIT 3;
>>> - oldest slot xmin:
>>>   SELECT slot_name, xmin, age(xmin), catalog_xmin, age(catalog_xmin) FROM 
>>> pg_replication_slots WHERE xmin <> 0 OR catalog_xmin <> 0 ORDER BY 
>>> greatest(age(xmin), age(catalog_xmin)) DESC LIMIT 3;
>>> - oldest prepared transaction (unfortunately xmin not available)
>>>   SELECT gid, database, transaction FROM pg_prepared_xacts ORDER BY 
>>> age(transaction) LIMIT 3;
>> 
>> I have output from this query: 
>> https://github.com/OPMDG/check_pgactivity/blob/oldest_xmin/check_pgactivity#L5669-L5732
>> (which may be dumb).  Here are the non-null rows after 8AM GMT+1:

This stuff does not use 2PC and logical slots (there is one physical
slot for a WAL archiver), but we got a dump of pg_stat_activity.
Grepping more info about the evolution of pg_database and pg_class is
on our list.

> Could you share what the config of the server was?

Nothing really fancy:
- autovacuum_vacuum_cost_delay to 2ms (default of v12, but we used it
in v11 as well).
- autovacuum_naptime = 15s
- autovacuum_max_workers = 6
- log_autovacuum_min_duration = 0

>>> In that case you'd likely see DEBUG1 output, right? Did you try running
>>> with that?
>> 
>> That's unfortunately not an option, as while the issue is reproducible, it
>> happens after roughly 16h with an average of 6.5ktps, and we definitely don't
>> have enough disk space for that amount of logs.  We'll try next time with a
>> recompiled version with the DEBUG1 promoted to WARNING, and maybe other elog.
> 
> You wouldn't need to run it with DEBUG1 the whole time, you could just
> change the config and reload config once you hit the issue.

Sure, though it would reduce the window when the cluster is still up
and running.  One option that we have is just to increase the elevel
from DEBUG1 to WARNING for the log added in 2aa6e33 to keep the amount
of logs reduced without losing information, and we have no issue doing
runs with custom patches.

>> Not for now, because it's hard to be around for the ~ 90min interval the 
>> system
>> stays alive after autovacuum gets stuck, especially since it's a weekend bench.

Yeah, our room to maneuver is very limited here.  I would likely not be able
to do that, unless I got very lucky with my timezone.  Not sure about Julien.

>>> It sounds like what might be happening is that you have something
>>> holding back the "oldest needed transaction" horizon. Before
>>> 2aa6e331ead7, if the xmin horizon hasn't increased, there'd be repeated
>>> vacuums unable to increase the horizon, but afterwards they'll all
>>> immediately exit without any messages.
>> 
>> The xmin horizon is increasing AFAICS.  Grepping the autovacuum logs, I can 
>> see
>> for the 08:00 AM UTC - 08:59 AM UTC interval the oldest xmin going from
>> 200069684 to 210363848.  E.g.:
> 
> Which database is this on?

In this case this was the database used by the application, FWIW.  But
I can see from those logs that it kept increasing for all the other
databases, like postgres or template1 when this set of aggressive jobs
happened.

>> 266621:2020-03-14 08:49:59.945 UTC LOG:  automatic aggressive vacuum to 
>> prevent wraparound of table "pg_toast.pg_toast_27712": index scans: 0
>> 266622-  pages: 0 removed, 0 remain, 0 skipped due to pins, 0 skipped 
>> frozen
>> 266623-  tuples: 0 removed, 0 remain, 0 are dead but not yet removable, 
>> oldest xmin: 209635953
>> [...]
> 
> Do you have any non-toast ones?

Yep, this includes catalogs and normal tables; based on the logs, all
the relations triggered aggressive and anti-wraparound jobs.

>>> I wonder if what might be happening is that we're somehow missed/failed
>>> to update relfrozenxid and/or datfrozenxid. If you manually vacuum some
>>> table in the oldest database, but that is *NOT* the oldest table itself,
>>> does the problem "resolve" itself?
>> 
>> I'll also add pg_class snapshot for next time we run the bench.  I'm not sure
>> if we'll be able to catch the 1h45 interval when the system stays alive
>> after the issue though.
> 
> Could you just script something to stop the benchmark once the disk is
> 

Re: Loading 500m json files to database

2020-03-23 Thread Reid Thompson
On Mon, 2020-03-23 at 03:24 -0700, pinker wrote:
> 
> Hi, do you have maybe idea how to make loading process faster?
> 
> I have 500 millions of json files (1 json per file) that I need to load to
> db.
> My test set is "only" 1 million files.
> 
> What I came up with now is:
> 
> time for i in datafiles/*; do
>   psql -c "\copy json_parts(json_data) FROM $i"&
> done
> 
> which is the fastest so far. But it's not what i expect. Loading 1m of data
> takes me ~3h so loading 500 times more is just unacceptable.
> 
> some facts:
> * the target db is on cloud so there is no option to do tricks like turning
> fsync off
> * version postgres 11
> * i can spin up huge postgres instance if necessary in terms of cpu/ram
> * i tried already hash partitioning (to write to 10 different tables instead
> of 1)
> 
> 
> Any ideas?


https://www.gnu.org/software/parallel/ 
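
For instance, a rough sketch of how GNU parallel could drive the per-file loads
from the original post (the job count is arbitrary; it still pays one connection
per file, so combining it with batching several files per COPY is the bigger win):

# One job per file, eight at a time; connection settings come from the environment.
ls datafiles/ | parallel -j 8 'psql -c "\copy json_parts(json_data) FROM datafiles/{}"'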





Partitioned table migration strategy

2020-03-23 Thread Ronnie S
Hello all,

Are there any migration strategies/best practices when migrating from PG11
partitioned tables to PG12 partitioned tables that don't involve extended
downtime? We have about 900 partitions (hash partitions) in PG11 with
millions of rows that we need to migrate to PG12 and (ideally) would like
to minimize the outage window (if at all possible)?

Any guidance or pointers from your experiences would be highly appreciated!

Best,
Shayon


Re: Loading 500m json files to database

2020-03-23 Thread pinker
hmm now I'm thinking maybe setting up pgbouncer in front of postgres with
statement mode would help?







Re: Loading 500m json files to database

2020-03-23 Thread pinker
it's in a blob storage in Azure. I'm testing with 1m that I have locally







Re: Loading 500m json files to database

2020-03-23 Thread Rob Sargent


> On Mar 23, 2020, at 7:11 PM, David G. Johnston  
> wrote:
> 
> On Mon, Mar 23, 2020 at 3:24 AM pinker  > wrote:
> time for i in datafiles/*; do
>   psql -c "\copy json_parts(json_data) FROM $i"&
> done
> 
> Don't know whether this is faster but it does avoid spinning up a connection 
> multiple times.
> 
> #bash, linux
> function append_each_split_file_to_etl_load_script() {
> for filetoload in ./*; do
> ronumber="$(basename $filetoload)"
> # only process files since subdirs can be present
> if [[ -f "$filetoload" ]]; then
> echo ""
> echo "\set invoice"' `cat '"'""$filetoload""'"'`'
> echo ", ('$ronumber',:'invoice')"
> fi >> "$PSQLSCRIPT"
> done
> 
> echo ""  >> "$PSQLSCRIPT"
> echo ";" >> "$PSQLSCRIPT"
> echo ""  >> "$PSQLSCRIPT"
> }
> 
> There is a bit other related code that is needed (for my specific usage) but 
> this is the core of it.  Use psql variables to capture the contents of each 
> file into a variable and then just perform a normal insert (specifically, a 
> VALUES (...), (...) variant).  Since you can intermix psql and SQL you 
> basically output a bloody long script, that has memory issues at scale - but 
> you can divide and conquer - and then "psql --file 
> bloody_long_script_part_1_of_10.psql".
> 
> David J.
> 
Can one put 550M files in a single directory?  I thought it topped out at 16M 
or so.





Re: Loading 500m json files to database

2020-03-23 Thread David G. Johnston
On Mon, Mar 23, 2020 at 3:24 AM pinker  wrote:

> time for i in datafiles/*; do
>   psql -c "\copy json_parts(json_data) FROM $i"&
> done
>

Don't know whether this is faster but it does avoid spinning up a
connection multiple times.

#bash, linux
function append_each_split_file_to_etl_load_script() {
for filetoload in ./*; do
ronumber="$(basename $filetoload)"
# only process files since subdirs can be present
if [[ -f "$filetoload" ]]; then
echo ""
echo "\set invoice"' `cat '"'""$filetoload""'"'`'
echo ", ('$ronumber',:'invoice')"
fi >> "$PSQLSCRIPT"
done

echo ""  >> "$PSQLSCRIPT"
echo ";" >> "$PSQLSCRIPT"
echo ""  >> "$PSQLSCRIPT"
}

There is a bit other related code that is needed (for my specific usage)
but this is the core of it.  Use psql variables to capture the contents of
each file into a variable and then just perform a normal insert
(specifically, a VALUES (...), (...) variant).  Since you can intermix psql
and SQL you basically output a bloody long script, that has memory issues
at scale - but you can divide and conquer - and then "psql --file
bloody_long_script_part_1_of_10.psql".

David J.


Re: Loading 500m json files to database

2020-03-23 Thread Adrian Klaver

On 3/23/20 5:23 PM, pinker wrote:

it's a cloud and no plpythonu extension available unfortunately


I presume Python itself is available, so would it not be possible to 
create a program that concatenates the files into batches and COPY(s) 
that data into Postgres using the Psycopg2 COPY functions:


https://www.psycopg.org/docs/usage.html#using-copy-to-and-copy-from










--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Loading 500m json files to database

2020-03-23 Thread Adrian Klaver

On 3/23/20 5:26 PM, pinker wrote:

Hi,
json_parts is just a single table with 2 columns:


Well I misread that.



   Table "public.json_parts"
   Column   |  Type   | Collation | Nullable |Default
| Storage  | Stats target | Description
---+-+---+--++--+--+-
  id| integer |   | not null |
nextval('json_parts_id_seq'::regclass) | plain|  |
  json_data | jsonb   |   |  |

no indexes, constraints or anything else








--
Adrian Klaver
adrian.kla...@aklaver.com




Re: Loading 500m json files to database

2020-03-23 Thread pinker
Hi,
json_parts is just a single table with 2 columns:

  Table "public.json_parts"
  Column   |  Type   | Collation | Nullable |Default

| Storage  | Stats target | Description 
---+-+---+--++--+--+-
 id| integer |   | not null |
nextval('json_parts_id_seq'::regclass) | plain|  | 
 json_data | jsonb   |   |  |  

no indexes, constraints or anything else







Re: Loading 500m json files to database

2020-03-23 Thread pinker
there are no indexes, foreign keys, or any other constraints







Re: Loading 500m json files to database

2020-03-23 Thread pinker
it's a cloud and no plpythonu extension available unfortunately







Re: Loading 500m json files to database

2020-03-23 Thread pinker
Ertan Küçükoğlu wrote
> However, if possible, you may think of using a local physical computer to
> do all uploading and after do backup/restore on cloud system.
> 
> Compressed backup will be far less internet traffic compared to direct
> data inserts.

I was thinking about that, but the data source is blob storage, so downloading
it first and then loading it locally adds a couple of extra days of processing :/
It's not that fast even when I'm doing it locally ... so that would be like two
extra steps of overhead :/







Re: Loading 500m json files to database

2020-03-23 Thread pinker
Christopher Browne-3 wrote
> Well, you're paying for a lot of overhead in that, as you're
> establishing a psql command, connecting to a database, spawning a backend
> process, starting a transaction, committing a transaction, closing the
> backend
> process, disconnecting from the database, and cleaning up after the
> launching
> of the psql command.  And you're doing that 500 million times.
> 
> The one thing I left off that was the loading of a single tuple into
> json_parts.
> 
> What you could do to improve things quite a lot would be to group some
> number
> of those files together, so that each time you pay for the overhead, you
> at
> least
> get the benefit of loading several entries into json_parts.
> 
> So, loosely, I'd commend using /bin/cat (or similar) to assemble several
> files together
> into one, and then \copy that one file in.
> 
> Having 2 tuples loaded at once drops overhead by 50%
> Having 10 tuples loaded at once drops overhead by 90%
> Having 100 tuples loaded at once drops overhead by 99%
> Having 1000 tuples loaded at once drops overhead by 99.9%
> 
> There probably isn't too much real value to going past 1000 tuples per
> batch; the
> overhead, by that point, is getting pretty immaterial.
> 
> Reducing that overhead is the single most important thing you can do.

Yes, I was thinking about that, but I have no idea how to do it right now.
Like some kind of outer loop to concatenate those files, adding a delimiter
between them?
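
One possible shape for that outer loop, as a rough sketch only: it assumes every
file holds exactly one JSON document, that jq is available to flatten each
document onto a single line (COPY's text format wants one row per line, with
backslashes doubled), and that connection settings come from the environment.

#!/usr/bin/env bash
# Batch many single-line JSON documents into one COPY per 1000 files,
# instead of one psql call (and one connection) per file.
BATCH_SIZE=1000
batch=()

flush() {
    [ "${#batch[@]}" -eq 0 ] && return
    # jq -c collapses each document onto one line; sed doubles backslashes,
    # which is the only escaping COPY's text format still needs here.
    jq -c . "${batch[@]}" | sed 's/\\/\\\\/g' \
        | psql -c "\copy json_parts(json_data) FROM pstdin"
    batch=()
}

for f in datafiles/*; do
    batch+=("$f")
    [ "${#batch[@]}" -ge "$BATCH_SIZE" ] && flush
done
flush          # load the final partial batch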


Christopher Browne-3 wrote
> It is also quite likely that you could run such streams in parallel,
> although
> it would require quite a bit more information about the I/O capabilities
> of
> your
> hardware to know if that would do any good.

I can spin up every size of instance.









Re: PG12 autovac issues

2020-03-23 Thread Justin King
On Mon, Mar 23, 2020 at 4:31 PM Justin King  wrote:
>
> On Mon, Mar 23, 2020 at 3:00 PM Andres Freund  wrote:
> >
> > Hi,
> >
> > On 2020-03-23 20:47:25 +0100, Julien Rouhaud wrote:
> > > > - relfrozenxid, age(relfrozenxid) for the oldest table in the oldest 
> > > > database
> > > >   SELECT oid::regclass, age(relfrozenxid), relfrozenxid FROM pg_class 
> > > > WHERE relfrozenxid <> 0 ORDER BY age(relfrozenxid) DESC LIMIT 1;
> > >
> > > The vm has been trashed since, and I don't have that level of detail 
> > > available
> > > in the gathered stats unfortunately (and the available information I have 
> > > is a
> > > little bit messy, sorry for that).
> > >
> > > > - Oldest backend xmin
> > > >   SELECT pid, backend_xmin, age(backend_xmin) FROM pg_stat_activity 
> > > > WHERE backend_xmin <> 0 ORDER BY age(backend_xmin) DESC LIMIT 3;
> > > > - oldest replication xmin:
> > > >   SELECT pid, backend_xmin, age(backend_xmin) FROM pg_stat_replication 
> > > > WHERE backend_xmin <> 0 ORDER BY age(backend_xmin) DESC LIMIT 3;
> > > > - oldest slot xmin:
> > > >   SELECT slot_name, xmin, age(xmin), catalog_xmin, age(catalog_xmin) 
> > > > FROM pg_replication_slots WHERE xmin <> 0 OR catalog_xmin <> 0 ORDER BY 
> > > > greatest(age(xmin), age(catalog_xmin)) DESC LIMIT 3;
> > > > - oldest prepared transaction (unfortunately xmin not available)
> > > >   SELECT gid, database, transaction FROM pg_prepared_xacts ORDER BY 
> > > > age(transaction) LIMIT 3;
> > >
> > > I have output from this query: 
> > > https://github.com/OPMDG/check_pgactivity/blob/oldest_xmin/check_pgactivity#L5669-L5732
> > > (which may be dumb).  Here are the non-null rows after 8AM GMT+1:
> >
> > Could you share what the config of the server was?
> >
> >
> > > > > The same bench was run against pg11 many times and never triggered 
> > > > > this issue.
> > > > > So far our best guess is a side effect of 2aa6e331ead7.
> > > >
> > > > In that case you'd likely see DEBUG1 output, right? Did you try running
> > > > with that?
> > >
> > > That's unfortunately not an option, as while the issue is reproducible, it
> > > happens after roughly 16h with an average of 6.5ktps, and we definitely 
> > > don't
> > > have enough disk space for that amount of logs.  We'll try next time with 
> > > a
> > > recompiled version with the DEBUG1 promoted to WARNING, and maybe other 
> > > elog.
> >
> > You wouldn't need to run it with DEBUG1 the whole time, you could just
> > change the config and reload config once you hit the issue.
> >
> >
> > > > > Michael and I have been trying to reproduce this issue locally 
> > > > > (drastically
> > > > > reducing the various freeze_age parameters) for hours, but no luck 
> > > > > for now.
> > > >
> > > > Have you considered using gdb to investigate?
> > >
> > > Not for now, because it's hard to be around for the ~ 90min interval the 
> > > system
> > > stays alive after autovacuum gets stuck, especially since it's a weekend 
> > > bench.
> > >
> > > > > This is using a vanilla pg 12.1, with some OLTP workload.  The only 
> > > > > possibly
> > > > > relevant configuration changes are quite aggressive autovacuum 
> > > > > settings on some
> > > > > tables (no delay, analyze/vacuum threshold to 1k and analyze/vacuum 
> > > > > scale
> > > > > factor to 0, for both heap and toast).
> > > >
> > > > That, uh, is not an insignificant set of changes for an autovac
> > > > scheduling issues.
> > > >
> > > > It sounds like what might be happening is that you have something
> > > > holding back the "oldest needed transaction" horizon. Before
> > > > 2aa6e331ead7, if the xmin horizon hasn't increased, there'd be repeated
> > > > vacuums unable to increase the horizon, but afterwards they'll all
> > > > immediately exit without any messages.
> > >
> > > The xmin horizon is increasing AFAICS.  Grepping the autovacuum logs, I 
> > > can see
> > > for the 08:00 AM UTC - 08:59 AM UTC interval the oldest xmin going from
> > > 200069684 to 210363848.  E.g.:
> >
> > Which database is this on?
> >
> >
> > > [...]
> > > 266603:2020-03-14 08:49:59.927 UTC LOG:  automatic aggressive vacuum to 
> > > prevent wraparound of table "pg_toast.pg_toast_27683": index scans: 0
> > > 266604-   pages: 0 removed, 0 remain, 0 skipped due to pins, 0 
> > > skipped frozen
> > > 266605-   tuples: 0 removed, 0 remain, 0 are dead but not yet 
> > > removable, oldest xmin: 209635921
> > > 266606-   buffer usage: 25 hits, 1 misses, 1 dirtied
> > > 266607-   avg read rate: 15.440 MB/s, avg write rate: 15.440 MB/s
> > > 266608-   system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 
> > > 0.00 s
> > > 266609:2020-03-14 08:49:59.929 UTC LOG:  automatic aggressive vacuum to 
> > > prevent wraparound of table "pg_toast.pg_toast_27694": index scans: 0
> > > 266610-   pages: 0 removed, 0 remain, 0 skipped due to pins, 0 
> > > skipped frozen
> > > 266611-   tuples: 0 removed, 0 remain, 0 are dead but not yet 
> > > removable, oldest xmin: 

Re: PG12 autovac issues

2020-03-23 Thread Andres Freund
Hi,

On 2020-03-23 16:31:21 -0500, Justin King wrote:
> This is occurring in our environment right now (started about 30 min
> ago).  Here 's the latest logs (grepped by vacuum):
> 
> Mar 23 20:54:16 cowtn postgres[15569]: [12-1] 2020-03-23 20:54:16.542
> GMT [15569] LOG:  automatic vacuum of table "feedi.production.tita":
> index scans: 1
> Mar 23 20:54:27 cowtn postgres[15654]: [8-1] 2020-03-23 20:54:27.964
> GMT [15654] LOG:  automatic vacuum of table
> "feedi.production.distributed_virtual_schedule": index scans: 1

Hm, unfortunately you've cut off the details in the subsequent
lines. There's a few newlines in the output. Any chance you could
re-post with those included?


> > > > I wonder if what might be happening is that we're somehow missed/failed
> > > > to update relfrozenxid and/or datfrozenxid. If you manually vacuum some
> > > > table in the oldest database, but that is *NOT* the oldest table itself,
> > > > does the problem "resolve" itself?
> 
> postgres=# SELECT datname
> , age(datfrozenxid)
> , current_setting('autovacuum_freeze_max_age')
> FROM pg_database;
>   datname  |    age    | current_setting
> -----------+-----------+-----------------
>  postgres  | 202375735 | 2
>  template1 | 202345459 | 2
>  template0 | 132459914 | 2
>  feedi     | 132459914 | 2
> (4 rows)

Can you show the oldest tables in 'feedi'? Or, hm, actually, could you
just post the result of all the queries from the "What is:" section in
https://postgr.es/m/20200323162303.s7ay5hjdvimtkz6u%40alap3.anarazel.de


> Since this is occurring right now, what else would be useful to
> capture?  You'd asked about a GDB -- do you want that of the main
> process or the autovac worker?

Unless you can give me gdb access directly, I don't yet have enough data
to suggest what exactly we would want to analyze with gdb in your case.


It'd be helpful if you could change log_min_messages to DEBUG1 and
reload the configuration (not restart!).
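
For reference, a minimal way to do that from a superuser session, and to revert
it afterwards:

-- Raise verbosity only once the problem is visible, then reload without restarting.
ALTER SYSTEM SET log_min_messages = 'debug1';
SELECT pg_reload_conf();

-- Later, put it back:
ALTER SYSTEM RESET log_min_messages;
SELECT pg_reload_conf();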

Greetings,

Andres Freund




Re: PG12 autovac issues

2020-03-23 Thread Justin King
On Mon, Mar 23, 2020 at 3:00 PM Andres Freund  wrote:
>
> Hi,
>
> On 2020-03-23 20:47:25 +0100, Julien Rouhaud wrote:
> > > - relfrozenxid, age(relfrozenxid) for the oldest table in the oldest 
> > > database
> > >   SELECT oid::regclass, age(relfrozenxid), relfrozenxid FROM pg_class 
> > > WHERE relfrozenxid <> 0 ORDER BY age(relfrozenxid) DESC LIMIT 1;
> >
> > The vm has been trashed since, and I don't have that level of detail 
> > available
> > in the gathered stats unfortunately (and the available information I have 
> > is a
> > little bit messy, sorry for that).
> >
> > > - Oldest backend xmin
> > >   SELECT pid, backend_xmin, age(backend_xmin) FROM pg_stat_activity WHERE 
> > > backend_xmin <> 0 ORDER BY age(backend_xmin) DESC LIMIT 3;
> > > - oldest replication xmin:
> > >   SELECT pid, backend_xmin, age(backend_xmin) FROM pg_stat_replication 
> > > WHERE backend_xmin <> 0 ORDER BY age(backend_xmin) DESC LIMIT 3;
> > > - oldest slot xmin:
> > >   SELECT slot_name, xmin, age(xmin), catalog_xmin, age(catalog_xmin) FROM 
> > > pg_replication_slots WHERE xmin <> 0 OR catalog_xmin <> 0 ORDER BY 
> > > greatest(age(xmin), age(catalog_xmin)) DESC LIMIT 3;
> > > - oldest prepared transaction (unfortunately xmin not available)
> > >   SELECT gid, database, transaction FROM pg_prepared_xacts ORDER BY 
> > > age(transaction) LIMIT 3;
> >
> > I have output from this query: 
> > https://github.com/OPMDG/check_pgactivity/blob/oldest_xmin/check_pgactivity#L5669-L5732
> > (which may be dumb).  Here are the non-null rows after 8AM GMT+1:
>
> Could you share what the config of the server was?
>
>
> > > > The same bench was run against pg11 many times and never triggered this 
> > > > issue.
> > > > So far our best guess is a side effect of 2aa6e331ead7.
> > >
> > > In that case you'd likely see DEBUG1 output, right? Did you try running
> > > with that?
> >
> > That's unfortunately not an option, as while the issue is reproducible, it
> > happens after roughly 16h with an average of 6.5ktps, and we definitely 
> > don't
> > have enough disk space for that amount of logs.  We'll try next time with a
> > recompiled version with the DEBUG1 promoted to WARNING, and maybe other 
> > elog.
>
> You wouldn't need to run it with DEBUG1 the whole time, you could just
> change the config and reload config once you hit the issue.
>
>
> > > > Michael and I have been trying to reproduce this issue locally 
> > > > (drastically
> > > > reducing the various freeze_age parameters) for hours, but no luck for 
> > > > now.
> > >
> > > Have you considered using gdb to investigate?
> >
> > Not for now, because it's hard to be around for the ~ 90min interval the 
> > system
> > stays alive after autovacuum gets stuck, especially since it's a weekend 
> > bench.
> >
> > > > This is using a vanilla pg 12.1, with some OLTP workload.  The only 
> > > > possibly
> > > > relevant configuration changes are quite aggressive autovacuum settings 
> > > > on some
> > > > tables (no delay, analyze/vacuum threshold to 1k and analyze/vacuum 
> > > > scale
> > > > factor to 0, for both heap and toast).
> > >
> > > That, uh, is not an insignificant set of changes for an autovac
> > > scheduling issues.
> > >
> > > It sounds like what might be happening is that you have something
> > > holding back the "oldest needed transaction" horizon. Before
> > > 2aa6e331ead7, if the xmin horizon hasn't increased, there'd be repeated
> > > vacuums unable to increase the horizon, but afterwards they'll all
> > > immediately exit without any messages.
> >
> > The xmin horizon is increasing AFAICS.  Grepping the autovacuum logs, I can 
> > see
> > for the 08:00 AM UTC - 08:59 AM UTC interval the oldest xmin going from
> > 200069684 to 210363848.  E.g.:
>
> Which database is this on?
>
>
> > [...]
> > 266603:2020-03-14 08:49:59.927 UTC LOG:  automatic aggressive vacuum to 
> > prevent wraparound of table "pg_toast.pg_toast_27683": index scans: 0
> > 266604-   pages: 0 removed, 0 remain, 0 skipped due to pins, 0 skipped 
> > frozen
> > 266605-   tuples: 0 removed, 0 remain, 0 are dead but not yet 
> > removable, oldest xmin: 209635921
> > 266606-   buffer usage: 25 hits, 1 misses, 1 dirtied
> > 266607-   avg read rate: 15.440 MB/s, avg write rate: 15.440 MB/s
> > 266608-   system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 
> > 0.00 s
> > 266609:2020-03-14 08:49:59.929 UTC LOG:  automatic aggressive vacuum to 
> > prevent wraparound of table "pg_toast.pg_toast_27694": index scans: 0
> > 266610-   pages: 0 removed, 0 remain, 0 skipped due to pins, 0 skipped 
> > frozen
> > 266611-   tuples: 0 removed, 0 remain, 0 are dead but not yet 
> > removable, oldest xmin: 209635934
> > 266612-   buffer usage: 25 hits, 1 misses, 1 dirtied
> > 266613-   avg read rate: 21.058 MB/s, avg write rate: 21.058 MB/s
> > 266614-   system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 
> > 0.00 s
> > 266615:2020-03-14 08:49:59.931 UTC LOG:  

Re: PG12 autovac issues

2020-03-23 Thread Andres Freund
Hi,

On 2020-03-23 20:47:25 +0100, Julien Rouhaud wrote:
> > - relfrozenxid, age(relfrozenxid) for the oldest table in the oldest 
> > database
> >   SELECT oid::regclass, age(relfrozenxid), relfrozenxid FROM pg_class WHERE 
> > relfrozenxid <> 0 ORDER BY age(relfrozenxid) DESC LIMIT 1;
> 
> The vm has been trashed since, and I don't have that level of detail available
> in the gathered stats unfortunately (and the available information I have is a
> little bit messy, sorry for that).
> 
> > - Oldest backend xmin
> >   SELECT pid, backend_xmin, age(backend_xmin) FROM pg_stat_activity WHERE 
> > backend_xmin <> 0 ORDER BY age(backend_xmin) DESC LIMIT 3;
> > - oldest replication xmin:
> >   SELECT pid, backend_xmin, age(backend_xmin) FROM pg_stat_replication 
> > WHERE backend_xmin <> 0 ORDER BY age(backend_xmin) DESC LIMIT 3;
> > - oldest slot xmin:
> >   SELECT slot_name, xmin, age(xmin), catalog_xmin, age(catalog_xmin) FROM 
> > pg_replication_slots WHERE xmin <> 0 OR catalog_xmin <> 0 ORDER BY 
> > greatest(age(xmin), age(catalog_xmin)) DESC LIMIT 3;
> > - oldest prepared transaction (unfortunately xmin not available)
> >   SELECT gid, database, transaction FROM pg_prepared_xacts ORDER BY 
> > age(transaction) LIMIT 3;
> 
> I have output from this query: 
> https://github.com/OPMDG/check_pgactivity/blob/oldest_xmin/check_pgactivity#L5669-L5732
> (which may be dumb).  Here are the non-null rows after 8AM GMT+1:

Could you share what the config of the server was?


> > > The same bench was run against pg11 many times and never triggered this 
> > > issue.
> > > So far our best guess is a side effect of 2aa6e331ead7.
> > 
> > In that case you'd likely see DEBUG1 output, right? Did you try running
> > with that?
> 
> That's unfortunately not an option, as while the issue is reproducible, it
> happens after roughly 16h with an average of 6.5ktps, and we definitely don't
> have enough disk space for that amount of logs.  We'll try next time with a
> recompiled version with the DEBUG1 promoted to WARNING, and maybe other elog.

You wouldn't need to run it with DEBUG1 the whole time, you could just
change the config and reload config once you hit the issue.


> > > Michael and I have been trying to reproduce this issue locally 
> > > (drastically
> > > reducing the various freeze_age parameters) for hours, but no luck for 
> > > now.
> > 
> > Have you considered using gdb to investigate?
> 
> Not for now, because it's hard to be around for the ~ 90min interval the 
> system
> stays alive after autovacuum gets stuck, especially since it's a weekend bench.
> 
> > > This is using a vanilla pg 12.1, with some OLTP workload.  The only 
> > > possibly
> > > relevant configuration changes are quite aggressive autovacuum settings 
> > > on some
> > > tables (no delay, analyze/vacuum threshold to 1k and analyze/vacuum scale
> > > factor to 0, for both heap and toast).
> > 
> > That, uh, is not an insignificant set of changes for an autovac
> > scheduling issues.
> > 
> > It sounds like what might be happening is that you have something
> > holding back the "oldest needed transaction" horizon. Before
> > 2aa6e331ead7, if the xmin horizon hasn't increased, there'd be repeated
> > vacuums unable to increase the horizon, but afterwards they'll all
> > immediately exit without any messages.
> 
> The xmin horizon is increasing AFAICS.  Grepping the autovacuum logs, I can 
> see
> for the 08:00 AM UTC - 08:59 AM UTC interval the oldest xmin going from
> 200069684 to 210363848.  E.g.:

Which database is this on?


> [...]
> 266603:2020-03-14 08:49:59.927 UTC LOG:  automatic aggressive vacuum to 
> prevent wraparound of table "pg_toast.pg_toast_27683": index scans: 0
> 266604-   pages: 0 removed, 0 remain, 0 skipped due to pins, 0 skipped 
> frozen
> 266605-   tuples: 0 removed, 0 remain, 0 are dead but not yet removable, 
> oldest xmin: 209635921
> 266606-   buffer usage: 25 hits, 1 misses, 1 dirtied
> 266607-   avg read rate: 15.440 MB/s, avg write rate: 15.440 MB/s
> 266608-   system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
> 266609:2020-03-14 08:49:59.929 UTC LOG:  automatic aggressive vacuum to 
> prevent wraparound of table "pg_toast.pg_toast_27694": index scans: 0
> 266610-   pages: 0 removed, 0 remain, 0 skipped due to pins, 0 skipped 
> frozen
> 266611-   tuples: 0 removed, 0 remain, 0 are dead but not yet removable, 
> oldest xmin: 209635934
> 266612-   buffer usage: 25 hits, 1 misses, 1 dirtied
> 266613-   avg read rate: 21.058 MB/s, avg write rate: 21.058 MB/s
> 266614-   system usage: CPU: user: 0.00 s, system: 0.00 s, elapsed: 0.00 s
> 266615:2020-03-14 08:49:59.931 UTC LOG:  automatic aggressive vacuum to 
> prevent wraparound of table "pg_toast.pg_toast_24845": index scans: 0
> 266616-   pages: 0 removed, 0 remain, 0 skipped due to pins, 0 skipped 
> frozen
> 266617-   tuples: 0 removed, 0 remain, 0 are dead but not yet removable, 
> oldest xmin: 

Re: PG12 autovac issues

2020-03-23 Thread Julien Rouhaud
Hi,

On Mon, Mar 23, 2020 at 09:23:03AM -0700, Andres Freund wrote:
> Hi,
> 
> On 2020-03-23 16:22:47 +0100, Julien Rouhaud wrote:
> > On Fri, Mar 20, 2020 at 12:03:17PM -0700, Andres Freund wrote:
> > > Hi,
> > > 
> > > On 2020-03-20 12:42:31 -0500, Justin King wrote:
> > > > When we get into this state again, is there some other information
> > > > (other than what is in pg_stat_statement or pg_stat_activity) that
> > > > would be useful for folks here to help understand what is going on?
> > > 
> > > If it's actually stuck on a single table, and that table is not large,
> > > it would be useful to get a backtrace with gdb.
> > 
> > FTR, we're facing a very similar issue at work (adding Michael and Kevin in 
> > Cc)
> > during performance tests since a recent upgrade to pg12 .
> >
> > What seems to be happening is that after reaching 200M transaction a first 
> > pass
> > of autovacuum freeze is being run, bumping pg_database.datfrozenxid by ~ 
> > 800k
> > (age(datfrozenxid) still being more than autovacuum_freeze_max_age
> > afterwards).
> 
> If you have older transactions around that'd not be surprising. Do you
> have autovacuum logging output for this case?

There's an hourly long running query that can retain xmin up to a few million
xid, but definitely not something close to 200M.

When I said a first pass, I meant a batch of vacuums.  Here's the overview of "to
prevent wraparound" per-hour logs (UTC timezone):

 442 2020-03-14 07
1686 2020-03-14 08
  14 2020-03-14 10

and an overview of all autovacuum activity (still in UTC):

1366 2020-03-14 00
1457 2020-03-14 01
1387 2020-03-14 02
1440 2020-03-14 03
1349 2020-03-14 04
7383 2020-03-14 05
13909 2020-03-14 06
14240 2020-03-14 07
2094 2020-03-14 08
   0 2020-03-14 09
  16 2020-03-14 10
   3 2020-03-14 11
   4 2020-03-14 12
   3 2020-03-14 13

The final outage being:
2020-03-14 10:27:23.280 UTC [...] ERROR: could not extend file 
"base/16386/20245.4": No space left on device

536222:2020-03-14 10:40:00.089 UTC [...] PANIC:  could not write to file 
"pg_logical/replorigin_checkpoint.tmp": No space left on device

the last autovacuum evidence before that being:

274177:2020-03-14 08:54:11.797 UTC 5e6c8ed0.d665 0   LOG:  automatic vacuum of 
table "pg_toast.pg_toast_20237": index scans: 1
274178- pages: 0 removed, 273666 remain, 0 skipped due to pins, 251545 skipped 
frozen
274179- tuples: 83585 removed, 749 remain, 209 are dead but not yet removable, 
oldest xmin: 210363848
274180- buffer usage: 50096 hits, 23521 misses, 19996 dirtied
274181- avg read rate: 99.195 MB/s, avg write rate: 84.329 MB/s
274182- system usage: CPU: user: 0.28 s, system: 0.21 s, elapsed: 1.85 s

After the crash-and-restart autovacuum is working again (as seen in the 10AM -
1PM logs), although the bench stays stopped.


What I can see in pg_database is (GMT+1 here):

  timestamp| datfrozenxid | age(datfrozenxid)
 [...]
 2020-03-14 09:12:11.279515+01 |  480 | 202554804
 2020-03-14 09:27:12.723617+01 |  480 | 205408276
 2020-03-14 09:42:13.868701+01 |  480 | 208239967
 2020-03-14 09:57:15.685827+01 |   827585 | 210143294
 2020-03-14 10:12:17.488993+01 |   827585 | 213143797
 2020-03-14 10:27:18.899525+01 |   827585 | 216104516
 2020-03-14 10:42:19.926601+01 |   827585 | 219075040
 2020-03-14 10:57:21.023513+01 |   827585 | 222085423
 2020-03-14 11:12:22.85198+01  |   827585 | 225057731


> > After that point, all available information seems to indicate that no
> > autovacuum worker is scheduled anymore:
> 
> Do you mean that this table doesn't get autovac'ed at all anymore, that
> no table is getting autovac'd, or just that there's nothing further
> increasing relfrozenxid for that table?
> 
> It sounds like:
> 
> > - log_autovacuum_min_duration is set to 0 and no activity is logged (while
> >   having thousands of those per hour before that)
> 
> no table at all?

Correct, no table being autovacuumed, no sign of autovacuum being scheduled or
anything.

> > - 15 min interval snapshot of pg_database and pg_class shows that
> >   datfrozenxid/relfrozenxid keeps increasing at a consistent rate and never
> >   goes down
> 
> I assume you mean their age?

Yes sorry.

> What is:
> - datfrozenxid, age(datfrozenxid) for the oldest database
>   SELECT datname, age(datfrozenxid), datfrozenxid FROM pg_database ORDER BY 
> age(datfrozenxid) DESC LIMIT 1;

see above.  FTR there's only one database being used.

> - relfrozenxid, age(relfrozenxid) for the oldest table in the oldest database
>   SELECT oid::regclass, age(relfrozenxid), relfrozenxid FROM pg_class WHERE 
> relfrozenxid <> 0 ORDER BY age(relfrozenxid) DESC LIMIT 1;

The vm has been trashed since, and I don't have that level of detail available
in the gathered stats unfortunately (and the available information I have is a
little bit messy, sorry for that).

> - Oldest backend xmin
>   SELECT pid, backend_xmin, age(backend_xmin) FROM pg_stat_activity WHERE 
> backend_xmin 

Re: Passwordcheck configuration

2020-03-23 Thread Stephen Frost
Greetings,

* Dave Hughes (dhughe...@gmail.com) wrote:
> Thank you for the information!  This issue originated from a Department of
> Defense STIG (Security Technical Implementation Guides).  It's a security
> check that applications and databases have to go through.  I'll just leave
> this one as a "finding" since there isn't a way to really configure it to
> their requirements.

I believe the PG STIG encourages using PKI instead of using passwords
for authentication.

Thanks,

Stephen




Re: Passwordcheck configuration

2020-03-23 Thread Stephen Frost
Greetings,

* Tom Lane (t...@sss.pgh.pa.us) wrote:
> Dave Hughes  writes:
> > I have a requirement to set some password complexity for our database such
> > as length of password, upper case, lower case, special characters,
> > expiration limit, reuse, etc.
> 
> Usually, if you have to do something like that, we recommend setting PG to
> use PAM authentication and configuring the restrictions on the PAM side.

Trying to do this with PAM really isn't reasonable in my experience, and
I've been there and have gone through the foolishness required to make
it work (in a couple different ways).  PG doesn't run as root though and
many PAM modules expect that, so you end up having to deal with
something like pam_sasl and saslauthd, which is a pain.  I really don't
think it's reasonable for us to be recommending this, and it doesn't
work if what you actually want is for PG to be storing the passwords
(without something like pam_pgsql, but that requires things like the
password being passed to PG in cleartext on every authentication, and is
awkward to set up...).

Basically, you can hack things up enough to get the password complexity
requirement, kind of (note that you can't use \password with psql, for
example, or any other client-side password change mechanism through PG,
so you have to force password changes to be done during authentication,
which is bizarre- users can't change their own password whenever they
wish but rather only when PAM forces them to), but you then end up
running a system that has passwords in the clear being sent to PG by
the client, which is certainly no good.

> The only native capability in that direction is that you can set a
> password expiration date.

Right, which is unfortunate- we should have a better solution.
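
For reference, that native capability looks like this (role name and date are
placeholders):

-- Expire password-based authentication for a role at a given time.
ALTER ROLE app_user VALID UNTIL '2020-06-30 00:00:00+00';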

> Note that it's widely believed that this sort of thing makes you LESS
> secure, not more.  Quite aside from the well-established fact that forced
> password changes are bad from a human-factors standpoint, you can't check
> any of those other points unless the password is sent to the server as
> cleartext.  That creates its own set of vulnerabilities, and I don't
> know of anybody who considers it good practice.

Having the password sent to the server during a PW change isn't great,
certainly, but it's far worse to have the password sent in the clear to
the server on every authentication, which is what using something like
PAM ends up doing.

> > I saw there was a module you can use for this called passwordcheck.  Seems
> > easy to install, but I don't see how you can configure it for you specific
> > needs?
> 
> passwordcheck hasn't got any out-of-the-box configurability.  It's mainly
> meant as sample code that people could modify if they have a mind to.

I believe some people actually do use it, as it can be compiled with
cracklib, which can be configured.

> (I seem to recall some recent discussion about deprecating/removing
> passwordcheck altogether, but I can't find it right now.)

+1 for my part to dropping it, and replacing it with a proper built-in
capability here, that would work with our SCRAM-based auth.

Thanks,

Stephen




Re: How does pg_basebackup manage to create a snapshot of the filesystem?

2020-03-23 Thread Stephen Frost
Greetings,

* Dennis Jacobfeuerborn (denni...@conversis.de) wrote:
> I'm currently trying to understand how backups work. In the
> documentation in section "25.2. File System Level Backup" it says that
> filesystem level backups can only be made when the database if offline
> yet pg_basebackup seems to do just that but works while the database is
> online. Am I misunderstanding something here or does pg_basebackup use
> some particular features of Postgres to accomplish this?

A simple tar-based backup as discussed in 25.2 can only be done with the
database off-line, but if you're wondering what pg_basebackup is doing
under the covers then you should be reading through 25.3, which talks
about the PostgreSQL WAL, how a backup can be taken in an online manner
provided you retain the WAL that was created during the backup, how to
restore a backup taken in that manner, etc.

All that said, I don't recommend rolling your own backup solution for
PG.  There's a number of good ones out there, starting with
pg_basebackup if you really just need something simple.
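
For the simple case, an online base backup can be as small as the following
(host, user, and target directory are placeholders):

# Tar-format, compressed base backup with the WAL generated during the
# backup streamed alongside it (-Xs), so the result is self-contained.
pg_basebackup -h db.example.com -U replicator \
    -D /backups/base-$(date +%F) -Ft -z -Xs -P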

Thanks,

Stephen




Re: Could postgres12 support millions of sequences? (like 10 million)

2020-03-23 Thread pabloa98
On Mon, Mar 23, 2020 at 9:58 AM Daniel Verite 
wrote:

> pabloa98 wrote:
>
> > When I have a medium number of sequence I will report how it behaves. It
> > will take some time though.
>
> Be aware that creating the sequences on the fly has the kind of race
> condition that you wanted to avoid in the first place.
>
> For instance consider this execution in two concurrent sessions:
>
> S1: BEGIN;
>
> S1: CREATE SEQUENCE IF NOT EXISTS seq1;
>
> S2:  BEGIN;
>
> S2:  CREATE SEQUENCE IF NOT EXISTS seq1;
> S2:  (now blocked waiting for S1)
>
> S1: COMMIT;
>
> S2:ERROR:  duplicate key value violates unique constraint
> "pg_type_typname_nsp_index"
> DETAIL : Key (typname, typnamespace)=(seq1, 36434) already exists.
>
> The client could catch these errors and retry, but then it could also
> do that with serializable transactions on serialization failures
> (SQLSTATE 40001), and you'd get the guarantee of consecutive
> numbering without creating all these sequences, not to mention
> the protection against other potential concurrency anomalies.
> See https://www.postgresql.org/docs/current/transaction-iso.html
>
>
> Best regards,
> --
> Daniel Vérité
> PostgreSQL-powered mailer: http://www.manitou-mail.org
> Twitter: @DanielVerite
>

Indeed. We have a "creation state" in the application. During that moment
the pair(group,element) and the sequence will be created.
After the creation process is done, the entry is in "production state" and
the sequence will be used only then.

Pablo


Re: Could postgres12 support millions of sequences? (like 10 million)

2020-03-23 Thread Daniel Verite
pabloa98 wrote:

> When I have a medium number of sequence I will report how it behaves. It
> will take some time though.

Be aware that creating the sequences on the fly has the kind of race
condition that you wanted to avoid in the first place.

For instance consider this execution in two concurrent sessions:

S1: BEGIN;

S1: CREATE SEQUENCE IF NOT EXISTS seq1;

S2:  BEGIN;

S2:  CREATE SEQUENCE IF NOT EXISTS seq1;
S2:  (now blocked waiting for S1)

S1: COMMIT;

S2:ERROR:  duplicate key value violates unique constraint
"pg_type_typname_nsp_index"
DETAIL : Key (typname, typnamespace)=(seq1, 36434) already exists.

The client could catch these errors and retry, but then it could also
do that with serializable transactions on serialization failures
(SQLSTATE 40001), and you'd get the guarantee of consecutive
numbering without creating all these sequences, not to mention
the protection against other potential concurrency anomalies.
See https://www.postgresql.org/docs/current/transaction-iso.html
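
For illustration, a rough sketch of that alternative with made-up table and
column names (the real schema isn't shown in this thread): the client re-runs
the whole transaction whenever it fails with SQLSTATE 40001, and the surviving
transactions hand out consecutive numbers per (group, element).

-- Hypothetical schema: elements(grp, elem, seq_no).
BEGIN ISOLATION LEVEL SERIALIZABLE;
INSERT INTO elements (grp, elem, seq_no)
SELECT 1, 42, COALESCE(MAX(seq_no), 0) + 1
  FROM elements
 WHERE grp = 1 AND elem = 42;
COMMIT;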


Best regards,
-- 
Daniel Vérité
PostgreSQL-powered mailer: http://www.manitou-mail.org
Twitter: @DanielVerite




Re: Runtime partition pruning

2020-03-23 Thread Radu Radutiu
Thanks. Yes, the query with the same parameter seems to work as expected.
It doesn't help us though as we are trying to transparently support
partitioning using an ORM and we cannot change the parameters. Using the
column name would have been much easier.

Regards,
Radu

On Mon, Mar 23, 2020 at 5:56 PM Michael Lewis  wrote:

> select * from test where id between client_id-10 and client_id+10  and
>> client_id=?;
>>
>> does not (it scans all partitions in parallel) .
>> Is it expected?
>>
>
> Yes. But the below would work fine I expect since the planner would know a
> constant range for id. I would be very surprised if the optimizer had some
> condition rewrite rules to handle just the scenario you show.
>
>  select * from test where id between ?-10 and ?+10  and client_id=?;
>


Re: Is it safe to rename an index through pg_class update?

2020-03-23 Thread Kouber Saparev
Just in case somebody else also needs such functionality in PostgreSQL <
12, I made a function in plpgsql:

https://github.com/kouber/pg_utils/blob/master/rename_index.sql

--
Kouber Saparev


Re: PG12 autovac issues

2020-03-23 Thread Andres Freund
Hi,

On 2020-03-23 16:22:47 +0100, Julien Rouhaud wrote:
> On Fri, Mar 20, 2020 at 12:03:17PM -0700, Andres Freund wrote:
> > Hi,
> > 
> > On 2020-03-20 12:42:31 -0500, Justin King wrote:
> > > When we get into this state again, is there some other information
> > > (other than what is in pg_stat_statement or pg_stat_activity) that
> > > would be useful for folks here to help understand what is going on?
> > 
> > If it's actually stuck on a single table, and that table is not large,
> > it would be useful to get a backtrace with gdb.
> 
> FTR, we're facing a very similar issue at work (adding Michael and Kevin in 
> Cc)
> during performance tests since a recent upgrade to pg12 .
>
> What seems to be happening is that after reaching 200M transaction a first 
> pass
> of autovacuum freeze is being run, bumping pg_database.datfrozenxid by ~ 800k
> (age(datfrozenxid) still being more than autovacuum_freeze_max_age
> afterwards).

If you have older transactions around that'd not be surprising. Do you
have autovacuum logging output for this case?


> After that point, all available information seems to indicate that no
> autovacuum worker is scheduled anymore:

Do you mean that this table doesn't get autovac'ed at all anymore, that
no table is getting autovac'd, or just that there's nothing further
increasing relfrozenxid for that table?

It sounds like:

> - log_autovacuum_min_duration is set to 0 and no activity is logged (while
>   having thousands of those per hour before that)

no table at all?


> - 15 min interval snapshot of pg_database and pg_class shows that
>   datfrozenxid/relfrozenxid keeps increasing at a consistent rate and never
>   goes down

I assume you mean their age?

What is:
- datfrozenxid, age(datfrozenxid) for the oldest database
  SELECT datname, age(datfrozenxid), datfrozenxid FROM pg_database ORDER BY 
age(datfrozenxid) DESC LIMIT 1;
- relfrozenxid, age(relfrozenxid) for the oldest table in the oldest database
  SELECT oid::regclass, age(relfrozenxid), relfrozenxid FROM pg_class WHERE 
relfrozenxid <> 0 ORDER BY age(relfrozenxid) DESC LIMIT 1;
- Oldest backend xmin
  SELECT pid, backend_xmin, age(backend_xmin) FROM pg_stat_activity WHERE 
backend_xmin <> 0 ORDER BY age(backend_xmin) DESC LIMIT 3;
- oldest replication xmin:
  SELECT pid, backend_xmin, age(backend_xmin) FROM pg_stat_replication WHERE 
backend_xmin <> 0 ORDER BY age(backend_xmin) DESC LIMIT 3;
- oldest slot xmin:
  SELECT slot_name, xmin, age(xmin), catalog_xmin, age(catalog_xmin) FROM 
pg_replication_slots WHERE xmin <> 0 OR catalog_xmin <> 0 ORDER BY 
greatest(age(xmin), age(catalog_xmin)) DESC LIMIT 3;
- oldest prepared transaction (unfortunately xmin not available)
  SELECT gid, database, transaction FROM pg_prepared_xacts ORDER BY 
age(transaction) LIMIT 3;


> The same bench was run against pg11 many times and never triggered this issue.
> So far our best guess is a side effect of 2aa6e331ead7.

In that case you'd likely see DEBUG1 output, right? Did you try running
with that?


> Michael and I have been trying to reproduce this issue locally (drastically
> reducing the various freeze_age parameters) for hours, but no luck for now.

Have you considered using gdb to investigate?


> This is using a vanilla pg 12.1, with some OLTP workload.  The only possibly
> relevant configuration changes are quite aggressive autovacuum settings on 
> some
> tables (no delay, analyze/vacuum threshold to 1k and analyze/vacuum scale
> factor to 0, for both heap and toast).

That, uh, is not an insignificant set of changes for an autovac
scheduling issue.

It sounds like what might be happening is that you have something
holding back the "oldest needed transaction" horizon. Before
2aa6e331ead7, if the xmin horizon hasn't increased, there'd be repeated
vacuums unable to increase the horizon, but afterwards they'll all
immediately exit without any messages.

I wonder if what might be happening is that we're somehow missed/failed
to update relfrozenxid and/or datfrozenxid. If you manually vacuum some
table in the oldest database, but that is *NOT* the oldest table itself,
does the problem "resolve" itself?

Greetings,

Andres Freund




Re: Runtime partition pruning

2020-03-23 Thread Michael Lewis
>
> select * from test where id between client_id-10 and client_id+10  and
> client_id=?;
>
> does not (it scans all partitions in parallel) .
> Is it expected?
>

Yes. But the below would work fine I expect since the planner would know a
constant range for id. I would be very surprised if the optimizer had some
condition rewrite rules to handle just the scenario you show.

 select * from test where id between ?-10 and ?+10  and client_id=?;
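
One way to check whether the rewritten form actually gets runtime pruning, using
the table from the original question and an arbitrary parameter value: with a
generic plan, pruning at executor startup shows up as "Subplans Removed" in the
EXPLAIN ANALYZE output.

PREPARE q(int) AS
  SELECT * FROM test WHERE id BETWEEN $1 - 10 AND $1 + 10 AND client_id = $1;
SET plan_cache_mode = force_generic_plan;  -- PostgreSQL 12+, keeps $1 in the plan
EXPLAIN (ANALYZE, COSTS OFF) EXECUTE q(100);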


Re: Loading 500m json files to database

2020-03-23 Thread Adrian Klaver

On 3/23/20 3:24 AM, pinker wrote:

Hi, do you have maybe idea how to make loading process faster?

I have 500 millions of json files (1 json per file) that I need to load to
db.
My test set is "only" 1 million files.

What I came up with now is:

time for i in datafiles/*; do
   psql -c "\copy json_parts(json_data) FROM $i"&
done

which is the fastest so far. But it's not what i expect. Loading 1m of data
takes me ~3h so loading 500 times more is just unacceptable.


Aggregating the JSON files as others have suggested would help greatly.

Knowing what is happening in json_parts() might help folks provide 
further tips.





some facts:
* the target db is on cloud so there is no option to do tricks like turning
fsync off
* version postgres 11
* i can spin up huge postgres instance if necessary in terms of cpu/ram
* i tried already hash partitioning (to write to 10 different tables instead
of 1)


Any ideas?








--
Adrian Klaver
adrian.kla...@aklaver.com




Re: PG12 autovac issues

2020-03-23 Thread Julien Rouhaud
On Fri, Mar 20, 2020 at 12:03:17PM -0700, Andres Freund wrote:
> Hi,
> 
> On 2020-03-20 12:42:31 -0500, Justin King wrote:
> > When we get into this state again, is there some other information
> > (other than what is in pg_stat_statement or pg_stat_activity) that
> > would be useful for folks here to help understand what is going on?
> 
> If it's actually stuck on a single table, and that table is not large,
> it would be useful to get a backtrace with gdb.

FTR, we're facing a very similar issue at work (adding Michael and Kevin in Cc)
during performance tests since a recent upgrade to pg12 .

What seems to be happening is that after reaching 200M transactions, a first pass
of autovacuum freeze is being run, bumping pg_database.datfrozenxid by ~ 800k
(age(datfrozenxid) still being more than autovacuum_freeze_max_age afterwards).
After that point, all available information seems to indicate that no
autovacuum worker is scheduled anymore:

- log_autovacuum_min_duration is set to 0 and no activity is logged (while
  having thousands of those per hour before that)
- 15 min interval snapshot of pg_database and pg_class shows that
  datfrozenxid/relfrozenxid keeps increasing at a consistent rate and never
  goes down
 - 15 min interval snapshot of pg_stat_activity doesn't show any autovacuum
   worker
- the autovacuum launcher is up and running and doesn't show any sign of
  problem
- n_mod_since_analyze keeps growing at a consistent rate, never going down
- 15 min delta of tup_updated and tup_deleted shows that the global write
  activity doesn't change before and after the autovacuum problem

The situation continues for ~2h, at which point the bloat is so heavy that the
main filesystem becomes full, and postgres panics after a failed write in
pg_logical directory or similar.

The same bench was run against pg11 many times and never triggered this issue.
So far our best guess is a side effect of 2aa6e331ead7.

Michael and I have been trying to reproduce this issue locally (drastically
reducing the various freeze_age parameters) for hours, but no luck for now.

This is using a vanilla pg 12.1, with some OLTP workload.  The only possibly
relevant configuration changes are quite aggressive autovacuum settings on some
tables (no delay, analyze/vacuum threshold to 1k and analyze/vacuum scale
factor to 0, for both heap and toast).
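
For reference, that per-table setup looks roughly like the following (the table
name is a placeholder):

ALTER TABLE hot_table SET (
    autovacuum_vacuum_cost_delay = 0,
    autovacuum_vacuum_threshold = 1000,
    autovacuum_analyze_threshold = 1000,
    autovacuum_vacuum_scale_factor = 0,
    autovacuum_analyze_scale_factor = 0,
    toast.autovacuum_vacuum_cost_delay = 0,
    toast.autovacuum_vacuum_threshold = 1000,
    toast.autovacuum_vacuum_scale_factor = 0
);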




Re: Loading 500m json files to database

2020-03-23 Thread Rob Sargent




On 3/23/20 4:24 AM, pinker wrote:

Hi, do you have maybe idea how to make loading process faster?

I have 500 millions of json files (1 json per file) that I need to load to
db.
My test set is "only" 1 million files.

What I came up with now is:

time for i in datafiles/*; do
   psql -c "\copy json_parts(json_data) FROM $i"&
done

which is the fastest so far. But it's not what i expect. Loading 1m of data
takes me ~3h so loading 500 times more is just unacceptable.

some facts:
* the target db is on cloud so there is no option to do tricks like turning
fsync off
* version postgres 11
* i can spin up huge postgres instance if necessary in terms of cpu/ram
* i tried already hash partitioning (to write to 10 different tables instead
of 1)


Any ideas?


Most advanced languages have a bulk copy implementation.  I've found this to be
blindingly fast when the receiving table has no indices or constraints.  It's not
clear how large your files are, but you might take this time to "normalize" them:
extract any id, datatype, etc. into table attributes.










Runtime partition pruning

2020-03-23 Thread Radu Radutiu
Hello list,

Is runtime partition pruning available in PostgreSQL 12?

I have a table partitioned by range on column id (primary  key).
For the query (client_id is passed as a parameter from the application):

select * from test where id > 0 and id < 1000 and client_id = ?;

partition pruning works fine.

However

select * from test where id between client_id-10 and client_id+10  and
client_id=?;

does not (it scans all partitions in parallel).
Is it expected?
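
For comparison, a form where the partition key is only compared against the
parameter/constants, which I would expect to prune (a hypothetical rewrite,
not something I have benchmarked):

# Hypothetical rewrite: derive the id bounds from the client_id value itself,
# so the partition key "id" is never compared against another column.
# 42 stands in for the client_id parameter.
psql -c "EXPLAIN (COSTS OFF)
SELECT * FROM test
WHERE id BETWEEN 42 - 10 AND 42 + 10
  AND client_id = 42;"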

Regards,
Radu


Re: Loading 500m json files to database

2020-03-23 Thread Christopher Browne
On Mon, 23 Mar 2020 at 06:24, pinker  wrote:

> Hi, do you have maybe idea how to make loading process faster?
>
> I have 500 millions of json files (1 json per file) that I need to load to
> db.
> My test set is "only" 1 million files.
>
> What I came up with now is:
>
> time for i in datafiles/*; do
>   psql -c "\copy json_parts(json_data) FROM $i"&
> done
>
> which is the fastest so far. But it's not what i expect. Loading 1m of data
> takes me ~3h so loading 500 times more is just unacceptable.
>
> some facts:
> * the target db is on cloud so there is no option to do tricks like turning
> fsync off
> * version postgres 11
> * i can spin up huge postgres instance if necessary in terms of cpu/ram
> * i tried already hash partitioning (to write to 10 different tables
> instead
> of 1)
>
>
> Any ideas?
>


Well, you're paying for a lot of overhead in that, as you're establishing a
psql command, connecting to a database, spawning a backend process, starting
a transaction, committing a transaction, closing the backend process,
disconnecting from the database, and cleaning up after the launching of the
psql command.  And you're doing that 500 million times.

The one thing I left off that list was the loading of a single tuple into
json_parts.

What you could do to improve things quite a lot would be to group some number
of those files together, so that each time you pay for the overhead, you at
least get the benefit of loading several entries into json_parts.

So, loosely, I'd recommend using /bin/cat (or similar) to assemble several
files together into one, and then \copy that one file in.

Having 2 tuples loaded at once drops overhead by 50%
Having 10 tuples loaded at once drops overhead by 90%
Having 100 tuples loaded at once drops overhead by 99%
Having 1000 tuples loaded at once drops overhead by 99.9%

There probably isn't too much real value to going past 1000 tuples per batch;
the overhead, by that point, is getting pretty immaterial.

Reducing that overhead is the single most important thing you can do.
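
A rough sketch of that approach (assuming each file holds exactly one
single-line JSON document ending in a newline, with no embedded tabs or
backslashes, and that the table is json_parts(json_data)):

# Sketch only: batch ~1000 one-line JSON files per \copy, so each
# connection/transaction loads a thousand rows instead of one.
find datafiles -type f -print0 |
xargs -0 -n 1000 sh -c '
  batch=$(mktemp)
  cat "$@" > "$batch"
  psql -v ON_ERROR_STOP=1 -c "\copy json_parts(json_data) FROM $batch"
  rm -f "$batch"
' sh

Each psql invocation there still pays the startup cost once, but it is
amortized over a thousand rows rather than one.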

It is also quite likely that you could run such streams in parallel, although
it would require quite a bit more information about the I/O capabilities of
your hardware to know if that would do any good.
-- 
When confronted by a difficult problem, solve it by reducing it to the
question, "How would the Lone Ranger handle this?"


Re: Passwordcheck configuration

2020-03-23 Thread Laurenz Albe
On Fri, 2020-03-20 at 12:30 -0400, Dave Hughes wrote:
> Thank you for the information!  This issue originated from a Department of 
> Defense STIG
> (Security Technical Implementation Guides).  It's a security check that 
> applications
> and databases have to go through.  I'll just leave this one as a "finding" 
> since there
> isn't a way to really configure it to their requirements.  

Our traditional answer is that for high security standards, you shouldn't use
passwords in the database, but some external authentication method like
Kerberos.  Then you can enforce the password restrictions there.
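
For instance, a pg_hba.conf entry along these lines (realm and network are
placeholders) hands authentication over to the KDC, which is then where the
password policy lives:

# Placeholder sketch: delegate authentication to Kerberos/GSSAPI so that
# password complexity and expiry are enforced by the KDC, not by PostgreSQL.
cat >> "$PGDATA/pg_hba.conf" <<'EOF'
# TYPE    DATABASE  USER  ADDRESS      METHOD
hostssl   all       all   10.0.0.0/8   gss include_realm=0 krb_realm=EXAMPLE.COM
EOF
pg_ctl reload -D "$PGDATA"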

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com





Re: Loading 500m json files to database

2020-03-23 Thread Rob Sargent



> On Mar 23, 2020, at 5:59 AM, Andrei Zhidenkov  
> wrote:
> 
> Try to write a stored procedure (probably pl/python) that will accept an 
> array of JSON objects so it will be possible to load data in chunks (by 
> 100-1000 files) which should be faster.
> 
>>> On 23. Mar 2020, at 12:49, Ertan Küçükoğlu  
>>> wrote:
>>> 
>>> 
 On 23 Mar 2020, at 13:20, pinker  wrote:
>>> 
>>> Hi, do you have maybe idea how to make loading process faster?
>>> 
>>> I have 500 millions of json files (1 json per file) that I need to load to
>>> db.
>>> My test set is "only" 1 million files.
>>> 
>>> What I came up with now is:
>>> 
>>> time for i in datafiles/*; do
>>> psql -c "\copy json_parts(json_data) FROM $i"&
>>> done
>>> 
>>> which is the fastest so far. But it's not what i expect. Loading 1m of data
>>> takes me ~3h so loading 500 times more is just unacceptable.
>>> 
>>> some facts:
>>> * the target db is on cloud so there is no option to do tricks like turning
>>> fsync off
>>> * version postgres 11
>>> * i can spin up huge postgres instance if necessary in terms of cpu/ram
>>> * i tried already hash partitioning (to write to 10 different tables instead
>>> of 1)
>>> 
>>> 
>>> Any ideas?
>> Hello,
>> 
>> I may not be knowledge enough to answer your question.
>> 
>> However, if possible, you may think of using a local physical computer to do 
>> all uploading and after do backup/restore on cloud system.
>> 
>> Compressed backup will be far less internet traffic compared to direct data 
>> inserts.
>> 
>> Moreover you can do additional tricks as you mentioned.
>> 
>> Thanks & regards,
>> Ertan
>> 
>> 

Drop any and all indices

>> 
>> 
> 
> 
> 




Re: Loading 500m json files to database

2020-03-23 Thread Andrei Zhidenkov
Try to write a stored procedure (probably pl/python) that accepts an array of
JSON objects, so it will be possible to load data in chunks (100-1000 files at
a time), which should be faster.
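
A minimal sketch of that idea (using a plain SQL function rather than
pl/python, and assuming the column is json_parts(json_data jsonb)):

# One server-side call then inserts a whole chunk of documents.
psql <<'SQL'
CREATE OR REPLACE FUNCTION load_json_parts(docs jsonb[]) RETURNS void
LANGUAGE sql AS $$
  INSERT INTO json_parts (json_data)
  SELECT unnest(docs);
$$;
-- example: load a chunk of two documents in a single call
SELECT load_json_parts(ARRAY['{"id": 1}', '{"id": 2}']::jsonb[]);
SQL

The client then only has to build the array (say, 1000 documents at a time)
and issue one SELECT per chunk.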

> On 23. Mar 2020, at 12:49, Ertan Küçükoğlu  
> wrote:
> 
> 
>> On 23 Mar 2020, at 13:20, pinker  wrote:
>> 
>> Hi, do you have maybe idea how to make loading process faster?
>> 
>> I have 500 millions of json files (1 json per file) that I need to load to
>> db.
>> My test set is "only" 1 million files.
>> 
>> What I came up with now is:
>> 
>> time for i in datafiles/*; do
>> psql -c "\copy json_parts(json_data) FROM $i"&
>> done
>> 
>> which is the fastest so far. But it's not what i expect. Loading 1m of data
>> takes me ~3h so loading 500 times more is just unacceptable.
>> 
>> some facts:
>> * the target db is on cloud so there is no option to do tricks like turning
>> fsync off
>> * version postgres 11
>> * i can spin up huge postgres instance if necessary in terms of cpu/ram
>> * i tried already hash partitioning (to write to 10 different tables instead
>> of 1)
>> 
>> 
>> Any ideas?
> Hello,
> 
> I may not be knowledge enough to answer your question.
> 
> However, if possible, you may think of using a local physical computer to do 
> all uploading and after do backup/restore on cloud system.
> 
> Compressed backup will be far less internet traffic compared to direct data 
> inserts.
> 
> Moreover you can do additional tricks as you mentioned.
> 
> Thanks & regards,
> Ertan
> 
> 
> 
> 





Re: Loading 500m json files to database

2020-03-23 Thread Ertan Küçükoğlu


> On 23 Mar 2020, at 13:20, pinker  wrote:
> 
> Hi, do you have maybe idea how to make loading process faster?
> 
> I have 500 millions of json files (1 json per file) that I need to load to
> db.
> My test set is "only" 1 million files.
> 
> What I came up with now is:
> 
> time for i in datafiles/*; do
>  psql -c "\copy json_parts(json_data) FROM $i"&
> done
> 
> which is the fastest so far. But it's not what i expect. Loading 1m of data
> takes me ~3h so loading 500 times more is just unacceptable.
> 
> some facts:
> * the target db is on cloud so there is no option to do tricks like turning
> fsync off
> * version postgres 11
> * i can spin up huge postgres instance if necessary in terms of cpu/ram
> * i tried already hash partitioning (to write to 10 different tables instead
> of 1)
> 
> 
> Any ideas?
Hello,

I may not be knowledgeable enough to answer your question.

However, if possible, you may think of using a local physical computer to do 
all the uploading and afterwards do a backup/restore onto the cloud system.

A compressed backup will mean far less internet traffic than direct data 
inserts.

Moreover, you can do the additional tricks you mentioned.
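
For example, something along these lines (database names and flags are
placeholders; a sketch, not a tested recipe):

# Sketch: load everything locally, then ship a compressed custom-format dump
# and restore it on the cloud instance with parallel jobs.
pg_dump --format=custom --compress=9 --file=json_parts.dump localdb
pg_restore --dbname=clouddb --jobs=8 json_parts.dump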

Thanks & regards,
Ertan






Re: Postgres cluster setup

2020-03-23 Thread Ravi Krishna
> 
> Do you have similar setup like Oracle RAC in postgres core . I found in edb 
> but didn't find anything in postgres core. We are looking for setting up 
> replication with no outage and other node will be up , if the primary is 
> down. Any help would be great 

Oracle RAC is based on shared storage, not replication.  Are you talking
about Data Guard?

Loading 500m json files to database

2020-03-23 Thread pinker
Hi, do you maybe have an idea how to make the loading process faster?

I have 500 million json files (1 json document per file) that I need to load
into the db.
My test set is "only" 1 million files.

What I came up with now is:

time for i in datafiles/*; do
  psql -c "\copy json_parts(json_data) FROM $i"&
done

which is the fastest so far, but it's not what I expected.  Loading 1m files
takes me ~3h, so loading 500 times more is just unacceptable.

some facts:
* the target db is in the cloud, so there is no option to do tricks like
turning fsync off
* version: postgres 11
* I can spin up a huge postgres instance if necessary in terms of cpu/ram
* I already tried hash partitioning (writing to 10 different tables instead
of 1)


Any ideas?


