Re: [HACKERS] File content logging during execution of COPY queries

2016-10-20 Thread Grigory Smolkin



On 10/20/2016 12:36 PM, Aleksander Alekseev wrote:

According to my colleagues, it would be very nice to have this feature.
For instance, if you are trying to optimize PostgreSQL for an application
that uses COPY and you don't have access to it, or something like this.
It could also be useful in some other cases.

This use-case doesn't really make much sense to me.  Can you explain it
in more detail?  Is the goal here to replicate all of the statements
that are changing data in the database?

The idea is to record the application workload in a real environment and
write a benchmark based on this record. Then, using this benchmark, we
could try different OS/DBMS configurations (or maybe hardware), find an
optimum, and then change the configuration in the production environment.

It's not always possible to change the application or even the database
(e.g. to use triggers) for this purpose, for instance if the DBMS is
provided as a service.

Currently PostgreSQL allows recording all workload _except_ COPY
queries. Considering how easily this could be fixed, I think that's wrong.
Basically, the only real question here is what it should look like in
postgresql.conf.

OK, how about introducing a new boolean parameter named log_copy?
A corresponding patch is attached.
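
Patch aside, a boolean GUC of this kind is normally just a global flag
plus an entry in the ConfigureNamesBool[] table in
src/backend/utils/misc/guc.c. A rough sketch (group, wording and
placement here are illustrative, not necessarily what the attached
patch does):

    /* global flag consulted by the COPY code */
    bool        log_copy = false;

    /* entry in ConfigureNamesBool[] in guc.c */
    {
        {"log_copy", PGC_SUSET, LOGGING_WHAT,
            gettext_noop("Logs the content of files used in COPY FROM/TO."),
            NULL
        },
        &log_copy,
        false,
        NULL, NULL, NULL
    },

Users would then simply set log_copy = on in postgresql.conf, like any
other logging switch.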


This is a useful feature that I have been waiting on for some time.
If the application whose workload you want to collect uses COPY
statements, recording network traffic was previously your only option.


--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



Re: [HACKERS] Fun fact about autovacuum and orphan temp tables

2016-09-05 Thread Grigory Smolkin


On 09/05/2016 04:34 PM, Alvaro Herrera wrote:

Grigory Smolkin wrote:


The funny part is that it never drops them. So when the backend is finally
terminated, it tries to drop them all and fails with an error:

FATAL:  out of shared memory
HINT:  You might need to increase max_locks_per_transaction

If I understand that right, we are trying to drop all these temp tables in
one transaction and are running out of locks to do so.

Hmm, yeah, I suppose it does that, and it does seem pretty inconvenient.
It is certainly pointless to hold onto these locks for temp tables.  I
wonder how ugly it would be to fix this problem ...



Thank you for your interest in this problem.
I don't think that is the source of the problem; an ugly fix there would
only force the backend to terminate properly.

It would not help at all in case of a server crash or power outage.
We need a way to tell autovacuum that we don't need orphan temp tables,
so they can be removed by the existing routine.


The least invasive solution would be to have a GUC, something like
'keep_orphan_temp_tables' with a boolean value, which would determine the
autovacuum worker's policy toward encountered orphan temp tables.
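
Just to illustrate (a sketch only, not a patch; the variable name and
default are assumptions), the wraparound-only branch in autovacuum.c that
I quoted in my original report could then become something like:

    /*
     * Sketch: drop orphan temp tables unconditionally unless the
     * (proposed) keep_orphan_temp_tables GUC asks to preserve them.
     * Today the drop happens only when wraparound forces it.
     */
    if (wraparound || !keep_orphan_temp_tables)
    {
        ObjectAddress object;

        object.classId = RelationRelationId;
        object.objectId = relid;
        object.objectSubId = 0;
        performDeletion(&object, DROP_CASCADE,
                        PERFORM_DELETION_INTERNAL);
    }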


--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company



[HACKERS] Fun fact about autovacuum and orphan temp tables

2016-09-05 Thread Grigory Smolkin

Hello, hackers!

We were testing how well some application works with PostgreSQL and
stumbled upon an autovacuum behavior which I fail to understand.
The application in question has a habit of heavily using temporary tables
in funny ways.

For example, it creates A LOT of them, which is OK.
The funny part is that it never drops them. So when the backend is finally
terminated, it tries to drop them all and fails with an error:


FATAL:  out of shared memory
HINT:  You might need to increase max_locks_per_transaction

If I understand that right, we are trying to drop all these temp tables
in one transaction and are running out of locks to do so.
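(For reference, the shared lock table has room for roughly
max_locks_per_transaction * (max_connections + max_prepared_transactions)
object locks; with the defaults of 64, 100 and 0 that is 64 * 100 = 6400
slots for the whole cluster, so a single transaction dropping tens of
thousands of temp tables exhausts it easily.)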
After that, postgresql.log is flooded at a rate of 1k/s with messages like
this:


LOG:  autovacuum: found orphan temp table "pg_temp_15"."tt38147" in database "DB_TEST"


It produces a noticeable load on the system, and it's getting worse with
every terminated backend or restart.
I did some RTFS, and it appears that autovacuum has no intention of
cleaning those orphan tables unless it's wraparound time:

src/backend/postmaster/autovacuum.c
    /* We just ignore it if the owning backend is still active */
    if (backendID == MyBackendId || BackendIdGetProc(backendID) == NULL)
    {
        /*
         * We found an orphan temp table (which was probably left
         * behind by a crashed backend).  If it's so old as to need
         * vacuum for wraparound, forcibly drop it.  Otherwise just
         * log a complaint.
         */
        if (wraparound)
        {
            ObjectAddress object;

            ereport(LOG,
                    (errmsg("autovacuum: dropping orphan temp table \"%s\".\"%s\" in database \"%s\"",
                            get_namespace_name(classForm->relnamespace),
                            NameStr(classForm->relname),
                            get_database_name(MyDatabaseId))));
            object.classId = RelationRelationId;
            object.objectId = relid;
            object.objectSubId = 0;
            performDeletion(&object, DROP_CASCADE,
                            PERFORM_DELETION_INTERNAL);
        }
        else
        {
            ereport(LOG,
                    (errmsg("autovacuum: found orphan temp table \"%s\".\"%s\" in database \"%s\"",
                            get_namespace_name(classForm->relnamespace),
                            NameStr(classForm->relname),
                            get_database_name(MyDatabaseId))));
        }
    }


What is more troubling is that pg_statistic is starting to bloat badly.

LOG:  automatic vacuum of table "DB_TEST.pg_catalog.pg_statistic": index scans: 0
	pages: 0 removed, 68225 remain, 0 skipped due to pins
	tuples: 0 removed, 2458382 remain, 2408081 are dead but not yet removable
	buffer usage: 146450 hits, 31 misses, 0 dirtied
	avg read rate: 0.010 MB/s, avg write rate: 0.000 MB/s
	system usage: CPU 3.27s/6.92u sec elapsed 23.87 sec

What is the purpose of keeping orphan tables around and not dropping 
them on the spot?



--
Grigory Smolkin
Postgres Professional: http://www.postgrespro.com
The Russian Postgres Company