Re: Reindex "locked" standby database

2021-12-14 Thread Martín Fernández



> On 15 Dec 2021, at 00:52, Mladen Gogala  wrote:
> 
> On 12/14/21 22:37, Michael Paquier wrote:
>> You are referring to the startup process that replays WAL, right?
>> Without having an idea about the type of workload your primary and/or
>> standbys are facing, as well as an idea of the configuration you are
>> using on both (hot_standby_feedback for one), I have no direct idea,
>> but that could be a conflict caused by a concurrent vacuum.
> 
> Hi Michael,
> 
> I am preparing for a standby deployment. I don't have a standby yet and, 
> therefore, I don't have any standby problems. Would it be advisable to turn 
> vacuum off on the standby? Applying WAL will also, in theory, populate the 
> statistics, which are also held in the database blocks.

Take this with a grain of salt since I'm far from being an expert :), just 
trying to help. To my knowledge, assuming you would be running a physical 
standby, vacuum operations wouldn't run there, only on the primary. The changes 
would get propagated to your standby via physical replication (blocks that 
change due to vacuuming on the primary). 
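
If it helps, once a standby is up you can also watch whether replayed cleanup 
from the primary is cancelling standby queries. A minimal sketch, assuming a hot 
standby on 9.1 or later:

```
-- Per-database counts of queries cancelled by recovery conflicts on a standby;
-- confl_snapshot is the counter typically bumped by replayed vacuum cleanup.
SELECT datname, confl_snapshot, confl_lock, confl_bufferpin, confl_deadlock
FROM pg_stat_database_conflicts;
```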

Hope that helps.

> 
> Regards
> 
> -- 
> Mladen Gogala
> Database Consultant
> Tel: (347) 321-1217
> https://dbwhisperer.wordpress.com
> 
> 
> 





Re: Reindex "locked" standby database

2021-12-14 Thread Martín Fernández
Michael,

Thanks so much for the quick response.

> On 15 Dec 2021, at 00:37, Michael Paquier  wrote:
> 
> On Wed, Dec 15, 2021 at 12:15:27AM -0300, Martín Fernández wrote:
>> The reindex went fine in the primary database and in one of our
>> standby. The other standby that we also operate for some reason
>> ended up in a state where all transactions were locked by the WAL
>> process and the WAL process was not able to make any progress. In
>> order to solve this issue we had to move traffic from the “bad”
>> standby to the healthy one and then kill all transactions that were
>> running in the “bad” standby. After that, replication was able to
>> resume successfully.
> 
> You are referring to the startup process that replays WAL, right?
That is correct, I’m talking about the startup process that replays the WAL 
files.

> Without having an idea about the type of workload your primary and/or
> standbys are facing, as well as an idea of the configuration you are
> using on both (hot_standby_feedback for one), I have no direct idea,

The primary handles IoT data ingestion. The table that we had to REINDEX gets 
updated every time a new message arrives in the system, so updates happen very 
often on that table; hence the index/table bloat. The standby, at any point in 
time, would be receiving queries that take advantage of the indexes that were 
being reindexed. hot_standby_feedback is currently turned OFF on the standbys. 
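
For what it's worth, this is roughly what I plan to check on the standby (pg12) 
the next time the startup process appears stuck; just a sketch, not a full 
diagnosis:

```
-- What the startup (WAL replay) process is currently waiting on (PG 12).
SELECT pid, backend_type, wait_event_type, wait_event
FROM pg_stat_activity
WHERE backend_type = 'startup';

-- Ungranted locks, and the sessions holding conflicting granted locks on the
-- same relation (a rough approximation of "who is blocking replay").
SELECT w.pid AS waiting_pid, h.pid AS holding_pid,
       h.mode AS held_mode, w.relation::regclass AS relation
FROM pg_locks w
JOIN pg_locks h
  ON h.locktype = w.locktype
 AND h.database IS NOT DISTINCT FROM w.database
 AND h.relation IS NOT DISTINCT FROM w.relation
 AND h.granted
WHERE NOT w.granted
  AND w.pid <> h.pid;
```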

> but that could be a conflict caused by a concurrent vacuum.


> 
> Seeing where things got stuck could also be useful, perhaps with a
> backtrace of the area where it happens and some information around
> it.
> 
>> I’m just trying to understand what could have caused this issue. I
>> was not able to identify any queries in the standby that would be
>> locking the WAL process. Any insight would be more than welcome!
> 
> That's not going to be easy without more information, I am afraid.
> --
> Michael



Reindex "locked" standby database

2021-12-14 Thread Martín Fernández
Hello pg hackers!

Today we had to run a `REINDEX TABLE CONCURRENTLY my_table;` in our production 
database due to considerable index bloat. We used to deal with this problem in 
the past by using pg_repack, but we stopped using it because our data 
replication tool doesn't support "re-creating" tables the way pg_repack does 
(we are using Fivetran).

The reindex went fine in the primary database and in one of our standbys. The 
other standby that we also operate, for some reason, ended up in a state where 
all transactions were locked by the WAL process and the WAL process was not 
able to make any progress. In order to solve this issue we had to move traffic 
from the “bad” standby to the healthy one and then kill all transactions that 
were running in the “bad” standby. After that, replication was able to resume 
successfully. 

I’m just trying to understand what could have caused this issue. I was not able 
to identify any queries in the standby that would be locking the WAL process. 
Any insight would be more than welcome!

We are running pg12 in our cluster and the standbys are replicating using 
physical replication slots.
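
For context, these are the standby settings I'd expect to matter here; with a 
delay of -1, replay can wait indefinitely behind a conflicting query, which 
would look exactly like a "locked" startup process. A quick sketch of what to 
check on the standby:

```
-- Standby-side settings that control how long WAL replay waits behind
-- conflicting queries before cancelling them.
SHOW max_standby_streaming_delay;
SHOW max_standby_archive_delay;
SHOW hot_standby_feedback;
```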

Best,
Martín



Re: Logical Replication, CPU load and Locking contention

2021-03-10 Thread Martín Fernández


> On 10 Mar 2021, at 11:25, Ron  wrote:
> 
> On 3/10/21 2:10 AM, Radoslav Nedyalkov wrote:
>> 
>> On Wed, Mar 10, 2021 at 3:56 AM Martín Fernández <fmarti...@gmail.com> wrote:
>> Hello,
>> 
>> I’m troubleshooting a problem at my company with a pg 12 cluster that we run.
>> 
>> We are using Amazon DMS to replicate data from our database into S3 buckets. 
>>  DMS replicates data by using logical replication slots. 
>> 
>> After introducing DMS in our environment, we have seen an increase in CPU 
>> load of 20 points at business hours (from ~60% to ~80%).
>> 
>> The other thing that we have identified is that AccessShareLocks increase 
>> considerably when DMS running.
>> 
>> Based on this information, I’m trying to understand if this is something 
>> expected when running logical replication or not. We’ve been running 
>> physical replication for several years and we haven’t seen nothing like 
>> this. It could be the case that the issue is not related at all with logical 
>> replication and is purely a DMS artifact. 
>> 
>> Thanks before hand!
>> 
>> Best,
>> Martín
>> 
>> 
>> Hi,
>> 
>> I would check in pg_stat_activity what those logical replication slots do. I 
>> guess COPY.
> 
> If it's a full-load or full-load-and-cdc, then it's almost certainly a COPY.

We are doing full-load and full-load-and-cdc. At this point it's just cdc since 
the full load was done a month ago.

> 
>> Are you doing one shot copy ? every day ? Then copying all the tables will 
>> lead to load increase. 
>> How many tables at  a time DMS copies? It should be configurable.
> 
> It definitely is, with the MaxFullLoadSubTasks parameter.  The default is 8.

We are copying a lot of tables. 100+

> 
>> AccessShareLock is absolutely normal. You have a transaction doing SELECT 
>> (COPY) over a table.
> 
> If DMS with Postgresql as the source is anything like when Oracle is the 
> source (we're testing Oracle -> RDS Postgresql) then it starts a SERIALIZABLE 
> transaction.

My guess at this point is that the CPU load increases due to the increase in 
AccessShareLocks. Is that a fair assumption? Have you seen similar behavior in 
Oracle?
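
To check that assumption, I'm thinking of counting the granted AccessShareLocks 
per session to see whether they really belong to the DMS connections; a rough 
sketch:

```
-- Granted AccessShareLocks per session; DMS sessions should stand out by
-- their application_name.
SELECT a.pid, a.application_name, a.state, count(*) AS access_share_locks
FROM pg_locks l
JOIN pg_stat_activity a USING (pid)
WHERE l.mode = 'AccessShareLock'
  AND l.granted
GROUP BY a.pid, a.application_name, a.state
ORDER BY access_share_locks DESC
LIMIT 20;
```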

> 
>> Physical replication is a different thing. It happens at another level.
>> 
>> Regards
>> Rado 
> 
> -- 
> Angular momentum makes the world go 'round.



Logical Replication, CPU load and Locking contention

2021-03-09 Thread Martín Fernández
Hello,

I’m troubleshooting a problem at my company with a pg 12 cluster that we run.

We are using Amazon DMS to replicate data from our database into S3 buckets.  
DMS replicates data by using logical replication slots. 

After introducing DMS in our environment, we have seen an increase in CPU load 
of 20 points at business hours (from ~60% to ~80%).

The other thing that we have identified is that AccessShareLocks increase 
considerably when DMS is running.

Based on this information, I'm trying to understand whether this is something 
expected when running logical replication or not. We've been running physical 
replication for several years and we haven't seen anything like this. It could 
be that the issue is not related to logical replication at all and is purely a 
DMS artifact. 
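
For reference, this is roughly how I've been watching the DMS connections and 
their slots; a sketch for pg 12 (the slot names are whatever DMS created):

```
-- Logical replication slots and how far their consumers have confirmed.
SELECT slot_name, plugin, active, restart_lsn, confirmed_flush_lsn
FROM pg_replication_slots
WHERE slot_type = 'logical';

-- The walsender sessions streaming changes to DMS.
SELECT application_name, state, sent_lsn, write_lsn, flush_lsn, replay_lsn
FROM pg_stat_replication;
```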

Thanks beforehand!

Best,
Martín



Logical Replication and table bloat

2020-06-05 Thread Martín Fernández
Hello,

Yesterday we stumbled upon a performance issue that we were not expecting. We 
are replicating our database using AWS DMS, which uses logical replication to 
capture changes. We have some hot tables that get updated very regularly, and 
with DMS turned on we started noticing that in those tables bloat increased 
considerably: ~15 times more free_tuples than the average. 

When doing logical replication, are tuples that could be flagged for reuse held 
back until they have been sent to the subscriber? I'm just trying to understand 
a little better how logical replication affects vacuuming.
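
In case it's useful, this is how I've been trying to see whether a slot or a 
long-running transaction is holding the cleanup horizon back; a sketch that 
assumes 9.4 or later (for backend_xmin):

```
-- Horizons that can keep dead tuples from being removed: replication slots
-- (xmin / catalog_xmin) and long-running backends (backend_xmin).
SELECT slot_name, slot_type, active, xmin, catalog_xmin, restart_lsn
FROM pg_replication_slots;

SELECT pid, application_name, backend_xmin, xact_start, state
FROM pg_stat_activity
WHERE backend_xmin IS NOT NULL
ORDER BY xact_start;
```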

Thanks beforehand!

Best,
Martín



Upgrade standby after starting cluster using rsync

2019-03-11 Thread Martín Fernández
Hello,

I wrote a couple of questions around pg_upgrade and updating standbys using 
rsync last week. We were able to successfully upgrade half of our cluster (the 
other half was kept for failover) from pg92 with PostGIS 1.5.8 to pg10 with 
PostGIS 2.4. It was a really interesting challenge because of the PostGIS binary 
incompatibility for geometry data types.

The rsync call that we used looked exactly like this (taken from pg_upgrade man 
page basically):

`rsync --verbose --verbose --progress --archive --delete --hard-links 
--size-only  --no-inc-recursive /var/lib/postgres/9.2 /var/lib/postgres/10 
$REPLICA_IP:/var/lib/postgres`

We are now in the journey of upgrading the other half of the cluster since we 
have concluded that the upgrade was successful. 

We are planning on using the same rsync call to upgrade the rest of the 
standbys (in combination with the pg_start_backup/pg_stop_backup low-level API). 
My only concern is that I'm not 100% sure the `--size-only` flag will be enough 
to guarantee that files are the same. On the initial set of standbys that we 
upgraded this shouldn't generate an issue, since the standbys were at the same 
last checkpoint as the master and we did the rsync call before starting the 
primary (after running pg_upgrade).

Are there any potential issues that could show up if we do it with --size-only? 
Should we use the default rsync behavior, which checks size and timestamps?

Hoping someone has some better experience than me on upgrading standbys using 
rsync.

Thanks for all the help as usual!

Best,

Martín

VACUUM FREEZE and replication lag

2019-03-04 Thread Martín Fernández
Hello everyone,

We have a very big table in our pg92 database that requires a manual vacuum 
freeze in order to keep a sane number of transaction IDs available. Yesterday we 
did a vacuum freeze on this table that took roughly 9 hours. After performing 
the operation we got back roughly 0.5 billion transaction IDs. I was wondering 
how much this operation can affect replication lag. If vacuum freeze needs to 
mark a lot of tuples with the frozen XID, does this mean that a lot of 
information is going to be streamed in the WAL files? I'm just trying to 
understand the relation. As soon as we started the operation, our standbys that 
have lower I/O capacity got significantly impacted for several hours. 
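
For anyone in a similar spot, this is the kind of check we now run to see how 
close we are to needing another freeze; a minimal sketch that should work on 
9.2:

```
-- How many transactions old the oldest unfrozen XID is, per database and for
-- the largest tables; autovacuum forces a freeze near autovacuum_freeze_max_age.
SELECT datname, age(datfrozenxid) AS xid_age
FROM pg_database
ORDER BY xid_age DESC;

SELECT relname, age(relfrozenxid) AS xid_age,
       pg_size_pretty(pg_relation_size(oid)) AS size
FROM pg_class
WHERE relkind = 'r'
ORDER BY xid_age DESC
LIMIT 10;
```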

Thanks!

Martín

Re: PG Upgrade with hardlinks, when to start/stop master and replicas

2019-02-22 Thread Martín Fernández
On Fri, Feb 22, 2019 at 2:03 AM Tom Lane  wrote:

> Bruce Momjian  writes:
> > On Thu, Feb 21, 2019 at 09:31:32PM -0500, Stephen Frost wrote:
> >> * Bruce Momjian (br...@momjian.us) wrote:
> >>> There was too much concern that users would accidentally start the old
> >>> server at some later point, and its files would be hard linked to the
> >>> new live server, leading to disaster.
>
>
I think this is a great solution. Knowing that neither a human nor a
supervisor can mess up the hardlinks is something I really appreciate.


> >> Sure, I understand that concern, just wish there was a better approach
> >> we could use for "DO NOT START THIS SERVER" rather than moving of the
> >> pg_control file.
>
> > As ugly as it is, I have never heard of a better solution.
>
> system("rm -rf $OLDPGDATA") ... nah, that is not a better idea.
>
> regards, tom lane
>


Re: PG Upgrade with hardlinks, when to start/stop master and replicas

2019-02-19 Thread Martín Fernández
Stephen,

@bilby91


On Tue, Feb 19, 2019 at 1:37 PM Stephen Frost  wrote:

> Greetings,
>
> * Martín Fernández (fmarti...@gmail.com) wrote:
> > Thanks for information! I've refactor our migration scripts to follow
> the suggestions.
>
> Please don't top-post on these mailing lists.
>
> > One extra question that popped up. As long as we don't start the standby
> (after running rsync), we can always `rm -f $PGDATA_10` and promote the
> standby if necessary for failover right ? We also need to `mv`
> pg_control.old to pg_control in the old data directory.
>
> Not sure which standby we're talking about here, but in general, yes, as
> long as you haven't actually started the system after the
> pg_upgrade/rsync, you should be able to blow away the new cluster that
> pg_upgrade/rsync created and start the old cluster back up and promote
> it (if necessary) and use it.
>
> Note that you should *not* need to do anything with pg_control, I have
> no idea what you're referring to there, but the old cluster should have
> the pg_control file and all the catalog tables in place from before the
> pg_upgrade/rsync (those aren't touched during the pg_upgrade/rsync
> process) and you would just need to start up the old binaries pointing
> at the old PG data directory and everything should just work.
>
>
I did some successful tests yesterday around this scenario. The standby in
this context is the one that received the rsync from the master but was
never started. The old data directory stays intact except for the fact that
global/pg_control was renamed with a .old suffix.

I found the documentation on pg_upgrade that states this:

`If you ran pg_upgrade without --link or did not start the new server, the
old cluster was not modified except that, if linking started, a .old suffix
was appended to $PGDATA/global/pg_control. To reuse the old cluster, possibly
remove the .old suffix from $PGDATA/global/pg_control; you can then restart
the old cluster.`


> Thanks!
>
> Stephen
>


Re: PG Upgrade with hardlinks, when to start/stop master and replicas

2019-02-19 Thread Martín Fernández
Stephen,

Thanks for the information! I've refactored our migration scripts to follow the 
suggestions. 

One extra question that popped up. As long as we don't start the standby (after 
running rsync), we can always `rm -f $PGDATA_10` and promote the standby if 
necessary for failover right ? We also need to `mv` pg_control.old to 
pg_control in the old data directory.

Thanks!

Martín

On Tue, Feb 19th, 2019 at 12:39 PM, Hellmuth Vargas  wrote:

> 
> Thank you Stephen 
> 
> On Tue, Feb 19, 2019 at 10:05, Stephen Frost ( sfr...@snowman.net ) wrote:
> 
> 
>> Greetings,
>> 
>> * Hellmuth Vargas ( hiv...@gmail.com ) wrote:
>> > But could you do the following procedure?:
>> 
>> > pg_upgrade of the master
>> > rsync with a hot standby
>> 
>> The above should be alright provided both the primary and the standby
>> are down and the instructions in the pg_upgrade docs are followed.
>> 
>> > start the master
>> > start the hot standby
>> 
>> So, start both the primary and the replica?  That part should be fine by
>> itself.
>> 
>> > stop hot standby and rsync the other hot standby with the migrated hot
>> > standby?
>> 
>> At some later point, shut down the replica completely, then do an rsync
>> from that replica to the other replica and build its hard-link tree that
>> way, and update anything that's changed while the 'migrated' replica was
>> online?  I don't see any obvious issue with that as the result should
>> mean that the two replicas are identical from PG's perspective from that
>> point moving forward.
>> 
>> Ultimately, it really depends on your specific environment though, of
>> course.  It also might not be a bad idea to do a regular backup of the
>> upgraded primary and then restore that to the second replica, just to
>> make sure you have that whole process working and to test out your
>> restore process.
>> 
>> Thanks!
>> 
>> Stephen
>> 
> 
> 
> 
> 
> 
> --
> Regards,
> 
> Ing. Hellmuth I. Vargas S.
> Specialist in Telematics and Internet Business
> Oracle Database 10g Administrator Certified Associate
> EnterpriseDB Certified PostgreSQL 9.3 Associate
> 
> 
>

Re: PG Upgrade with hardlinks, when to start/stop master and replicas

2019-02-18 Thread Martín Fernández
Laurenz and Hellmuth, replying to both of you!

Thanks for the quick replies BTW!
Martín


On Mon, Feb 18, 2019 at 5:32 PM Hellmuth Vargas  wrote:

>
> Hi Martin
>
> Well, if one follows the sequence from the PostgreSQL help page
>
> https://www.postgresql.org/docs/10/pgupgrade.html
>
> Usage
> (...)
> 7. Stop both servers
> (...)
> 10. Upgrade Streaming Replication and Log-Shipping standby servers
> (...)
> 12. Start the new server
>
> *The new server can now be safely started, and then any rsync'ed standby
> servers.*
>
>
> I would think you have to wait for all the rsyncs of the replicas to finish
> before starting the master... But I also have an additional question: if I
> have 2 or more replicas, could you first sync the master with one replica
> (rsync), start the master, and then use the synced replica to sync the rest
> of the replicas while the master is already running?
>

I asked myself that question as well. I'm basically trying to minimize the
number of standbys that are going to be affected by the upgrade. The
documentation states that you can do something similar to what you describe;
the only thing you need to make sure of is not to start the standby if you are
going to use it as the rsync source for another standby. Would that make
sense?



>
> On Mon, Feb 18, 2019 at 15:21, Laurenz Albe (
> laurenz.a...@cybertec.at) wrote:
>
>> Martín Fernández wrote:
>> > After reading the pg_upgrade documentation multiple times, it seems
>> that after running pg_upgrade on the primary instance, we can't start it
>> until we run rsync from the primary to the standby. I'm understanding this
>> from the following section in the pg_upgrade manual page.
>> >
>> > You will not be running pg_upgrade on the standby servers, but rather
>> rsync on the
>> >primary. Do not start any servers yet.
>>
>> Immediately following, you can read:
>>
>>  If you did not use link mode, do not have or do not want to use rsync,
>> or want an easier
>>  solution, skip the instructions in this section and simply recreate the
>> standby servers
>>  once pg_upgrade completes and the new primary is running.
>>
>> So this is not compulsory, it's just an efficient method to quickly get
>> the standby
>> server updated.
>>
>> There is nothing wrong with rebuilding the standby later.
>>
>>
I think that by `recreate` here we are talking about pg_basebackup, right?
That won't be acceptable because a base backup of our database would take days
to complete. We need to use rsync and leverage the hard links.


> Yours,
>> Laurenz Albe
>> --
>> Cybertec | https://www.cybertec-postgresql.com
>>
>>
>>
>
> --
> Regards,
>
> Ing. Hellmuth I. Vargas S.
> Specialist in Telematics and Internet Business
> Oracle Database 10g Administrator Certified Associate
> EnterpriseDB Certified PostgreSQL 9.3 Associate
>
>


PG Upgrade with hardlinks, when to start/stop master and replicas

2019-02-18 Thread Martín Fernández
Hello everyone!

We are about to upgrade a 6 instance cluster from pg92 to pg10 using pg_upgrade 
with hardlinks and rsync.

Our preliminary tests are working really well so far, but one question has 
popped up that we feel is really critical because it has an important impact on 
our failover plan.

After reading the pg_upgrade documentation multiple times, it seems that after 
running pg_upgrade on the primary instance, we can't start it until we run 
rsync from the primary to the standby. I'm understanding this from the 
following section in the pg_upgrade manual page.

```
You will not be running pg_upgrade on the standby servers, but rather rsync on
the primary. Do not start any servers yet.
```

I'm understanding the `any` as primary and standbys.

On the other hand, we've been doing tests that start the primary instance as 
soon as pg_upgrade is done. These tests have worked perfectly fine so far. We 
make the rsync call with the primary instance running, and the standby can 
start later on, after rsync is done and we have copied the new configuration 
files.

If what we are doing is wrong and we need to run `rsync` before starting the 
primary instance, that would mean that the primary and the standby are not 
usable if pg10 doesn't start correctly on the primary, right? 

I hope my question is clear enough! 

Best,

Martín

Re: Promote replica before being able to accept connections

2019-02-11 Thread Martín Fernández
Laurenz,

That makes sense! 

Thanks for the quick reply

Best,

Martín

On Mon, Feb 11th, 2019 at 7:55 AM, Laurenz Albe  
wrote:

> 
> 
> 
> Martín Fernández wrote:
> > We have a 9.2 pg cluster and we are in the process of rebuilding a
> master database in our staging environment.
> > In order to achieve the latter goal, we are restoring our staging
> database using pg_basebackup against one
> > of our production replicas.
> >
> > pg_basebackup has completed and the staging database is configured in
> hot_standby mode, recovering from
> > the WAL archive. The database has not reached the point where it can
> > accept connections yet; it is still processing WAL files.
> >
> > What we are not sure about is, can we attempt promoting the database as
> a master instance before it can
> > accept connections ? We don´t care if the staging database is a few days
> behind the production environment,
> > that is something acceptable in our use case, we just want to promote it
> as soon as we can.
> 
> The earliest time to which you can recover is the end of the backup.
> 
> Until recovery has reached that point, the database is not consistent
> (that's the "consistent recovery state" the logs are talking about).
> 
> You cannot connect to the recovering database or promote it before you
> reach that point, so you have to wait until you can connect to the
> database before you can promote it.
> 
> Yours,
> Laurenz Albe
> --
> Cybertec | https://www.cybertec-postgresql.com
> 
> 
> 
> 
>

Promote replica before being able to accept connections

2019-02-10 Thread Martín Fernández
Hello Everyone!

We have a 9.2 pg cluster and we are in the process of rebuilding a master
database in our staging environment. In order to achieve the latter goal,
we are restoring our staging database using pg_basebackup against one of
our production replicas.

pg_basebackup has completed and the staging database is configured in
hot_standby mode, recovering from the WAL archive. The database has not
reached the point where it can accept connections yet; it is still processing
WAL files.

What we are not sure about is: can we attempt to promote the database as a
master instance before it can accept connections? We don't care if the staging
database is a few days behind the production environment; that is acceptable in
our use case, we just want to promote it as soon as we can.
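
In case it helps, once the standby does reach a consistent state and starts 
accepting connections, these are the checks we plan to run before promoting; a 
sketch using the 9.2-era function names:

```
-- Only answerable once the standby accepts connections (consistent state).
SELECT pg_is_in_recovery() AS in_recovery,
       pg_last_xlog_receive_location() AS received_lsn,
       pg_last_xlog_replay_location() AS replayed_lsn,
       pg_last_xact_replay_timestamp() AS last_replayed_commit;
```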

Thanks in advance!

Best,
Martín


Re: Transaction Id Space, Freezing and Wraparound

2018-11-20 Thread Martín Fernández
Martín

On Tue, Nov 20th, 2018 at 6:0 PM, Tom Lane  wrote:

> 
> 
> 
> Martín Fernández < fmarti...@gmail.com > writes:
> > First thing that generated a lot of noise in my head was the following,
> if pg assigns contiguous numeric values for the txid, how does pg deal
> with fragmentation issues ? Then I later found that the txid space is
> actually circular and not linearly as I originally thought it was. This
> exposed me to the fact that the txid is actually exported as a 64bit value
> where the last 32bits are an epoch. My understanding is that the epoch is
> the component that allows the circularity of the data structure. I then
> started analyzing how pg decides if a given tuple is eligible for
> freezing.  I found out that pg will compare a cutoff_tx (I assume this is
> the last committed tx) with the xmin value of the given tuple, if xmin
> precedes the cutoff_tx the tuple is eligible (I’m ignoring
> HEAP_XMAX_IS_MULTI and HEAP_MOVED cases). Now, the xmin of a tuple is an
> 32 bit integer, so, how is the epoch part of an exported txid considered
> here ? What if we had a database really old where a txid with integer
> value 10 is greater than a txid of value 1000 ? 
> 
> Actually, XID epoch is an artifact that's bolted on for possible use by
> replication or what have you. So far as the core database is concerned,
> XIDs are 32 bits in a circular space, and the way that we deal with your
> question is we don't let the case arise. Every old tuple must be marked
> "frozen" before its XID gets to be 2 billion XIDs old; after that, we
> don't particularly care just how old it is. The whole "wraparound"
> business just exists to make sure that happens in time.
> 
> If the stored XIDs were 64 bits wide, we'd not have to bother with all
> of this mess ... but adding another 64 bits to tuple headers would be
> a painful space cost, not to mention the storage compatibility issues.
> 
> regards, tom lane
> 
> 
> 
> 

Tom,

Thanks for the insight!!

I got confused with the comment under "Transaction IDs and Snapshots"( 
https://www.postgresql.org/docs/current/functions-info.html ) "The internal 
transaction ID type (xid) is 32 bits wide and wraps around every 4 billion 
transactions. However, these functions export a 64-bit format that is extended 
with an "epoch" counter so it will not wrap around during the life of an 
installation. "

Transaction Id Space, Freezing and Wraparound

2018-11-20 Thread Martín Fernández
Hello everyone, second time writing to this awesome mailing list.

I’m helping manage a PostgreSQL 9.2.24 high volume transaction database, and 
yesterday we were literally one hour away from having to deal with transaction 
ID wraparound. We were really lucky to identify the issue one hour before it 
materialized; that gave us some time to reduce the transaction load so that we 
could run VACUUM FREEZE on our hottest table. By running VACUUM FREEZE on that 
table we were able to advance our oldest txid by almost 1 billion. 

After almost experiencing transaction ID wraparound I decided to start 
investigating more deeply how PostgreSQL works with transaction IDs and 
wraparound in general. 

The first thing that generated a lot of noise in my head was the following: if 
pg assigns contiguous numeric values for the txid, how does pg deal with 
fragmentation issues? Then I later found that the txid space is actually 
circular and not linear, as I originally thought it was. This exposed me to the 
fact that the txid is actually exported as a 64-bit value where 32 bits are an 
epoch. My understanding is that the epoch is the component that allows the 
circularity of the data structure. I then started analyzing how pg decides if a 
given tuple is eligible for freezing. I found out that pg will compare a 
cutoff_tx (I assume this is the last committed tx) with the xmin value of the 
given tuple; if xmin precedes the cutoff_tx, the tuple is eligible (I’m ignoring 
the HEAP_XMAX_IS_MULTI and HEAP_MOVED cases). Now, the xmin of a tuple is a 
32-bit integer, so how is the epoch part of an exported txid considered here? 
What if we had a really old database where a txid with integer value 10 is 
greater than a txid of value 1000? 
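
While digging, a small sanity check helped me see the relationship between the 
exported 64-bit value and the 32-bit on-disk xid; just a sketch, assuming the 
documented epoch-extended format:

```
-- txid_current() is the epoch-extended 64-bit counter; the low 32 bits are
-- what actually gets stored in a tuple's xmin/xmax.
SELECT txid_current() AS txid64,
       txid_current() % 4294967296 AS low_32_bits,
       txid_current() / 4294967296 AS epoch;
```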

I’m probably missing something here, so bear with me if my previous explanation 
doesn’t make sense at all.

As usual, thanks beforehand; any insight will be appreciated. 

PS: I based my research on the 9.2.24 code base.

Best,
Martín

Re: Vacuum not deleting tuples when lockless

2018-09-15 Thread Martín Fernández
Tom & Jerry,

Thanks a lot for information!

On Monday (weekends don't have the same load patterns compared to business 
days) I will take a look at `pg_prepared_xacts`, which seems to expose the 
prepared transactions Jerry suggested (see the sketch below). Replication slots 
don't apply to 9.2.x from what I could investigate, so I will discard that 
suggestion. 
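
A sketch of what I intend to run (should work on 9.2):

```
-- Oldest open transactions; an old snapshot here can keep dead rows from
-- being removed even when nothing holds a lock on the table.
SELECT pid, usename, xact_start, state, query
FROM pg_stat_activity
WHERE xact_start IS NOT NULL
ORDER BY xact_start
LIMIT 10;

-- Forgotten prepared transactions hold back the cleanup horizon the same way.
SELECT gid, prepared, owner, database
FROM pg_prepared_xacts;
```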

The feedback setting (hot_standby_feedback) is turned off on all our replicas, 
so this shouldn't be an issue from what I understood. 

The delay setting (vacuum_defer_cleanup_age) on our master is configured to 0, 
so this shouldn't be an issue from what I understood. 

Thanks a lot!

Best,
Martín

On Fri, Sep 14th, 2018 at 11:29 PM, Jerry Sievers  
wrote:

> 
> 
> 
> Tom Lane < t...@sss.pgh.pa.us > writes:
> 
> > Martín Fernández < fmarti...@gmail.com > writes:
> >
> >> We are experiencing some `vacuum` issues with a given table
> >> (potentially more). When a manual vacuum runs on the given table it
> >> seems that the `vacuum` process is not doing the expected cleanup.
> >
> >> DETAIL:  113257 dead row versions cannot be removed yet.
> >
> > Locks don't really have anything to do with that: what does matter is
> > how old is the oldest open transaction, because that determines the
> > "event horizon" that dead row versions have to fall below before they
> > can be removed. That oldest transaction might not be holding any locks
> > at the moment, but it doesn't matter, because in principle it could ask
> > to read this table later --- and it should see the table's contents as
> > of its snapshot.
> >
> > Serializable transactions are worse than repeatable-read transactions
> > for this purpose, because the former will keep a snapshot as of their
> > start time.
> >
> > As Jerry mentioned, replication slots can also act like open
> transactions
> > for this purpose, though I don't recall how much of that behavior is
> > present in 9.2.x.
> 
> Oops, didn't notice OP was on 9.2! Presume none, since I don't think we
> got rep slots till 9.4 :-)
> 
> >
> > regards, tom lane
> >
> >
> 
> --
> Jerry Sievers
> Postgres DBA/Development Consulting
> e: postgres.consult...@comcast.net
> p: 312.241.7800
> 
> 
> 
>

Vacuum not deleting tuples when lockless

2018-09-14 Thread Martín Fernández
Hello,

We are experiencing some `vacuum` issues with a given table (potentially more). 
When a manual vacuum runs on the given table it seems that the `vacuum` process 
is not doing the expected cleanup.

```
DETAIL:  113257 dead row versions cannot be removed yet.
```

I've been investigating the reasons for vacuum not being able to do its work 
and I found that generally the problem is caused by open transactions 
referencing the dead tuples. I also found that locking can be a problem as well.

I did check that no long running transaction was happening and no locking was 
happening before running `vacuum` on the given table.

I used this query to check the locks:

```
SELECT query, state, locktype, mode
FROM pg_locks
JOIN pg_stat_activity
  USING (pid)
WHERE relation::regclass = 'my_table'::regclass
  AND granted IS TRUE;
```

I'm not sure where else to look for a potential issue that could be causing 
this problem. A few days back we had to use pg_repack on the given table to 
solve our performance issues.

Important thing to clarify is that we are using postgresql 9.2.24

Thanks in advance!

Best,
Martín

Re: Behaviour when autovacuum is canceled

2018-09-14 Thread Martín Fernández
Alvaro,

Thanks for the insight, was really helpful!

Best,

Martín

On Fri, Sep 14th, 2018 at 12:41 PM, Alvaro Herrera  
wrote:

> 
> 
> 
> On 2018-Sep-13, Martín Fernández wrote:
> 
> > By performing this changes we are going to start relying more heavily
> > on the autovacuum work and the concern of "lost work" caused by
> > autovacuum canceling itself when locking contention happen showed up.
> > I'm guessing that we might be over thinking this and the canceling is
> > not going to happen as frequently as we think it will.
> 
> Any DDL run on a table will cancel an autovacuum over that table (except
> for-wraparound autovacuums). If these are rare, you don't need to worry
> about that too much. If they are frequent enough that autovacuum will
> be cancelled regularly in one table, you'll be sad.
> 
> If you're running vacuum by hand, you'd probably see your DDL blocking
> behind VACUUM, which would be very noticeable. I think if you don't
> have trouble today without having tuned the system carefully to avoid
> such trouble, you're not likely to have trouble with autovacuum either.
> 
> --
> Álvaro Herrera https://www.2ndQuadrant.com/
> PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
> 
> 
> 
>

Re: Behaviour when autovacuum is canceled

2018-09-13 Thread Martín Fernández
Tom,

Thanks for the detailed explanation. I can start mapping your explanation with 
the source code I've been reading :)

We are in the process of tuning our autovacuum settings (on some tables) and 
moving away from crontabs that perform manual vacuums. 

By performing these changes we are going to start relying more heavily on 
autovacuum, and the concern about "lost work" caused by autovacuum canceling 
itself when lock contention happens showed up. I'm guessing that we might be 
overthinking this and the canceling is not going to happen as frequently as we 
think it will.
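
For what it's worth, the direction we're taking is per-table settings rather 
than global ones; a rough sketch of the kind of change (the table name and the 
values are made up for illustration):

```
-- Hypothetical example: vacuum a hot table more eagerly so each autovacuum
-- cycle has less to do and a cancel loses less work.
ALTER TABLE my_hot_table SET (
    autovacuum_vacuum_scale_factor = 0.02,
    autovacuum_vacuum_threshold    = 1000,
    autovacuum_vacuum_cost_delay   = 10
);
```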

Martín

On Thu, Sep 13th, 2018 at 9:21 PM, Tom Lane  wrote:

> 
> 
> 
> Martín Fernández < fmarti...@gmail.com > writes:
> > We basically started a VACUUM on a given table, waited for one index to
> process (captured cleaned rows count) and cancel the VACUUM. When we run
> another VACUUM on the same table the dead rows removed from the first
> index was a number slightly higher than the value logged on the first
> VACUUM. This behaviour made us feel that the work done to clean dead
> tuples on the first index was performed again. 
> 
> The unit of work that doesn't have to be repeated if VACUUM is canceled
> is:
> 
> 1. Scan a bunch of heap pages to identify dead tuples;
> 2. Scan *all* the table's indexes to remove the corresponding index
> entries;
> 3. Rescan those heap pages to actually remove the tuples.
> 
> It sounds like you canceled partway through phase 2.
> 
> The actual size of this unit of work is the number of dead-tuple TIDs
> that will fit in maintenance_work_mem (at six or eight bytes apiece,
> I forget whether it's aligned...). Normally, people make
> maintenance_work_mem big so that they can reduce the number of index
> scan cycles needed to complete vacuuming a table. But if you're
> concerned about reducing the amount of work lost to a cancel,
> you might try *reducing* maintenance_work_mem. This will make
> vacuum slower overall (more index scans), but you have a better
> chance that it will manage to actually remove some tuples before
> getting canceled.
> 
> Or you could look at fixing the access patterns that are causing
> so many autovacuum cancels.
> 
> regards, tom lane
> 
> 
> 
>

Re: Behaviour when autovacuum is canceled

2018-09-13 Thread Martín Fernández
David,

Does your last comment apply to cleaning up indexes as well? We performed a 
simple test in our production database to understand the behaviour and we got a 
result that surprised us based on your last comment.

We basically started a VACUUM on a given table, waited for one index to be 
processed (capturing the cleaned-row count) and canceled the VACUUM. When we ran 
another VACUUM on the same table, the number of dead rows removed from the 
first index was slightly higher than the value logged on the first VACUUM. This 
behaviour made us feel that the work done to clean dead tuples on the first 
index was performed again. 

Thanks!

Martín

On Thu, Sep 13th, 2018 at 8:0 PM, "Martín Fernández"  
wrote:

> 
> David,
> 
> 
> Thanks a lot for the quick reply. 
> 
> 
> I clearly misunderstood the references in the code. 
> 
> 
> Best,
> 
> Martín
> 
> 
> On Thu, Sep 13th, 2018 at 7:55 PM, "David G. Johnston" < 
> david.g.johns...@gmail.com
> > wrote:
> 
> 
>> 
>> On Thu, Sep 13, 2018 at 3:45 PM, Martín Fernández < fmarti...@gmail.com > 
>> wrote:
>> 
>> 
>>> From what I could understand (that can be totally wrong), the vacuum
>>> process is split in multiple small transactions. If the autovacuum is
>>> canceled, could it be possible that only the latest transaction work be
>>> lost 
>>> 
>> 
>> 
>> 
>> From the docs:
>> 
>> 
>> "VACUUM cannot be executed inside a transaction block."
>> 
>> 
>> As it is non-transactional any work it performs is live immediately and
>> irrevocably as it occurs.
>> 
>> 
>> David J.
>> 
>> 
>> 
> 
> 
> 
>

Re: Behaviour when autovacuum is canceled

2018-09-13 Thread Martín Fernández
David,

Thanks a lot for the quick reply. 

I clearly misunderstood the references in the code. 

Best,

Martín

On Thu, Sep 13th, 2018 at 7:55 PM, "David G. Johnston" 
 wrote:

> 
> 
> On Thu, Sep 13, 2018 at 3:45 PM, Martín Fernández < fmarti...@gmail.com > 
> wrote:
> 
> 
>> From what I could understand (that can be totally wrong), the vacuum
>> process is split in multiple small transactions. If the autovacuum is
>> canceled, could it be possible that only the latest transaction work be
>> lost 
>> 
> 
> 
> 
> From the docs:
> 
> 
> "VACUUM cannot be executed inside a transaction block."
> 
> 
> As it is non-transactional any work it performs is live immediately and
> irrevocably as it occurs.
> 
> 
> David J.
> 
> 
>

Behaviour when autovacuum is canceled

2018-09-13 Thread Martín Fernández
Hello,

I'm working on a high volume transaction database and we are starting to tune 
our autovacuum settings to improve our vacuuming performance.

One thing that we know about autovacuum is that it can be automatically 
canceled if a dependent transaction is blocked by the autovacuum transaction. 
In a situation like that, autovacuum would be canceled. 

My question is: what will happen to the work that autovacuum has achieved 
before being canceled? Is that work lost? I tried to look for the answer in the 
code and I'm not completely sure what will happen. From what I could understand 
(which could be totally wrong), the vacuum process is split into multiple small 
transactions. If autovacuum is canceled, could it be possible that only the 
latest transaction's work is lost?

Sorry if my understanding is not correct; figuring out how to tune autovacuum 
seems really complicated to me.

Thanks beforehand! 

Martín