PostgreSQL Database Upgrade

2020-12-07 Thread github kran
Hello PostgreSQL Team,

We recently upgraded our PostgreSQL version from 9 to 10.

   - Is the command below the right way to update the database statistics
   after the upgrade?
   - How do I verify in pg_stats whether ANALYZE ran successfully? Does it
   show when the last ANALYZE completed?

Command used:
*ANALYZE VERBOSE;*


Appreciate your help, thank you !!
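For what it's worth, the completion timestamps live in pg_stat_user_tables rather than pg_stats (which holds the column statistics themselves, not timing); a query along these lines should show when each table was last analyzed:

```sql
SELECT schemaname,
       relname,
       last_analyze,      -- last manual ANALYZE
       last_autoanalyze   -- last analyze run by the autovacuum daemon
FROM   pg_stat_user_tables
ORDER  BY last_analyze DESC NULLS LAST;
```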


Re: AutoVacuum and growing transaction XID's

2020-05-12 Thread github kran
Thanks for your suggestions, Michael and David.

On Fri, May 8, 2020 at 4:11 PM Michael Lewis  wrote:

> autovacuum_naptime being only 5 seconds seems too frequent. A lock_timeout
> might be 1-5 seconds depending on your system. Usually, DDL can fail and
> wait a little time rather than lock the table for minutes and have all
> reads back up behind the DDL.
>
> Given you have autovacuum_vacuum_cost_limit set to unlimited (seems very
> odd), I'm not sure a manual vacuum freeze command on the tables with high
> age would perform differently. Still, issuing a vacuum freeze and then
> killing the autovacuum process might be worth trying.
>


Re: AutoVacuum and growing transaction XID's

2020-05-07 Thread github kran
On Thu, May 7, 2020 at 11:04 PM David Rowley  wrote:

> On Fri, 8 May 2020 at 13:51, github kran  wrote:
> >   I can't DROP or ALTER any of the other tables (e.g. remove
> > inheritance for old tables that no longer receive writes). None of the
> > ALTER TABLE or DROP TABLE DDL statements completed even after I waited
> > several minutes, so I terminated those queries.
>
> The auto-vacuum freeze holds a ShareUpdateExclusiveLock on the table
> being vacuumed. If you try any DDL that requires an
> AccessExclusiveLock, it'll have to wait until the vacuum has
> completed. If you leave the DDL running then all accesses to the table
> will be queued behind the ungranted AccessExclusiveLock.  It's likely
> a good idea to always run DDL with a fairly short lock_timeout, just
> in case this happens.
>
*  What value should I assign to lock_timeout so that I can safely test my
DDL commands without impacting other sessions?*
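A minimal sketch of the pattern David describes, with an illustrative timeout value and made-up table names:

```sql
-- fail fast instead of queueing every reader behind the blocked DDL;
-- 1-5 s is the range suggested earlier in the thread
SET lock_timeout = '3s';

ALTER TABLE old_child_table NO INHERIT parent_table;  -- names are illustrative

RESET lock_timeout;
```

If the statement times out, retry later rather than leaving an ungranted AccessExclusiveLock in the queue.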

>
> >3)  Can I increase the  autovacuum_freeze_max_age on the tables on
> production system ?
>


>
> Yes, but you cannot increase the per-table setting above the global
> setting. Changing the global setting requires a restart.
>
>How can I change the global value of the autovacuum_freeze_max_age
setting?
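On a self-managed server, the global value can be persisted with ALTER SYSTEM (as David notes, the new value only takes effect after a restart); on RDS/Aurora the same change goes through the DB parameter group instead. The value below is purely illustrative:

```sql
-- written to postgresql.auto.conf; requires a server restart to apply
ALTER SYSTEM SET autovacuum_freeze_max_age = 400000000;
```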


> David
>


Re: AutoVacuum and growing transaction XID's

2020-05-07 Thread github kran
Thanks David for your replies.

On Thu, May 7, 2020 at 11:01 PM David Rowley  wrote:

> On Fri, 8 May 2020 at 09:18, github kran  wrote:
> > 1)  We haven't changed anything related to autovacuum except a work_mem
> parameter which was increased to 4 GB which I believe is not related to
> autovacuum
>
> You might want to look into increasing vacuum_cost_limit to something
> well above 200 or dropping autovacuum_vacuum_cost_delay down from 20
> to something much lower. However, you say you've not changed the
> autovacuum settings, but you've also said:
>
> >1)  I see there are 8 Vacuum workers ( Not sure what changed) running
> in the background and the concern I have is all of these vacuum processes
> are running with wrap around and while they are running
>

   - Yes, I said it was originally 3, but I noticed the work_mem parameter
   was changed to 4 GB a few weeks back, and from that day onward there has
   been an increasing trend in MaxUsedTransactionIds, from 200 million to
   347 million (it has been growing day by day for the last 2-3 weeks).
   - Do you think there is a formula by which the number of workers could
   have increased based on this increase in work_mem, controlled by the
   database?


> The default is 3, so if you have 8 then the settings are non-standard.
>
> It might be good to supply the output of:
>
> SELECT name,setting from pg_Settings where name like '%vacuum%';
>
   Output of the vacuum-related settings:

name setting min_val max_val boot_val reset_val
autovacuum on null null on on
autovacuum_analyze_scale_factor 0.02 0 100 0.1 0.02
autovacuum_analyze_threshold 50 0 2147483647 50 50
autovacuum_freeze_max_age 2 10 20 2 2
autovacuum_max_workers 8 1 262143 3 8
autovacuum_multixact_freeze_max_age 4 1 20 4 4
autovacuum_naptime 5 1 2147483 60 5
autovacuum_vacuum_cost_delay 5 -1 100 20 5
autovacuum_vacuum_cost_limit -1 -1 1 -1 -1
autovacuum_vacuum_scale_factor 0.05 0 100 0.2 0.05
autovacuum_vacuum_threshold 50 0 2147483647 50 50
autovacuum_work_mem -1 -1 2147483647 -1 -1


>
> You should know that the default speed that autovacuum runs at is
> quite slow in 9.6. If you end up with all your autovacuum workers tied
> up with anti-wraparound vacuums then other tables are likely to get
> neglected and that could lead to stale stats or bloated tables. Best
> to aim to get auto-vacuum running faster or aim to perform some manual
> vacuums of tables that are over their max freeze age during an
> off-peak period to make use of the lower load during those times.
> Start with tables in pg_class with the largest age(relfrozenxid).
> You'll still likely want to look at the speed autovacuum runs at
> either way.
>
> Please be aware that the first time a new cluster crosses the
> autovacuum_freeze_max_age threshold can be a bit of a pain point as it
> can mean that many tables require auto-vacuum activity all at once.
> The impact of this is compounded if you have many tables that never
> receive an UPDATE/DELETE as auto-vacuum, in 9.6, does not visit those
> tables for any other reason. After the first time, the relfrozenxids
> of tables tend to be more staggered so their vacuum freeze
> requirements are also more staggered and that tends to cause fewer
> problems.
>

  The current situation is that autovacuum kicked in on 8 tables, each
tied to one worker, and it's running very slowly in 9.6, as you mentioned.
   I observed that the VACUUM on those 8 tables has been running for the
last 15 hours, while other processes have been running for 1+ hour, and
others for a few minutes, on different tables.

   Finally, I'll wait for your reply on what can be done about this VACUUM
and the growing XID values.

   -  Do you think I should consider changing work_mem back to the 4 MB it
   was originally?
   -  Can I apply your recommendations directly on a production instance,
   or would you prefer that I try them in another environment before
   applying them on prod?
   -  Also, as I said, I want to clean up a few unused tables or run a
   manual VACUUM, but the current system doesn't allow me to, given these
   factors.
   -  I will try to run VACUUM manually during off-peak hours. Can I stop
   the manual VACUUM process if it takes more than 10 minutes, or how long
   can I safely let it run?
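The tables David suggests starting with (largest age(relfrozenxid)) can be listed with something like:

```sql
SELECT c.oid::regclass AS table_name,
       age(c.relfrozenxid) AS xid_age
FROM   pg_class c
WHERE  c.relkind = 'r'          -- ordinary tables only
ORDER  BY age(c.relfrozenxid) DESC
LIMIT  20;
```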

David
>


Re: AutoVacuum and growing transaction XID's

2020-05-07 Thread github kran
On Thu, May 7, 2020 at 4:18 PM github kran  wrote:

>
>
> On Thu, May 7, 2020 at 1:33 PM Michael Lewis  wrote:
>
>> It is trying to do a vacuum freeze. Do you have autovacuum turned off?
>> Any settings changed from default related to autovacuum?
>>
>> https://www.postgresql.org/docs/9.6/routine-vacuuming.html
>> Read 24.1.5. Preventing Transaction ID Wraparound Failures
>>
>> These may also be of help-
>>
>> https://info.crunchydata.com/blog/managing-transaction-id-wraparound-in-postgresql
>> https://www.2ndquadrant.com/en/blog/managing-freezing/
>>
>> Note that you need to ensure the server gets caught up, or you risk being
>> locked out to prevent data corruption.
>>
>
>   Thanks Mike.
> 1)  We haven't changed anything related to autovacuum except a work_mem
> parameter which was increased to 4 GB which I believe is not related to
> autovacuum
> 2)  The vacuum was not turned off, and the few vacuum parameters we had
> changed are *autovacuum_analyze_scale_factor = 0.02* and
> *autovacuum_vacuum_scale_factor = 0.05*
> *3) *The database we are running is currently 2 years old, we have close
> to 40 partitions, and the *datfrozenxid age is 343 million, whereas the
> default autovacuum_freeze_max_age is 200 million*. I would try a manual
> vacuum on the tables where *age(relfrozenxid) > 200 million*. Do you
> think that's the right thing to do?
>
> I will also go through these documents.
>

*   A few more things, 5/7 - 8:40 PM CDT*
   1)  I see there are *8 vacuum workers* (not sure what changed) running
in the background, and my concern is that all of these vacuum processes
are running with "(to prevent wraparound)", and while they are running
I can't DROP or ALTER any other tables (e.g. remove inheritance for old
tables that no longer receive writes). *None of the ALTER TABLE or DROP
TABLE DDL statements completed even after I waited several minutes, so I
terminated those queries.*
   2)  *The wraparound VACUUM process has been running for the last day
and several hours on other tables.*
   3)  *Can I increase autovacuum_freeze_max_age on the tables on the
production system?*
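For reference, the per-table version of this setting is a storage parameter (and, as David notes elsewhere in the thread, it cannot effectively exceed the global setting); the table name and value here are illustrative:

```sql
ALTER TABLE events_part_2020_05
  SET (autovacuum_freeze_max_age = 150000000);
```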

>
> Thanks
>



>
>


Re: AutoVacuum and growing transaction XID's

2020-05-07 Thread github kran
On Thu, May 7, 2020 at 1:33 PM Michael Lewis  wrote:

> It is trying to do a vacuum freeze. Do you have autovacuum turned off? Any
> settings changed from default related to autovacuum?
>
> https://www.postgresql.org/docs/9.6/routine-vacuuming.html
> Read 24.1.5. Preventing Transaction ID Wraparound Failures
>
> These may also be of help-
>
> https://info.crunchydata.com/blog/managing-transaction-id-wraparound-in-postgresql
> https://www.2ndquadrant.com/en/blog/managing-freezing/
>
> Note that you need to ensure the server gets caught up, or you risk being
> locked out to prevent data corruption.
>

  Thanks Mike.
1)  We haven't changed anything related to autovacuum except a work_mem
parameter which was increased to 4 GB which I believe is not related to
autovacuum
2)  The vacuum was not turned off, and the few vacuum parameters we had
changed are *autovacuum_analyze_scale_factor = 0.02* and
*autovacuum_vacuum_scale_factor = 0.05*
*3) *The database we are running is currently 2 years old, we have close
to 40 partitions, and the datfrozenxid age is 343 million, whereas the
default autovacuum_freeze_max_age is 200 million. I would try a manual
vacuum on the tables where age(relfrozenxid) > 200 million. Do you think
that's the right thing to do?

I will also go through these documents.

Thanks


AutoVacuum and growing transaction XID's

2020-05-07 Thread github kran
Hello Team,

We are using PostgreSQL version 9.6.12, and over the last 4 weeks our
transaction IDs (XIDs) have increased from 195 million to 341 million. I
see the below in pg_stat_activity on the PostgreSQL DB.

1) Viewing pg_stat_activity, I noticed that the vacuum query runs for
anywhere from a few hours to 3-5 days whenever I check. Is this a common
process PostgreSQL runs? I have only noticed it in pg_stat_activity over
the last few weeks. Also, the query output shows each table name with
"(to prevent wraparound)". What does this mean?

2) Does it mean I need to run a manual vacuum for these tables, given that
the transaction IDs have increased from 195 million to 341 million?

What other things should I check in the database around this?
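One way to watch the wraparound pressure described above is the database-level freeze age:

```sql
-- how far each database's oldest unfrozen XID has aged; autovacuum forces
-- anti-wraparound vacuums as this approaches autovacuum_freeze_max_age
-- (default 200 million)
SELECT datname,
       age(datfrozenxid) AS xid_age
FROM   pg_database
ORDER  BY age(datfrozenxid) DESC;
```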

Thanks !!


Re: Need support on tuning at the time of index creation

2020-01-27 Thread github kran
On Mon, Jan 27, 2020 at 5:27 AM Ron  wrote:

> On 1/27/20 5:10 AM, Sandip Pradhan wrote:
> > Dear Sir/Madam,
> >
> > One of our Ericsson products uses PostgreSQL 9 as its backend DB. We
> > are facing the following performance issue, where we need some support
> > from your side.
> > We have 10 tables and we are inserting around 150 million to 250
> > million records into each of those tables. After that we need to
> > create 29 indexes, including the primary key index and other types of
> > indexes. Currently it is taking almost 3.5 to 4 hrs.
> >
> > Please let us know how we can tune the performance so that we can
> > complete this task in 30-40 mins.
> >
> > Note: We cannot change the db version.
>
> There are seven versions of "postgresql 9", and multiple patch releases of
> each.  Please specify which version you're running.
>
> --
> Angular momentum makes the world go 'round.
>
*Sandip - a few questions; I'm not sure how your scripts are set up. To
speed things up, are you*

   1. *providing work_mem and maintenance_work_mem parameters while
   creating the indexes? (If not, I would highly recommend setting
   maintenance_work_mem to 8-10 GB if your cluster has enough memory.)*
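If it helps, bumping maintenance_work_mem for the session that builds the indexes looks roughly like this (index and table names are made up for illustration):

```sql
-- session-level bump; size to the memory actually available on the box
SET maintenance_work_mem = '8GB';

-- build the index with the larger sort memory
CREATE INDEX idx_orders_created_at ON orders (created_at);

RESET maintenance_work_mem;
```

Creating the indexes after the bulk load, a few at a time in parallel sessions, generally beats maintaining them during the inserts.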


Re: Upgrade PostgreSQL 9.6 to 10.6

2020-01-09 Thread github kran
On Wed, Jan 8, 2020 at 11:03 PM Michael Lewis  wrote:

> On Wed, Jan 8, 2020 at 8:52 PM github kran  wrote:
>
>> You are right about RDS, but I believe the problem is on Aurora
>> PostgreSQL, where pglogical throws an error during installation. Are
>> you aware whether this works on Aurora PostgreSQL?
>>
>
> It seems like this question should be sent to AWS support for Aurora if
> you are wanting to upgrade an Aurora 9.6 instance to 10x also in Aurora.
>

Yes, you are right on that, but rather than AWS support I feel this
community has more experience and more knowledgeable folks. I did reach
out to AWS support and got a reply of 'NO' (this is not supported on
Aurora PostgreSQL).


Re: Upgrade PostgreSQL 9.6 to 10.6

2020-01-08 Thread github kran
On Wed, Jan 8, 2020 at 5:57 PM Christophe Pettus  wrote:

>
>
> > On Jan 8, 2020, at 15:56, github kran  wrote:
> >
> > Great, I see it's supported based on the link, but the problem is we
> > are locked into Aurora RDS and we can't use logical replication on
> > that engine.
>
> You can use pglogical on RDS PostgreSQL 9.6.
>
> --
> -- Christophe Pettus
>x...@thebuild.com
>
You are right about RDS, but I believe the problem is on Aurora PostgreSQL,
where pglogical throws an error during installation. Are you aware whether
this works on Aurora PostgreSQL?


Re: Upgrade PostgreSQL 9.6 to 10.6

2020-01-08 Thread github kran
Great, I see it's supported based on the link, but the problem is we are
locked into Aurora RDS and can't use logical replication on that engine.
Any thoughts on an alternate mechanism to migrate the delta data from
database 1 (running 9.6) to database 2 (running 10.6) after we do the
pg_dump and pg_restore?
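For the delta copy itself, one hedged option (host, database, and table names below are placeholders) is a per-table dump and restore of only the changed tables:

```shell
# custom-format dump of one changed table from the 9.6 cluster
pg_dump -h old-cluster.example.com -d mydb -Fc -t public.events_delta \
        -f events_delta.dump

# restore it into the 10.6 cluster
pg_restore -h new-cluster.example.com -d mydb --no-owner events_delta.dump
```

This only works cleanly if the delta rows live in their own tables or partitions; otherwise the rows need to be selected out explicitly (e.g. with COPY and a date predicate).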

On Wed, Jan 8, 2020 at 10:25 AM Michael Lewis  wrote:

> On Wed, Jan 8, 2020 at 7:20 AM github kran  wrote:
>
>> Sorry for the confusion. I tried to install this extension on 9.6 and
>> it doesn't seem to work. Does PostgreSQL 9.6 support logical replication?
>>
>
>
> No. See the top of this page with supported versions listed. Prior to v10,
> the pglogical extension was often used.
> https://www.postgresql.org/docs/current/logical-replication.html
>


Re: Upgrade PostgreSQL 9.6 to 10.6

2020-01-08 Thread github kran
Sorry for the confusion. I tried to install this extension on 9.6 and it
doesn't seem to work. Does PostgreSQL 9.6 support logical replication?

On Tue, Jan 7, 2020 at 4:56 PM github kran  wrote:

>
> On Tue, Dec 10, 2019 at 2:53 AM Andreas Kretschmer <
> andr...@a-kretschmer.de> wrote:
>
>>
>>
>> Am 09.12.19 um 23:37 schrieb github kran:
>> > Great, thanks Andreas, So this seems to be a good feature using the
>> > core concept of replication. Can I use this extension and do the major
>> > upgrade without paying ?.
>>
>> yes, this extension is free.
>>
>>
>> Regards, Andreas
>>
>
>   Hello Andreas,
>As a follow-up on this thread, I want to ask whether this extension
> works on PostgreSQL version 10.6, and also whether it can be installed
> on the Aurora version of PostgreSQL. If it doesn't, do you have any
>other alternative to make this work on the Aurora engine?
>
>Thanks !!
>
>
>
>>
>> --
>> 2ndQuadrant - The PostgreSQL Support Company.
>> www.2ndQuadrant.com
>>
>>


Re: Upgrade PostgreSQL 9.6 to 10.6

2020-01-07 Thread github kran
On Tue, Dec 10, 2019 at 2:53 AM Andreas Kretschmer 
wrote:

>
>
> Am 09.12.19 um 23:37 schrieb github kran:
> > Great, thanks Andreas, So this seems to be a good feature using the
> > core concept of replication. Can I use this extension and do the major
> > upgrade without paying ?.
>
> yes, this extension is free.
>
>
> Regards, Andreas
>

  Hello Andreas,
   As a follow-up on this thread, I want to ask whether this extension
works on PostgreSQL version 10.6, and also whether it can be installed on
the Aurora version of PostgreSQL. If it doesn't, do you have any
   other alternative to make this work on the Aurora engine?

   Thanks !!



>
> --
> 2ndQuadrant - The PostgreSQL Support Company.
> www.2ndQuadrant.com
>
>


Re: Upgrade PostgreSQL 9.6 to 10.6

2019-12-09 Thread github kran
Great, thanks Andreas. So this seems to be a good feature, using the core
concept of replication. Can I use this extension and do the major upgrade
without paying?

On Mon, Dec 9, 2019 at 10:02 AM Andreas Kretschmer 
wrote:

>
>
> Am 09.12.19 um 16:43 schrieb github kran:
> > Hello PostgreSQL Team,
> > I would like to know what would be the best way to do Database
> > migration from PostgreSQL 9.6 engine to 10.6 by creating a new cluster
> > in 10.6 and then copy data.
> >
> > Size of the cluster is 3.8 TB.
> >
> > 1) It would be a new cluster we want to create on 10.6 and then copy
> > data from 9.6 and shut down 9.6
> > 2) Cluster today contains 1 master instance and another instance for
> > reader
> > 3) Can this be done without a downtime ?  what are the options I have
> > to do this without making this complicated . We have about 30 -40
> > tables to do the migration.
> >
> >
>
> yes, you can use pglogical from us (2ndQuadrant). Please visit our
> homepage; there you can find it for 9.6. There are also some
> blog postings about all that in our blog.
>
>
> Regards, Andreas
>
> --
> 2ndQuadrant - The PostgreSQL Support Company.
> www.2ndQuadrant.com
>
>
>
>


Upgrade PostgreSQL 9.6 to 10.6

2019-12-09 Thread github kran
Hello PostgreSQL Team,
I would like to know the best way to do a database migration from the
PostgreSQL 9.6 engine to 10.6 by creating a new cluster on 10.6 and then
copying the data.

Size of the cluster is 3.8 TB.

1) It would be a new cluster we create on 10.6; we then copy the data from
9.6 and shut down 9.6.
2) The cluster today contains 1 master instance and another instance for
the reader.
3) Can this be done without downtime? What options do I have to do this
without making it complicated? We have about 30-40 tables to migrate.

Appreciate your replies.

Thanks,
Kranthi


Re: PostGreSQL Replication and question on maintenance

2019-11-26 Thread github kran
Thanks, Jeff, for your reply. We are not using replication here but are
doing the data copy using a few custom SQL scripts. Our PostgreSQL engine
is managed by AWS, and the instance was restarted automatically by AWS's
design of the PostgreSQL engine: the replica is restarted automatically
when its data sync falls behind the master.

My intention in bringing this to the forum is to see whether this is
expected given the use case I described. I love this community and the
advice it has for various problems. I still need to find out how I can do
this using replication; please point me to a few resources if there is a
better solution for this problem.

Thanks
githubkran



On Sat, Nov 16, 2019 at 11:13 AM Jeff Janes  wrote:

> On Thu, Nov 14, 2019 at 12:23 PM github kran  wrote:
>
>>
>>>
>>> *Problem what we have right now. *
>>>
>>> When the migration activity runs(weekly) from past 2 times , we saw the
>>> cluster read replica instance has restarted as it fallen behind the
>>> master(writer instance).
>>>
>>
> I can't figure out what your setup is here.  You must be using logical
> replication (not physical) or you wouldn't be able to write to the replica
> at all.  But if you are using logical replication, why do you also need
> these weekly jobs?  Why isn't logical replication taking care of it?
>
>
>
>
>> Everything
>>>
>>> after that worked seamlessly but we want to avoid the replica getting
>>> restarted. To avoid from restart we started doing smaller binary files and
>>> copy those files to the cluster-2
>>>
>>
> Who restarted it?  I am not aware of any case where the replica responds
> to falling behind by restarting itself.  With physical replication, it can
> start cancelling queries, but you don't seem to be using physical
> replication.
>
> Cheers,
>
> Jeff
>
>>


Re: PostGreSQL Replication and question on maintenance

2019-11-16 Thread github kran
Any reply on this, please?

On Fri, Nov 15, 2019 at 9:10 AM github kran  wrote:

>
>
> On Thu, Nov 14, 2019 at 11:42 PM Pavel Stehule 
> wrote:
>
>>   these numbers look crazy high - how much memory does your server have
>> - more than 1TB?
>>
>
> The cluster has 244 GB of RAM and its storage capacity is 64 TB.
>
>>
>>
>> pá 15. 11. 2019 v 6:26 odesílatel github kran 
>> napsal:
>>
>>>
>>> Hello postGreSQL Community ,
>>>>
>>>>
>>>>
>>>> Hope everyone is doing great !!.
>>>>
>>>>
>>>> *Background*
>>>>
>>>> We use PostgreSQL Version 10.6 version and heavily use PostgreSQL for
>>>> our day to day activities to write and read data. We have 2 clusters
>>>> running PostgreSQL engine , one cluster
>>>>
>>>> keeps data up to 60 days and another cluster retains data beyond 1
>>>> year. The data is partitioned roughly weekly (~every 5 days per partition)
>>>> and we have around 5 partitions per month per each table and we have 2
>>>> tables primarily so that will be 10 tables a week. So in the cluster-1 we
>>>> have around  20 partitions and in cluster-2 we have around 160 partitions (
>>>> data from 2018). We also want to keep the data for up to 2 years in the
>>>> cluster-2 to serve the data needs of the customer and so far we reached
>>>> upto 1 year of maintaining this data.
>>>>
>>>>
>>>>
>>>> *Current activity*
>>>>
>>>> We have a custom weekly migration DB script job that moves data from 1
>>>> cluster to another cluster what it does is the below things.
>>>>
>>>> 1) COPY command to copy the data from cluster-1 and split that data
>>>> into binary files
>>>>
>>>> 2) Writing the binary data into the cluster-2 table
>>>>
>>>> 3) Creating indexes after the data is copied.
>>>>
>>>>
>>>>
>>>> *Problem what we have right now. *
>>>>
>>>> When the migration activity runs(weekly) from past 2 times , we saw the
>>>> cluster read replica instance has restarted as it fallen behind the
>>>> master(writer instance). Everything
>>>>
>>>> after that worked seamlessly but we want to avoid the replica getting
>>>> restarted. To avoid from restart we started doing smaller binary files and
>>>> copy those files to the cluster-2
>>>>
>>>> instead of writing 1 big file of 450 million records. We were
>>>> successful in the recent migration as the reader instance didn’t restart
>>>> after we split 1 big file into multiple files to copy the data over but did
>>>> restart after the indexes are created on the new table as it could be write
>>>> intensive.
>>>>
>>>>
>>>>
>>>> *DB parameters set on migration job*
>>>>
>>>> work_mem set to 8 GB and maintenance_work_mem = 32 GB.
>>>>
>>>
>>
>>
>> Indexes per table = 3
>>>>
>>>> total indexes for 2 tables = 5
>>>>
>>>>
>>>>
>>>> *DB size*
>>>>
>>>> Cluster-2 = 8.6 TB
>>>>
>>>> Cluster-1 = 3.6 TB
>>>>
>>>> Peak Table relational rows = 400 - 480 million rows
>>>>
>>>> Average table relational rows = 300 - 350 million rows.
>>>>
>>>> Per table size = 90 -95 GB , per table index size is about 45 GB
>>>>
>>>>
>>>>
>>>> *Questions*
>>>>
>>>> 1) Can we decrease the maintenance_work_mem to 16 GB and will it slow
>>>> down the writes to the cluster , with that the reader instance can sync the
>>>> data slowly ?.
>>>>
>>>> 2) Based on the above use case what are your recommendations to keep
>>>> the data longer up to 2 years ?
>>>>
>>>> 3) What other recommendations you recommend ?.
>>>>
>>>>
>>>>
>>>>
>>>>
>>>> Appreciate your replies.
>>>>
>>>> Thanks
>>>> githubkran
>>>>
>>>>>


Re: PostGreSQL Replication and question on maintenance

2019-11-15 Thread github kran
On Thu, Nov 14, 2019 at 11:42 PM Pavel Stehule 
wrote:

>   these numbers look crazy high - how much memory does your server have
> - more than 1TB?
>

The cluster has 244 GB of RAM and its storage capacity is 64 TB.

>
>
> pá 15. 11. 2019 v 6:26 odesílatel github kran 
> napsal:
>
>>
>> Hello postGreSQL Community ,
>>>
>>>
>>>
>>> Hope everyone is doing great !!.
>>>
>>>
>>> *Background*
>>>
>>> We use PostgreSQL Version 10.6 version and heavily use PostgreSQL for
>>> our day to day activities to write and read data. We have 2 clusters
>>> running PostgreSQL engine , one cluster
>>>
>>> keeps data up to 60 days and another cluster retains data beyond 1 year.
>>> The data is partitioned roughly weekly (~every 5 days per partition) and we
>>> have around 5 partitions per month per each table and we have 2 tables
>>> primarily so that will be 10 tables a week. So in the cluster-1 we have
>>> around  20 partitions and in cluster-2 we have around 160 partitions ( data
>>> from 2018). We also want to keep the data for up to 2 years in the
>>> cluster-2 to serve the data needs of the customer and so far we reached
>>> upto 1 year of maintaining this data.
>>>
>>>
>>>
>>> *Current activity*
>>>
>>> We have a custom weekly migration DB script job that moves data from 1
>>> cluster to another cluster what it does is the below things.
>>>
>>> 1) COPY command to copy the data from cluster-1 and split that data into
>>> binary files
>>>
>>> 2) Writing the binary data into the cluster-2 table
>>>
>>> 3) Creating indexes after the data is copied.
>>>
>>>
>>>
>>> *Problem what we have right now. *
>>>
>>> When the migration activity runs(weekly) from past 2 times , we saw the
>>> cluster read replica instance has restarted as it fallen behind the
>>> master(writer instance). Everything
>>>
>>> after that worked seamlessly but we want to avoid the replica getting
>>> restarted. To avoid from restart we started doing smaller binary files and
>>> copy those files to the cluster-2
>>>
>>> instead of writing 1 big file of 450 million records. We were successful
>>> in the recent migration as the reader instance didn’t restart after we
>>> split 1 big file into multiple files to copy the data over but did restart
>>> after the indexes are created on the new table as it could be write
>>> intensive.
>>>
>>>
>>>
>>> *DB parameters set on migration job*
>>>
>>> work_mem set to 8 GB and maintenance_work_mem = 32 GB.
>>>
>>
>
>
> Indexes per table = 3
>>>
>>> total indexes for 2 tables = 5
>>>
>>>
>>>
>>> *DB size*
>>>
>>> Cluster-2 = 8.6 TB
>>>
>>> Cluster-1 = 3.6 TB
>>>
>>> Peak Table relational rows = 400 - 480 million rows
>>>
>>> Average table relational rows = 300 - 350 million rows.
>>>
>>> Per table size = 90 -95 GB , per table index size is about 45 GB
>>>
>>>
>>>
>>> *Questions*
>>>
>>> 1) Can we decrease the maintenance_work_mem to 16 GB and will it slow
>>> down the writes to the cluster , with that the reader instance can sync the
>>> data slowly ?.
>>>
>>> 2) Based on the above use case what are your recommendations to keep the
>>> data longer up to 2 years ?
>>>
>>> 3) What other recommendations you recommend ?.
>>>
>>>
>>>
>>>
>>>
>>> Appreciate your replies.
>>>
>>> Thanks
>>> githubkran
>>>
>>>>


Fwd: PostGreSQL Replication and question on maintenance

2019-11-14 Thread github kran
> Hello postGreSQL Community ,
>
>
>
> Hope everyone is doing great !!.
>
>
> *Background*
>
> We use PostgreSQL Version 10.6 version and heavily use PostgreSQL for our
> day to day activities to write and read data. We have 2 clusters running
> PostgreSQL engine , one cluster
>
> keeps data up to 60 days and another cluster retains data beyond 1 year.
> The data is partitioned roughly weekly (~every 5 days per partition) and we
> have around 5 partitions per month per each table and we have 2 tables
> primarily so that will be 10 tables a week. So in the cluster-1 we have
> around  20 partitions and in cluster-2 we have around 160 partitions ( data
> from 2018). We also want to keep the data for up to 2 years in the
> cluster-2 to serve the data needs of the customer and so far we reached
> upto 1 year of maintaining this data.
>
>
>
> *Current activity*
>
> We have a custom weekly migration DB script job that moves data from 1
> cluster to another cluster what it does is the below things.
>
> 1) COPY command to copy the data from cluster-1 and split that data into
> binary files
>
> 2) Writing the binary data into the cluster-2 table
>
> 3) Creating indexes after the data is copied.
>
>
>
> *Problem what we have right now. *
>
> When the migration activity runs(weekly) from past 2 times , we saw the
> cluster read replica instance has restarted as it fallen behind the
> master(writer instance). Everything
>
> after that worked seamlessly but we want to avoid the replica getting
> restarted. To avoid from restart we started doing smaller binary files and
> copy those files to the cluster-2
>
> instead of writing 1 big file of 450 million records. We were successful
> in the recent migration as the reader instance didn’t restart after we
> split 1 big file into multiple files to copy the data over but did restart
> after the indexes are created on the new table as it could be write
> intensive.
>
>
>
> *DB parameters set on migration job*
>
> work_mem set to 8 GB and maintenance_work_mem = 32 GB.
>
> Indexes per table = 3
>
> total indexes for 2 tables = 5
>
>
>
> *DB size*
>
> Cluster-2 = 8.6 TB
>
> Cluster-1 = 3.6 TB
>
> Peak Table relational rows = 400 - 480 million rows
>
> Average table relational rows = 300 - 350 million rows.
>
> Per table size = 90 -95 GB , per table index size is about 45 GB
>
>
>
> *Questions*
>
> 1) Can we decrease the maintenance_work_mem to 16 GB and will it slow down
> the writes to the cluster , with that the reader instance can sync the data
> slowly ?.
>
> 2) Based on the above use case what are your recommendations to keep the
> data longer up to 2 years ?
>
> 3) What other recommendations you recommend ?.
>
>
>
>
>
> Appreciate your replies.
>
> Thanks
> githubkran
>
>>


PostGreSQL Replication and question on maintenance

2019-11-14 Thread github kran
Sorry, changing the subject line.

On Thu, Nov 14, 2019 at 11:21 AM github kran  wrote:

> Hello Team,
>
>
>
> Hope everyone is doing great !!.
>
>
> *Background*
>
> We use PostgreSQL Version 10.6 version and heavily use PostgreSQL for our
> day to day activities to write and read data. We have 2 clusters running
> PostgreSQL engine , one cluster
>
> keeps data up to 60 days and another cluster retains data beyond 1 year.
> The data is partitioned roughly weekly (~every 5 days per partition) and we
> have around 5 partitions per month per each table and we have 2 tables
> primarily so that will be 10 tables a week. So in the cluster-1 we have
> around  20 partitions and in cluster-2 we have around 160 partitions ( data
> from 2018). We also want to keep the data for up to 2 years in the
> cluster-2 to serve the data needs of the customer and so far we reached
> upto 1 year of maintaining this data.
>
>
>
> *Current activity*
>
> We have a custom weekly migration DB script job that moves data from 1
> cluster to another cluster what it does is the below things.
>
> 1) COPY command to copy the data from cluster-1 and split that data into
> binary files
>
> 2) Writing the binary data into the cluster-2 table
>
> 3) Creating indexes after the data is copied.
>
>
>
> *Problem what we have right now. *
>
> The last 2 times the weekly migration ran, the cluster's read-replica
> instance restarted because it had fallen behind the master (writer
> instance). Everything afterwards worked seamlessly, but we want to avoid
> the replica restarting. To prevent the restart we started producing
> smaller binary files and copying those to cluster-2 instead of writing 1
> big file of 450 million records. The most recent migration was successful
> in that the reader instance didn't restart after we split the 1 big file
> into multiple files, but it did restart after the indexes were created on
> the new table, as index creation can be write-intensive.
>
>
>
> *DB parameters set on migration job*
>
> work_mem set to 8 GB and maintenance_work_mem = 32 GB.
>
> Indexes per table = 3
>
> total indexes for 2 tables = 5
>
>
>
> *DB size*
>
> Cluster-2 = 8.6 TB
>
> Cluster-1 = 3.6 TB
>
> Peak Table relational rows = 400 - 480 million rows
>
> Average table relational rows = 300 - 350 million rows.
>
> Per table size = 90 -95 GB , per table index size is about 45 GB
>
>
>
> *Questions*
>
> 1) Can we decrease maintenance_work_mem to 16 GB, and will that slow down
> the writes to the cluster enough that the reader instance can sync the data
> gradually?
>
> 2) Based on the above use case, what are your recommendations for keeping
> the data for up to 2 years?
>
> 3) What other recommendations do you have?
>
>
>
>
>
> Appreciate your replies.
>
> Thanks,
> githubkran
>
>>


Re: Postgresql RDS DB Latency Chossing Hash join Plan

2019-11-14 Thread github kran
Hello Team,



Hope everyone is doing great !!.


*Background*

We use PostgreSQL version 10.6 and rely on it heavily for our day-to-day
reads and writes. We have 2 clusters running the PostgreSQL engine: one
cluster keeps data for up to 60 days and the other retains data beyond 1
year. The data is partitioned roughly weekly (a partition every ~5 days),
so we have around 5 partitions per month per table; with 2 primary tables
that is about 10 partitions per month. Cluster-1 has around 20 partitions
and cluster-2 has around 160 partitions (data from 2018). We also want to
keep up to 2 years of data in cluster-2 to serve the data needs of the
customer; so far we have reached 1 year of retained data.



*Current activity*

We have a custom weekly migration script that moves data from one cluster
to the other. It does the following:

1) COPY command to copy the data from cluster-1 and split that data into
binary files

2) Writing the binary data into the cluster-2 table

3) Creating indexes after the data is copied.



*Problem what we have right now. *

The last 2 times the weekly migration ran, the cluster's read-replica
instance restarted because it had fallen behind the master (writer
instance). Everything afterwards worked seamlessly, but we want to avoid
the replica restarting. To prevent the restart we started producing smaller
binary files and copying those to cluster-2 instead of writing 1 big file
of 450 million records. The most recent migration was successful in that
the reader instance didn't restart after we split the 1 big file into
multiple files, but it did restart after the indexes were created on the
new table, as index creation can be write-intensive.



*DB parameters set on migration job*

work_mem set to 8 GB and maintenance_work_mem = 32 GB.

Indexes per table = 3

total indexes for 2 tables = 5



*DB size*

Cluster-2 = 8.6 TB

Cluster-1 = 3.6 TB

Peak Table relational rows = 400 - 480 million rows

Average table relational rows = 300 - 350 million rows.

Per table size = 90 -95 GB , per table index size is about 45 GB



*Questions*

1) Can we decrease maintenance_work_mem to 16 GB, and will that slow down
the writes to the cluster enough that the reader instance can sync the data
gradually?

2) Based on the above use case, what are your recommendations for keeping
the data for up to 2 years?

3) What other recommendations do you have?





Appreciate your replies.

Thanks,
githubkran

>
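The batching approach described in this thread can be sketched as follows. Table names, file paths, and the key range are hypothetical, and `\copy` here is the psql client-side command:

```sql
-- Export one key-range slice of a partition per file (repeat per slice)
-- instead of a single 450-million-row file; smaller batches produce
-- smaller WAL bursts, giving the reader instance a chance to keep up.
\copy (SELECT * FROM events_2019_w42 WHERE id BETWEEN 1 AND 50000000) TO '/tmp/events_2019_w42_01.bin' WITH (FORMAT binary)

-- Load each slice into the historical cluster:
\copy events_hist_2019_w42 FROM '/tmp/events_2019_w42_01.bin' WITH (FORMAT binary)

-- Build indexes last. CONCURRENTLY avoids blocking writes on the new
-- table, but note it does not reduce the WAL the index build generates,
-- so the replica can still fall behind during this step.
CREATE INDEX CONCURRENTLY events_hist_2019_w42_id_idx
    ON events_hist_2019_w42 (id);
```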


Archival process of partition tables with filtering few rows from tables.

2019-03-28 Thread github kran
Hello Team,

We are using PostgreSQL version 9.6 and planning to archive our partition
tables, each containing about 300-500 million rows. We have around ~50
partition tables to archive to a new cold-path PostgreSQL database running
version 10.6. We need to filter out a few rows before exporting the data
from these tables, as we don't want to archive those rows.

What is a better approach to export and restore these tables?

   - Does the COPY command with a filter query (a SELECT) to exclude those
     rows work better?
   - pg_dump with those rows filtered out?
   - Can I export my indexes, check constraints, and other constraints?
   - Any other options?


Appreciate your replies.

Thanks.
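Since COPY accepts an arbitrary SELECT, the unwanted rows can be excluded at export time; pg_dump, by contrast, dumps whole tables and cannot filter rows in these versions, though `pg_dump --schema-only -t <table>` is a convenient way to carry the table definition, indexes, and check constraints to the target. A rough sketch with hypothetical names and filter condition:

```sql
-- On the 9.6 source: export only the rows to keep.
\copy (SELECT * FROM my_part_2018_w10 WHERE NOT excluded_flag) TO '/tmp/my_part_2018_w10.csv' WITH (FORMAT csv)

-- On the 10.6 target: create the table (e.g. from pg_dump --schema-only
-- output), load the data, then add indexes and constraints.
\copy my_part_2018_w10 FROM '/tmp/my_part_2018_w10.csv' WITH (FORMAT csv)
```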


Re: Tools to migrate data from Json files to PostgreSQL DB.

2019-03-11 Thread github kran
Thank you all for your tips. I will see if I can make use of these useful
tips in my Java application.

On Fri, Mar 8, 2019 at 1:33 AM Tony Shelver  wrote:

> You may want to assess how you want to store and access the data in
> Postgres before deciding on an import strategy.
>
> I have a system with a mix of relational and JSON data.  The data was
> originally sourced in flat file format.
> I wrote a few Python programs to take the data, then format to JSON, which
> I then submitted to pg functions.
> To make life easier, I submitted it as 1 JSON field to be decomposed by
> Postgres into relational data, and another to be loaded straight into pg
> JSON columns.
>
> The functions then either strip out the data from JSON using the PG JSON
> functions and store it relationally, or plug it straight into a JSON data
> element.
>
> Not terribly difficult to do, especially if you are not navigating complex
> JSON structures in pl/pgsql to strip out the data.
>
> Plenty of python JSON examples out there. Less so for PG :)
>
> On Thu, 7 Mar 2019 at 23:21, github kran  wrote:
>
>> Hello PostgreSQL Team,
>>
>> Are there are any tools to migrate data present in the json files ? to
>> the postgreSQL database.
>> We have data in flat files about 2 billion records across multiple
>> files.
>>
>> 1) What is the easiest way I can transfer this data to relational
>> database ?.
>> 2) Any tools I can use ?. and any better ways do it ?
>>
>> Appreciate your help.
>>
>>
>> Thanks
>> Kran,
>>
>
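The pattern Tony describes, submitting a JSON document to a function that decomposes part of it into relational columns and keeps the remainder as jsonb, might look roughly like this (table, column, and key names are invented for illustration):

```sql
CREATE TABLE readings (
    device_id   text        NOT NULL,
    recorded_at timestamptz NOT NULL,
    payload     jsonb                  -- everything not worth normalizing
);

CREATE FUNCTION load_reading(doc jsonb) RETURNS void AS $$
BEGIN
    INSERT INTO readings (device_id, recorded_at, payload)
    VALUES (doc->>'device_id',
            (doc->>'recorded_at')::timestamptz,
            doc - 'device_id' - 'recorded_at');   -- keep the rest as jsonb
END;
$$ LANGUAGE plpgsql;

-- One call per JSON document read from the flat files:
SELECT load_reading(
    '{"device_id": "d1", "recorded_at": "2019-03-01T00:00:00Z", "temp": 21.4}'
);
```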


Tools to migrate data from Json files to PostgreSQL DB.

2019-03-07 Thread github kran
Hello PostgreSQL Team,

Are there any tools to migrate data from JSON files to a PostgreSQL
database? We have about 2 billion records in flat files, spread across
multiple files.

1) What is the easiest way to transfer this data to a relational database?
2) Are there any tools I can use, or better ways to do it?

Appreciate your help.


Thanks
Kran,


Re: cannot execute VACUUM during recovery

2019-03-02 Thread github kran
On Wed, Feb 27, 2019 at 12:39 PM Stephen Eilert 
wrote:

> Are you running Vacuum on the slave node? It has to run on the master.
>
> Thanks,
>
> – Stephen
>


 Thanks Stephen for your advice. I am running the VACUUM command from 2
Linux instances using the commands below. We were running some scripts
from these Linux boxes and were surprised to see that one works and the
other does not, so I am wondering why.

  * Command:* *VACUUM VERBOSE myTableTest;*

1) Running from Linux instance 1
Command I am running -> psql -h clusterName -U myUserName -d myPostgresDB
There is no .pgpass set up on this Linux instance, so I had to enter the
password manually. VACUUM on the table does not work and throws the error
ERROR:  cannot execute VACUUM during recovery
2) Running from Linux instance 2
Command I am running -> psql -d myPostgresDB -h clusterName -U myUserName
This instance has a .pgpass file at the root, so I did not pass a password
here. *VACUUM on the table works.*

*.pgpass is set up here and contains
-> clusterName:5432:myPostgresDB:myUserName:myDBPassword*

Thanks
- Kran.


> On Feb 27, 2019, 6:43 AM -0800, github kran , wrote:
>
> Hello Team,
>
> We are using a PostgreSQL 9.6 and seeing the below error while trying to
> run a VACUUM on one of our live tables running in Production. We wanted to
> clean up some DEAD tuples on the table.
>
>
> *Command*: VACUUM (ANALYZE,VERBOSE) table_name.
>
> ERROR:  cannot execute VACUUM during recovery
>
> Thanks
> Kran
>
>
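The error in this thread is raised only on a standby server, so the same VACUUM succeeding from one box and failing from the other suggests the two connection strings resolve to different hosts (for example a writer endpoint versus a reader endpoint). A quick way to check which kind of host a session landed on:

```sql
-- Returns true on a read replica (still in recovery), false on the
-- primary; VACUUM and other writes must go to the primary.
SELECT pg_is_in_recovery();
```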


cannot execute VACUUM during recovery

2019-02-27 Thread github kran
Hello Team,

We are using a PostgreSQL 9.6 and seeing the below error while trying to
run a VACUUM on one of our live tables running in Production. We wanted to
clean up some DEAD tuples on the table.


*Command*: VACUUM (ANALYZE,VERBOSE) table_name.

ERROR:  cannot execute VACUUM during recovery

Thanks
Kranthi


PostgreSQL DB Maintenance and Partitioning to keep data longer.

2019-02-25 Thread github kran
Hello pgsql-General Team,



We are currently using a PostgreSQL DB that is growing by about 1.4 billion
records/month, a total of about 16-17 billion records/year. DB storage is
growing by about 6.8 TB/year including all indexes and data.


Current total DB Storage we got is 60 TB.



*Use case   *

We want to move partitioned data older than 90 days, on a monthly basis,
from the existing database to a new historical database (another
PostgreSQL DB) so that we can build a REST API for clients to access the
data. The intention is to keep this data for > 10 years and to separate
the data into hot and cold tiers using separate databases.



*Questions.*

1) Can PostgreSQL handle this volume with good performance if we use the
database for the next 10 years? (I know this depends on many other
factors, such as index bloat, record size, and partition strategy.)

2) Do you know of similar use cases at this volume and storage? Any
information would be helpful.

3) How many partitions can we have while keeping good I/O and query
response times?

4) We want to merge the weekly partitions into 1-month partitions. Can we
merge the partitions and use them in the newer database?

5) Can we export and import partitions into the newer database?



Any information/resources/design ideas would be helpful.



Thanks
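On questions 4 and 5: with native partitioning (PostgreSQL 10+), a partition can be detached into a standalone table, dumped, and attached or reloaded on the historical side; merging weekly partitions into a monthly one then becomes an INSERT ... SELECT. A rough sketch, with invented table names and date ranges:

```sql
-- Hot database: detach the partition so it can be dumped on its own.
ALTER TABLE measurements DETACH PARTITION measurements_2018_w01;
-- (shell) pg_dump -t measurements_2018_w01 hotdb | psql historicaldb

-- Historical database: fold several weekly tables into a monthly partition.
CREATE TABLE measurements_2018_01 PARTITION OF measurements_hist
    FOR VALUES FROM ('2018-01-01') TO ('2018-02-01');
INSERT INTO measurements_2018_01 SELECT * FROM measurements_2018_w01;
```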


Re: How many billion rows of data I can store in PostgreSQL RDS.

2019-02-22 Thread github kran
On Fri, Feb 22, 2019 at 5:48 AM Samuel Teixeira Santos 
wrote:

> Just adding that my case it's not a Amazon RDS, it's common server, if I
> can say like that...
>
Apologies, I missed mentioning that this is a question for the PostgreSQL
community. We are currently using PostgreSQL (Aurora Postgres RDS). I want
to know what PostgreSQL can handle in terms of limitations.

Thanks.


How many billion rows of data I can store in PostgreSQL RDS.

2019-02-21 Thread github kran
Hello Pgsql-General,

We currently have around 6 TB of data and plan to move some historic data
into RDS, close to 1 TB of it. The partitioned tables hold around 6
billion rows today, and we plan to keep the data long term, adding around
5-8 billion rows per year.

In short, my use case is to add about 8 billion rows of data every year
and to store at least 16 billion rows over every 2 years.

   1. How many billion rows can RDS handle? This data would be exposed
   via APIs accessing it.

Appreciate your reply on this.
Thank you.


Re: Postgresql RDS DB Latency Chossing Hash join Plan

2019-02-21 Thread github kran
Thanks for the feedback.

On Tue, Feb 19, 2019 at 11:12 AM Michael Lewis  wrote:

> 1) You can increase it as much as you want but (auto)analyze will take
> longer to examine the values of default_stat_target * 300 rows and compute
> the most common values and the frequencies of those values. How much
> variation does you data actually have? If your data only has 50 distinct
> values with fairly even distribution, then no need to increase it from 100
> even. Oh, deciding on the best query plan will take a little more time for
> the optimizer since it will be examining bigger collection of stats on the
> tables that you have increased.
>
> 2) I am not aware.
>
> 3) I am not aware of anything about your application so I can't recommend
> any number outright, but 4MB for work_mem definitely seems low to me
> assuming you have 16GB or more memory available unless you have very high
> concurrency. It will depend on how many sorts per statement, how many
> users, etc. If you spill over to disk on routine operations, then things
> are definitely going to be much slower than if you are able to keep things
> in memory. You could try running explain analyze and just verify that you
> are keeping things in memory. You could also turn on automatic gathering of
> explain analyze plans on live if you have the room for logging and can
> tolerate just a little latency.
>
> https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server
>
> https://www.postgresql.org/docs/current/runtime-config-resource.html#GUC-WORK-MEM
>
> I'm glad your performance is improved in AWS/dev environment. It can be a
> big hassle to test things in an environment that performs significantly
> different.
>
>
> *Michael Lewis*
>
>
> On Sun, Feb 17, 2019 at 10:01 AM github kran  wrote:
>
>>
>>
>> On Thu, Feb 14, 2019 at 4:58 PM Michael Lewis  wrote:
>>
>>> This is beyond my expertise except to say that if your storage is SSDs
>>> in AWS, then you definitely want random_page_cost close to the same as
>>> seq_page_cost (1 by default) assuming your data is likely to be in cache as
>>> discussed in the documentation
>>> <https://www.postgresql.org/docs/current/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS>.
>>> As it says- "Reducing this value relative to seq_page_cost will cause the
>>> system to prefer index scans" as you saw. Changing the value on production
>>> would again depend on the storage type used, and how good the cache hit
>>> rate is.
>>>
>>> As far as I know, dropping old partitions should not be significantly
>>> impactful to the system other than no longer needing to store that data
>>> (cost, time for full backups, etc).
>>>
>>> Again, as I understand things, there is not a big impact from having old
>>> unused tables in terms of maintenance. They should be ignored by normal
>>> processes.
>>>
>>> Glad you got your issue resolved.
>>>
>>>
>>> *Michael Lewis*
>>>
>>
>> Thanks for the feedback. You have been giving your thoughts and
>> suggestions since the beginning of this case; it was helpful. I realized
>> later, based on your suggestion to increase default_statistics_target
>> from 100, that it was not set correctly initially: I had set it at the
>> session level without setting it on the partition tables. As a next step
>> I set the statistics target to 1000 on all of the partition tables we use
>> from Nov 2018 to Feb 2019 and then ran ANALYZE. I am currently running a
>> load test to see how the DB performs now, and so far it is running better
>> than before. (I have reset all the previous changes except the statistics
>> change.) I will keep you posted after the test finishes.
>>
>> Questions.
>> 1) Can I increase the setting further, to 3000, and see how the system
>> behaves? How do I know the best value for my database in terms of the
>> sampling limit for the statistics target?
>> 2) Apart from analyzing the tables, do I need to make any other changes
>> related to the statistics setting?
>> 3) Also, work_mem is currently set to 4 MB and we haven't tuned this
>> value so far. For future needs, can I increase the work_mem setting?
>>
>> Appreciate your reply.
>>
>> Thanks
>>
>>
>>> On Thu, Feb 14, 2019 at 3:11 PM github kran 
>>> wrote:
>>>
>>>>
>>>>
>>>> On Thu, Feb 14, 2019 at 12:43 PM Michael Lewis 
>>>> wrote:
>>>>

Re: Postgresql RDS DB Latency Chossing Hash join Plan

2019-02-17 Thread github kran
On Thu, Feb 14, 2019 at 4:58 PM Michael Lewis  wrote:

> This is beyond my expertise except to say that if your storage is SSDs in
> AWS, then you definitely want random_page_cost close to the same as
> seq_page_cost (1 by default) assuming your data is likely to be in cache as
> discussed in the documentation
> <https://www.postgresql.org/docs/current/runtime-config-query.html#RUNTIME-CONFIG-QUERY-CONSTANTS>.
> As it says- "Reducing this value relative to seq_page_cost will cause the
> system to prefer index scans" as you saw. Changing the value on production
> would again depend on the storage type used, and how good the cache hit
> rate is.
>
> As far as I know, dropping old partitions should not be significantly
> impactful to the system other than no longer needing to store that data
> (cost, time for full backups, etc).
>
> Again, as I understand things, there is not a big impact from having old
> unused tables in terms of maintenance. They should be ignored by normal
> processes.
>
> Glad you got your issue resolved.
>
>
> *Michael Lewis*
>

Thanks for the feedback. You have been giving your thoughts and
suggestions since the beginning of this case; it was helpful. I realized
later, based on your suggestion to increase default_statistics_target from
100, that it was not set correctly initially: I had set it at the session
level without setting it on the partition tables. As a next step I set the
statistics target to 1000 on all of the partition tables we use from Nov
2018 to Feb 2019 and then ran ANALYZE. I am currently running a load test
to see how the DB performs now, and so far it is running better than
before. (I have reset all the previous changes except the statistics
change.) I will keep you posted after the test finishes.

Questions.
1) Can I increase the setting further, to 3000, and see how the system
behaves? How do I know the best value for my database in terms of the
sampling limit for the statistics target?
2) Apart from analyzing the tables, do I need to make any other changes
related to the statistics setting?
3) Also, work_mem is currently set to 4 MB and we haven't tuned this value
so far. For future needs, can I increase the work_mem setting?

Appreciate your reply.

Thanks
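For questions 1 and 2: the target can also be persisted per column with ALTER TABLE, so it is not lost the way a session-level SET is, and the gathered statistics can be inspected in pg_stats after ANALYZE. Table and column names below are placeholders:

```sql
-- Persist a higher sampling target on one partition's column, then
-- re-analyze so the new target takes effect.
ALTER TABLE location_data_2019_02_part1
    ALTER COLUMN asset_id SET STATISTICS 1000;
ANALYZE VERBOSE location_data_2019_02_part1;

-- Inspect what the planner now knows about the column.
SELECT attname, n_distinct, most_common_vals
FROM pg_stats
WHERE tablename = 'location_data_2019_02_part1'
  AND attname = 'asset_id';
```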


> On Thu, Feb 14, 2019 at 3:11 PM github kran  wrote:
>
>>
>>
>> On Thu, Feb 14, 2019 at 12:43 PM Michael Lewis 
>> wrote:
>>
>>> How many total rows in these tables? I am assuming these are partitions
>>> and those 100 million rows got distributed. If the data difference is
>>> significant, then you aren't guaranteed similar performance. You may want
>>> to follow more of the suggested steps on.
>>>
>>> https://wiki.postgresql.org/wiki/SlowQueryQuestions
>>>
>>>
>>> *Michael Lewis  |  Software Engineer*
>>> *Entrata*
>>> *c: **619.370.8697 <619-370-8697>*
>>>
>>> Michael - Yes, correct: the data of 100 million rows is distributed
>> across all the partitions.
>> Finally, I feel we have come to a conclusion: after we changed
>> random_page_cost from 4 to 2 on the non-prod instance, we see
>> improvements in the query, which used to take 1 minute and now takes
>> 1-2 seconds. That's correct, we have around 490 million rows in a few of
>> our partition tables. A new partition table is created every 7 days.
>>
>> We ran our API test, which hits the backend Aurora RDS PostgreSQL
>> database, and our query response times and requests/sec are better than
>> before. Do you recommend this setting on a production instance? Right now
>> we are planning to implement this option of random_page_cost = 2. (We
>> know it also works if we turn off hash_join, but we don't want to make
>> that change; we would rather set random_page_cost to 2.)
>>
>> Questions.
>> 1) What is your recommendation on this? Can we make this change on the
>> prod instance, which is performing better today, or keep it only on the
>> non-prod instance? (Actually we want to implement this change on the
>> non-prod instance.)
>> 2) Are there any downsides to changing this value?
>> 3) Also, another question: we have 64 TB of storage and run on an
>> r4.16xlarge EC2 instance. A few of our partitions are bigger and a few
>> are smaller. We have data from the last 9 months and plan to keep it for
>> close to 1 year, until May. Do you see any concerns with this? Eventually
>> we are thinking of archiving this data in the next 2 months by dropping
>> the older partitions.
>> 4) What could be the problems with keeping the data longer if there is
>> regular maintenance such as VACUUM and other maintenance activities?

Re: Postgresql RDS DB Latency Chossing Hash join Plan

2019-02-14 Thread github kran
On Thu, Feb 14, 2019 at 12:43 PM Michael Lewis  wrote:

> How many total rows in these tables? I am assuming these are partitions
> and those 100 million rows got distributed. If the data difference is
> significant, then you aren't guaranteed similar performance. You may want
> to follow more of the suggested steps on.
>
> https://wiki.postgresql.org/wiki/SlowQueryQuestions
>
>
> *Michael Lewis  |  Software Engineer*
> *Entrata*
> *c: **619.370.8697 <619-370-8697>*
>
> Michael - Yes, correct: the data of 100 million rows is distributed
across all the partitions.
Finally, I feel we have come to a conclusion: after we changed
random_page_cost from 4 to 2 on the non-prod instance, we see improvements
in the query, which used to take 1 minute and now takes 1-2 seconds.
That's correct, we have around 490 million rows in a few of our partition
tables. A new partition table is created every 7 days.

We ran our API test, which hits the backend Aurora RDS PostgreSQL
database, and our query response times and requests/sec are better than
before. Do you recommend this setting on a production instance? Right now
we are planning to implement this option of random_page_cost = 2. (We know
it also works if we turn off hash_join, but we don't want to make that
change; we would rather set random_page_cost to 2.)

Questions.
1) What is your recommendation on this? Can we make this change on the
prod instance, which is performing better today, or keep it only on the
non-prod instance? (Actually we want to implement this change on the
non-prod instance.)
2) Are there any downsides to changing this value?
3) Also, another question: we have 64 TB of storage and run on an
r4.16xlarge EC2 instance. A few of our partitions are bigger and a few are
smaller. We have data from the last 9 months and plan to keep it for close
to 1 year, until May. Do you see any concerns with this? Eventually we are
thinking of archiving this data in the next 2 months by dropping the older
partitions.
4) What could be the problems with keeping the data longer if there is
regular maintenance such as VACUUM and other maintenance activities?
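On question 1: the planner cost settings can be tried per session, or scoped to a single role, before changing the cluster-wide default, which keeps the blast radius small while comparing plans (the query and role name below are placeholders):

```sql
-- Affects only the current session; the cluster default stays unchanged.
SET random_page_cost = 2;
EXPLAIN (ANALYZE, BUFFERS) SELECT ...;   -- compare plan and timing
RESET random_page_cost;

-- Or scope the change to one application role instead of the whole cluster:
ALTER ROLE app_reader SET random_page_cost = 2;
```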



>
> On Thu, Feb 14, 2019 at 8:48 AM github kran  wrote:
>
>>
>>
>> On Wed, Feb 13, 2019 at 11:38 AM Michael Lewis 
>> wrote:
>>
>>> I didn't see your email yesterday, sorry about that. Index scans instead
>>> of sequential scans and nested loop instead of hash join means that you
>>> have bad row count estimates on "Non prod Aurora RDS instance" as far as I
>>> can figure. Have you run commands like-
>>>
>>> analyze asset_info_2019_2_part4;
>>> analyze asset_info_2019_2_part2;
>>>
>>> etc? If data are very similar, indexes all exist, and
>>> default_statistics_target are the same, then you should be getting the same
>>> plans.
>>>
>>>
>>> *Michael Lewis  |  Software Engineer*
>>> *Entrata*
>>> *c: **619.370.8697 <619-370-8697>*
>>>
>>> 1) Yes, I ran VACUUM for all the tables like asset_info_2019_2_part1,
>> part2, and part4, and also for location_info_xxx, to remove any dead
>> tuples, and also rebuilt the indexes.
>> 2) REINDEX TABLE location_data_2018_12_part4; (likewise for all the
>> tables, plus VACUUM on all the tables).
>> 3) The non-prod instance has more data. One thing to mention: when we
>> built the non-prod instance we copied a SNAPSHOT from the prod instance,
>> inserted about 100 million rows on top of that, and then ran VACUUM and
>> re-indexed the tables.
>>
>> I can't think of anything else we can do here, but let us know if you
>> need any more details on this problem. I am happy to share more details.
>>
>>
>>>
>>> On Wed, Feb 13, 2019 at 8:49 AM github kran 
>>> wrote:
>>>
>>>>
>>>>
>>>> On Tue, Feb 12, 2019 at 12:55 PM github kran 
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, Feb 12, 2019 at 12:33 PM Michael Lewis 
>>>>> wrote:
>>>>>
>>>>>> Did you update the stats by running ANALYZE on the tables involved,
>>>>>> or perhaps the entire database on the 'Non prod Aurora RDS instance'? Can
>>>>>> you share the two execution plans?
>>>>>>
>>>>>> *Michael Lewis  |  Software Engineer*
>>>>>> *Entrata*
>>>>>> *c: **619.370.8697 <619-370-8697>*
>>>>>>
>>>>>
>>>>> Here is the plan for both of the DB instances.
>>>>>
>>>>
>>>> Michael - 

Re: Postgresql RDS DB Latency Chossing Hash join Plan

2019-02-14 Thread github kran
On Wed, Feb 13, 2019 at 11:38 AM Michael Lewis  wrote:

> I didn't see your email yesterday, sorry about that. Index scans instead
> of sequential scans and nested loop instead of hash join means that you
> have bad row count estimates on "Non prod Aurora RDS instance" as far as I
> can figure. Have you run commands like-
>
> analyze asset_info_2019_2_part4;
> analyze asset_info_2019_2_part2;
>
> etc? If data are very similar, indexes all exist, and
> default_statistics_target are the same, then you should be getting the same
> plans.
>
>
> *Michael Lewis  |  Software Engineer*
> *Entrata*
> *c: **619.370.8697 <619-370-8697>*
>
> 1) Yes, I ran VACUUM for all the tables like asset_info_2019_2_part1,
part2, and part4, and also for location_info_xxx, to remove any dead
tuples, and also rebuilt the indexes.
2) REINDEX TABLE location_data_2018_12_part4; (likewise for all the
tables, plus VACUUM on all the tables).
3) The non-prod instance has more data. One thing to mention: when we
built the non-prod instance we copied a SNAPSHOT from the prod instance,
inserted about 100 million rows on top of that, and then ran VACUUM and
re-indexed the tables.

I can't think of anything else we can do here, but let us know if you need
any more details on this problem. I am happy to share more details.
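Worth noting for steps 1 and 2: plain VACUUM and REINDEX do not refresh the planner's row statistics; only ANALYZE (or autoanalyze) does, which is presumably why the analyze commands were suggested. The last time statistics were gathered can be checked per partition (the LIKE pattern is illustrative):

```sql
ANALYZE asset_info_2019_2_part4;   -- table name taken from this thread

SELECT relname, last_analyze, last_autoanalyze
FROM pg_stat_user_tables
WHERE relname LIKE 'asset_info_2019_%';
```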


>
> On Wed, Feb 13, 2019 at 8:49 AM github kran  wrote:
>
>>
>>
>> On Tue, Feb 12, 2019 at 12:55 PM github kran 
>> wrote:
>>
>>>
>>>
>>> On Tue, Feb 12, 2019 at 12:33 PM Michael Lewis 
>>> wrote:
>>>
>>>> Did you update the stats by running ANALYZE on the tables involved, or
>>>> perhaps the entire database on the 'Non prod Aurora RDS instance'? Can you
>>>> share the two execution plans?
>>>>
>>>> *Michael Lewis  |  Software Engineer*
>>>> *Entrata*
>>>> *c: **619.370.8697 <619-370-8697>*
>>>>
>>>
>>> Here is the plan for both of the DB instances.
>>>
>>
>> Michael - Did you get any chance to look at this issue. Also we see there
>> is a sequential scan being done instead of index scan.
>>
>>>
>>>>
>>>> On Tue, Feb 12, 2019 at 11:27 AM github kran 
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Tue, Feb 12, 2019 at 7:07 AM github kran 
>>>>> wrote:
>>>>>
>>>>>>
>>>>>>
>>>>>> On Mon, Feb 11, 2019 at 6:00 PM github kran 
>>>>>> wrote:
>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> On Mon, Feb 11, 2019 at 3:29 PM Michael Lewis 
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Are default statistics target the same on both prod and AWS? Have
>>>>>>>> you analyzed all tables being used in this query to ensure stats are up
>>>>>>>> proper? If the optimizer is choosing a different plan, then the stats 
>>>>>>>> must
>>>>>>>> be different IMO.
>>>>>>>>
>>>>>>>>
>>>>>>>> *Michael Lewis |  Software Engineer*
>>>>>>>> *Entrata*
>>>>>>>>
>>>>>>>
>>>>>>>
>>>>>>> Thanks for your reply  I have verified few of the tables and their
>>>>>>> default statistics target and they seem to be same but is there 
>>>>>>> anything in
>>>>>>> particular you want me to look at it to differentiate Prod and Non prod
>>>>>>> databases ?. ( Also the DB instance size is same but there is little
>>>>>>> more data in the Non prod Aurora RDS instance compared to Prod
>>>>>>> instance).
>>>>>>>
>>>>>>> Query used.
>>>>>>>>select * from pg_stats where tablename = 'tableName'
>>>>>>>>
>>>>>>>>
>>>>>>>> On Mon, Feb 11, 2019 at 2:15 PM github kran 
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi Postgres Team,
>>>>>>>>> We are using AWS Aurora RDS Postgres DB 9.6.8 Engine. We recently
>>>>>>>>> deleted few million rows from the database and ran into a issue in 
>>>>>>>>> one of
>>>>>>>>> our dev account where the
>

Re: Postgresql RDS DB Latency Chossing Hash join Plan

2019-02-13 Thread github kran
On Tue, Feb 12, 2019 at 12:55 PM github kran  wrote:

>
>
> On Tue, Feb 12, 2019 at 12:33 PM Michael Lewis  wrote:
>
>> Did you update the stats by running ANALYZE on the tables involved, or
>> perhaps the entire database on the 'Non prod Aurora RDS instance'? Can you
>> share the two execution plans?
>>
>> *Michael Lewis  |  Software Engineer*
>> *Entrata*
>> *c: **619.370.8697 <619-370-8697>*
>>
>
> Here is the plan for both of the DB instances.
>

Michael - Did you get a chance to look at this issue? Also, we see a
sequential scan being done instead of an index scan.

>
>>
>> On Tue, Feb 12, 2019 at 11:27 AM github kran 
>> wrote:
>>
>>>
>>>
>>> On Tue, Feb 12, 2019 at 7:07 AM github kran 
>>> wrote:
>>>
>>>>
>>>>
>>>> On Mon, Feb 11, 2019 at 6:00 PM github kran 
>>>> wrote:
>>>>
>>>>>
>>>>>
>>>>> On Mon, Feb 11, 2019 at 3:29 PM Michael Lewis 
>>>>> wrote:
>>>>>
>>>>>> Are default statistics target the same on both prod and AWS? Have you
>>>>>> analyzed all tables being used in this query to ensure stats are up 
>>>>>> proper?
>>>>>> If the optimizer is choosing a different plan, then the stats must be
>>>>>> different IMO.
>>>>>>
>>>>>>
>>>>>> *Michael Lewis |  Software Engineer*
>>>>>> *Entrata*
>>>>>>
>>>>>
>>>>>
>>>>> Thanks for your reply  I have verified few of the tables and their
>>>>> default statistics target and they seem to be same but is there anything 
>>>>> in
>>>>> particular you want me to look at it to differentiate Prod and Non prod
>>>>> databases ?. ( Also the DB instance size is same but there is little
>>>>> more data in the Non prod Aurora RDS instance compared to Prod
>>>>> instance).
>>>>>
>>>>> Query used.
>>>>>>select * from pg_stats where tablename = 'tableName'
>>>>>>
>>>>>>
>>>>>> On Mon, Feb 11, 2019 at 2:15 PM github kran 
>>>>>> wrote:
>>>>>>
>>>>>>> Hi Postgres Team,
>>>>>>> We are using AWS Aurora RDS Postgres DB 9.6.8 engine. We recently
>>>>>>> deleted a few million rows from the database and ran into an issue
>>>>>>> in one of our dev accounts where the DB was not normal after this
>>>>>>> deletion. We reindexed and vacuumed the entire database but couldn't
>>>>>>> bring it back to its earlier state. So as next steps we deleted the
>>>>>>> database and recreated it by copying the snapshot from a production
>>>>>>> instance, and then vacuumed and re-indexed the database.
>>>>>>>
>>>>>>> After this now the dev database seems to be in a better state than
>>>>>>> earlier but we are seeing few of our DB calls are taking more than 1 
>>>>>>> minute
>>>>>>> when we are fetching data and we observed
>>>>>>> this is because the query plan was executing a hash join as part of
>>>>>>> the query whereas a similar query on prod instance is not doing any hash
>>>>>>> join and is returning faster.
>>>>>>>
>>>>>>> Also we did not want to experiment by modifing the DB settings by
>>>>>>> doing enable_hash_join to off or random_page_count to 1 as we dont have
>>>>>>> these settings in Prod instance.
>>>>>>>
>>>>>>> Note:
>>>>>>> The partition table sizes we have here is between 40 GB to 75 GB and
>>>>>>> this is our normal size range, we have a new partition table for every 7
>>>>>>> days.
>>>>>>>
>>>>>>> Appreciate your ideas on what we could be missing and what we can
>>>>>>> correct here to reduce the query latency.
>>>>>>>
>>>>>>> Thanks
>>>>>>> githubKran
>>>>>>>
>>>>>>


Postgresql RDS DB Latency Choosing Hash Join Plan

2019-02-12 Thread github kran


Re: Aurora Postgresql RDS DB Latency

2019-02-12 Thread github kran


Re: Aurora Postgresql RDS DB Latency

2019-02-11 Thread github kran
On Mon, Feb 11, 2019 at 3:29 PM Michael Lewis  wrote:

> Are default statistics target the same on both prod and AWS? Have you
> analyzed all tables being used in this query to ensure stats are up proper?
> If the optimizer is choosing a different plan, then the stats must be
> different IMO.
>
>
> *Michael Lewis |  Software Engineer*
> *Entrata*
>


Thanks for your reply. I have verified a few of the tables and their default
statistics targets, and they seem to be the same, but is there anything in
particular you want me to look at to differentiate the Prod and Non-prod
databases? (Also, the DB instance size is the same, but there is a little
more data in the Non-prod Aurora RDS instance compared to the Prod instance.)

Query used.
>select * from pg_stats where tablename = 'tableName'
>
>
> On Mon, Feb 11, 2019 at 2:15 PM github kran  wrote:
>
>> Hi Postgres Team,
>> We are using AWS Aurora RDS Postgres DB 9.6.8 Engine. We recently deleted
>> few million rows from the database and ran into a issue in one of our dev
>> account where the
>> DB was not normal after this deletion. We did re index, vacuuming entire
>> database but we couldnt bring it to the same state as earlier. So next
>> steps we deleted the database and
>> recreated the database by copying the snapshot from a production
>> instance. Further did vacumming, re-index on the database.
>>
>> After this now the dev database seems to be in a better state than
>> earlier but we are seeing few of our DB calls are taking more than 1 minute
>> when we are fetching data and we observed
>> this is because the query plan was executing a hash join as part of the
>> query whereas a similar query on prod instance is not doing any hash join
>> and is returning faster.
>>
>> Also we did not want to experiment by modifing the DB settings by doing
>> enable_hash_join to off or random_page_count to 1 as we dont have these
>> settings in Prod instance.
>>
>> Note:
>> The partition table sizes we have here is between 40 GB to 75 GB and this
>> is our normal size range, we have a new partition table for every 7 days.
>>
>> Appreciate your ideas on what we could be missing and what we can correct
>> here to reduce the query latency.
>>
>> Thanks
>> githubKran
>>
>
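Since the question above is how to tell whether the statistics on the two
instances actually differ, here is a quick sketch of the checks involved
(the table name is a placeholder):

```sql
-- Global statistics target; per-column overrides live in pg_attribute
-- (attstattarget = -1 means "use the default").
SHOW default_statistics_target;

SELECT attrelid::regclass AS table_name, attname, attstattarget
FROM   pg_attribute
WHERE  attrelid = 'tableName'::regclass   -- placeholder table name
AND    attnum > 0
AND    NOT attisdropped;

-- When were statistics last refreshed on each instance?
SELECT relname, last_analyze, last_autoanalyze, n_live_tup, n_dead_tup
FROM   pg_stat_user_tables
WHERE  relname = 'tableName';
```

Run on both Prod and Non-prod, materially different last_analyze times or
row-count estimates would be one explanation for the differing plans.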


Aurora Postgresql RDS DB Latency

2019-02-11 Thread github kran
Hi Postgres Team,
We are using the AWS Aurora RDS Postgres DB 9.6.8 engine. We recently deleted
a few million rows from the database and ran into an issue in one of our dev
accounts where the
DB was not normal after this deletion. We re-indexed and vacuumed the entire
database, but we couldn't bring it back to the same state as earlier. So as a
next step we deleted the database and
recreated it by copying the snapshot from a production instance,
and then vacuumed and re-indexed the database again.

After this the dev database seems to be in a better state than earlier,
but we are seeing a few of our DB calls taking more than 1 minute when we
are fetching data, and we observed
this is because the query plan was executing a hash join as part of the
query, whereas a similar query on the prod instance does not do any hash join
and returns faster.

Also, we did not want to experiment by modifying the DB settings, e.g. setting
enable_hash_join to off or random_page_cost to 1, as we don't have these
settings in the Prod instance.

Note:
The partition table sizes we have here are between 40 GB and 75 GB, and this
is our normal size range; we create a new partition table every 7 days.

Appreciate your ideas on what we could be missing and what we can correct
here to reduce the query latency.

Thanks
githubKran
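For what it's worth, the hash-join theory can be tested without touching
instance-wide configuration, since SET only affects the current session. A
sketch (the slow query itself is elided):

```sql
-- Baseline plan and timing for the slow query.
EXPLAIN (ANALYZE, BUFFERS)
SELECT ...;                   -- the slow query, elided

-- Session-local: disables hash joins for this connection only,
-- and resets automatically on disconnect.
SET enable_hash_join = off;

EXPLAIN (ANALYZE, BUFFERS)
SELECT ...;                   -- same query, to compare the chosen plan

RESET enable_hash_join;       -- back to the instance default
```

This keeps Prod and Non-prod configuration identical while still showing
whether the planner's hash-join choice is responsible for the latency.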


Re: Pulling data from Postgres DB table for every 5 seconds.

2019-01-09 Thread github kran
On Wed, Jan 9, 2019 at 12:36 PM Mark Fletcher  wrote:

> On Wed, Jan 9, 2019 at 10:10 AM github kran  wrote:
>
>> Mark - We are currently on 9.6 version of postgres and cant use this
>> feature of logical replication.Answering to your question we are looking
>> for any changes in the data related to a specific table ( changes like any
>> update on a timestamp field
>> OR any new inserts happened in the last 5 seconds for a specific product
>> entity).
>> Any other alternatives ?.
>>
>> The feature was added in 9.4 (I think). We are on 9.6 and it works great.
> Not sure about RDS Aurora, however.
>
> Mark
>

Mark - just curious about logical replication. Do you think I can
use it for my use case, where I need to publish data to a subscriber when
a row's data is updated or any new inserts
happen on the table? The intention
is to collect this modified data in real time and send it to a subscriber
in JSON format.

Thanks
Kran
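For reference, a minimal sketch of the logical-decoding approach Mark
describes (PostgreSQL 9.4+; the slot name is illustrative, and on RDS the
rds.logical_replication parameter must be enabled first):

```sql
-- Create a logical replication slot using the built-in test_decoding plugin
-- (a JSON-emitting plugin such as wal2json, if available, would suit the
-- use case above more directly).
SELECT * FROM pg_create_logical_replication_slot('product_changes', 'test_decoding');

-- Poll the slot: returns every committed INSERT/UPDATE since the last call
-- and advances the slot (pg_logical_slot_peek_changes reads without consuming).
SELECT * FROM pg_logical_slot_get_changes('product_changes', NULL, NULL);

-- Drop the slot when finished; an abandoned slot forces WAL retention.
SELECT pg_drop_replication_slot('product_changes');
```

A small service polling the slot every 5 seconds and reshaping the rows into
JSON would avoid scanning the table itself.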


Re: Pulling data from Postgres DB table for every 5 seconds.

2019-01-09 Thread github kran
On Wed, Jan 9, 2019 at 12:26 PM Rob Sargent  wrote:

>
>
> On Jan 9, 2019, at 11:11 AM, github kran  wrote:
>
> Rob -   It's a Java based application. We dont have triggers yet on the
> table and is trigger a only option in 9.6 version ?.
>
> On Wed, Jan 9, 2019 at 12:01 PM Rob Sargent  wrote:
>
>>
>> On 1/9/19 10:21 AM, github kran wrote:
>>
>> Thanks for your reply Rob. Reading the below documentation link says the
>> EVENT trigger is only supported for DDL commands. Is it not correct ?.
>>
>> *1) https://www.postgresql.org/docs/9.6/event-trigger-definition.html
>> <https://www.postgresql.org/docs/9.6/event-trigger-definition.html> *
>> (An event trigger fires whenever the event with which it is associated
>> occurs in the database in which it is defined. Currently, the only
>> supported events are ddl_command_start, ddl_command_end, table_rewrite
>>  and sql_drop. Support for additional events may be added in future
>> releases.).
>> 2) Doesnt the trigger slow down inserts/update we are doing to the table
>> ?. Does it slow down if we are reading the data using the API when we have
>> a trigger in place ?.
>>
>>
>> Ah, right you are.  Are triggers off the table?  You would want to write
>> the trigger function in some (trusted?) language with access to the outside
>>
>
> (Custom here is to “bottom post”)
>
> Have you tried triggers and found them to have too much impact on total
> system?  I can’t see them being more expensive than looking for changes
> every 5 seconds.  If your hardware can scan 1T that quickly then I suspect
> your trigger will not be noticed.  I would have the trigger write to queue
> and have something else using the queue to talk to IOT piece.
>
> Failing that, perhaps your java (server-side?) app making the changes can
> be taught to emit the necessary details to IOT-thingy?
>

*Sure, we will try a trigger that sends data to a queue, then gather data
every 5 seconds by reading off the queue and sending it to the IOT topic. We
already have a queue where every message inserted or updated on the table is
sent, but that is a lot of data to gather. We would rather minimize how much
data we collect from the DB.*
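A sketch of the trigger-to-queue idea using LISTEN/NOTIFY (the table and
column names are assumptions; the payload carries only the key, so the
consumer re-reads the row, which keeps the per-write overhead small):

```sql
-- Row trigger publishing changed product ids on a notification channel;
-- a listener batches these every 5 seconds for the IOT topic.
CREATE OR REPLACE FUNCTION notify_product_change() RETURNS trigger AS $$
BEGIN
    PERFORM pg_notify('product_changes', NEW.product_id::text);
    RETURN NEW;
END;
$$ LANGUAGE plpgsql;

CREATE TRIGGER product_info_notify
AFTER INSERT OR UPDATE ON product_info
FOR EACH ROW EXECUTE PROCEDURE notify_product_change();
-- (9.6 uses EXECUTE PROCEDURE; later releases also accept EXECUTE FUNCTION.)
```

Notifications are delivered only on commit, so the listener never sees
uncommitted changes.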


Re: Pulling data from Postgres DB table for every 5 seconds.

2019-01-09 Thread github kran
Rob - It's a Java-based application. We don't have triggers on the
table yet; is a trigger the only option in version 9.6?

On Wed, Jan 9, 2019 at 12:01 PM Rob Sargent  wrote:

>
> On 1/9/19 10:21 AM, github kran wrote:
>
> Thanks for your reply Rob. Reading the below documentation link says the
> EVENT trigger is only supported for DDL commands. Is it not correct ?.
>
> *1) https://www.postgresql.org/docs/9.6/event-trigger-definition.html
> <https://www.postgresql.org/docs/9.6/event-trigger-definition.html> *
> (An event trigger fires whenever the event with which it is associated
> occurs in the database in which it is defined. Currently, the only
> supported events are ddl_command_start, ddl_command_end, table_rewrite
>  and sql_drop. Support for additional events may be added in future
> releases.).
> 2) Doesnt the trigger slow down inserts/update we are doing to the table
> ?. Does it slow down if we are reading the data using the API when we have
> a trigger in place ?.
>
>
> Ah, right you are.  Are triggers off the table?  You would want to write
> the trigger function in some (trusted?) language with access to the outside
>


Re: Pulling data from Postgres DB table for every 5 seconds.

2019-01-09 Thread github kran
Mark - We are currently on Postgres version 9.6 and can't use this
logical replication feature. Answering your question: we are looking
for any changes in the data of a specific table (changes like any
update on a timestamp field,
or any new inserts that happened in the last 5 seconds for a specific product
entity).
Any other alternatives?

On Wed, Jan 9, 2019 at 11:24 AM Mark Fletcher  wrote:

> On Wed, Jan 9, 2019 at 9:02 AM github kran  wrote:
>
>>
>>> Hi Postgres Team,
>>>
>>> I have an application using RDS Aurora Postgresql 9.6 version having 4
>>> TB of DB size. In this DB we have a table PRODUCT_INFO with around  1
>>> million rows and table size of 1 GB.
>>> We are looking for a implementation where we want to pull the data in
>>> real time for every 5 seconds from the DB ( Table mentioned above) and send
>>> it to IOT topic whenever an event occurs for a product. ( event is any new
>>> product information or change in the existing
>>> product information.).
>>>
>>>
> It's unclear whether you want to do table scans or if you're just looking
> for changes to the database. If you're looking just for changes, consider
> implementing something using logical replication. We have a logical
> replication system set up to stream changes from the database into an
> elastic search cluster, and it works great.
>
> Mark
>
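Given the timestamp-based change detection described above, the simplest
trigger-free alternative is an indexed incremental poll. A sketch, assuming a
hypothetical updated_at column maintained on every INSERT/UPDATE:

```sql
-- One-time setup: index the change timestamp so each poll is a cheap
-- range scan rather than a table scan.
CREATE INDEX IF NOT EXISTS product_info_updated_at_idx
    ON product_info (updated_at);

-- Run every 5 seconds; the poller remembers the high-water mark it saw last.
SELECT product_id, updated_at
FROM   product_info
WHERE  updated_at > '2019-01-09 10:00:00'  -- last high-water mark, tracked by the poller
ORDER  BY updated_at;
```

The caveat is that a plain timestamp cannot distinguish an UPDATE from an
INSERT and can miss rows committed out of order, which is where the
trigger/NOTIFY and logical-decoding approaches in this thread are more robust.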


Re: Pulling data from Postgres DB table for every 5 seconds.

2019-01-09 Thread github kran
Thanks for your reply, Rob. The documentation link below says the
EVENT trigger is only supported for DDL commands. Is that not correct?

*1) https://www.postgresql.org/docs/9.6/event-trigger-definition.html
<https://www.postgresql.org/docs/9.6/event-trigger-definition.html> *
(An event trigger fires whenever the event with which it is associated
occurs in the database in which it is defined. Currently, the only
supported events are ddl_command_start, ddl_command_end, table_rewrite and
sql_drop. Support for additional events may be added in future releases.).
2) Doesn't the trigger slow down the inserts/updates we are doing to the
table? Does it slow down reads when we fetch the data through the API with a
trigger in place?

Ron - It's a tiny subset of the 1 GB of data every 5 seconds, not the
entire data set.


Thanks !!.

On Wed, Jan 9, 2019 at 11:10 AM Rob Sargent  wrote:

>
>
> On Jan 9, 2019, at 10:02 AM, github kran  wrote:
>
>
>> Hi Postgres Team,
>>
>> I have an application using RDS Aurora Postgresql version 9.6, with a DB
>> size of 4 TB. In this DB we have a table PRODUCT_INFO with around 1 million
>> rows and a table size of 1 GB.
>> We are looking for an implementation where we pull the data in
>> real time every 5 seconds from the DB (the table mentioned above) and send
>> it to an IOT topic whenever an event occurs for a product (an event is any
>> new product information or a change in the existing
>> product information).
>>
>> This table has a few DML operations in real time, either INSERT or UPDATE,
>> based on the productId (UPDATE whenever there is a change in the product
>> information, and INSERT when a record doesn't exist for that product).
>>
>> We have REST APIs built in the backend pulling data from this backend
>> RDS Aurora POSTGRES DB, used by clients.
>>
>> *UseCase*
>> We don't want clients to pull the data from the DB every 5 seconds, but
>> rather want to provide a service which fetches the data from the DB in
>> real time and pushes it to the IOT topic every 5 seconds.
>>
>> *Questions*
>> 1) How can I pull information from the DB every 5 seconds
>> without impacting the performance of the DB?
>> 2) What options do I have for pulling data from this table every 5
>> seconds? Does POSTGRES have any other options apart from a TRIGGER?
>>
>>
>> Any ideas would be helpful.
>>
>> Thanks !!
>> GithubKran
>>
>
> There is DML event trapping.  You don’t poll every 5 seconds; you react
> immediately to each event (with a trigger or event).  From the trigger
> perspective you probably have everything you need to update IOT with
> additional searching.
>


Re: Pulling data from Postgres DB table for every 5 seconds.

2019-01-09 Thread github kran