Re: Updating large tables without dead tuples

2018-02-23 Thread Stephen Frost
Greetings,

* l...@laurent-hasson.com (l...@laurent-hasson.com) wrote:
> This was done during a maintenance window, and that table is read-only except 
> when we ETL data to it on a weekly basis, and so I was just wondering why I 
> should pay the "bloat" penalty for this type of transaction. Is there a trick 
> that could be used here?

Yes, create a new table and INSERT the data into that table, then swap
the new table into place as the old table.  Another option, if you don't
mind the exclusive lock taken on the table, is to dump the data to
another table, then TRUNCATE the current one and then INSERT into it.
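
A minimal sketch of the swap approach, assuming the table is simply named t, 
nothing else (views, foreign keys) references it, and the substr() fix from the 
original post (column list elided):

BEGIN;
CREATE TABLE t_new (LIKE t INCLUDING ALL);    -- copies columns, defaults, indexes
INSERT INTO t_new
SELECT ..., substr(source_id, 2, 10), ...     -- rewrite the bad column here
FROM t;
ALTER TABLE t RENAME TO t_old;
ALTER TABLE t_new RENAME TO t;
COMMIT;
DROP TABLE t_old;                             -- once the result checks out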

There are other options too, involving triggers and such to allow updates
and other changes to be captured during this process, avoiding the need
to lock the table, but that gets a bit complicated.

> More generally, I suspect that the MVCC architecture is so deep that 
> something like LOCK TABLE, which would guarantee that there won't be 
> contention, couldn't be used as a heuristic to avoid creating dead tuples? 
> That would make quite a performance improvement for this type of work though.

I'm afraid it wouldn't be quite that simple; in particular, you have to
think about what happens when you issue a ROLLBACK...

Thanks!

Stephen




Updating large tables without dead tuples

2018-02-23 Thread l...@laurent-hasson.com
Hello

I work with a large and wide table (about 300 million rows, about 50 columns), 
and from time to time, we get business requirements to make some modifications. 
But sometimes it's just a plain mistake. This happened to us a few weeks ago, 
when someone made a mistake and we had to update a single column of that large 
and wide table. Literally, the source data screwed up a zip code and we had to 
patch it on our end.

Anyways... The query we ran was:
update T set source_id = substr(sourceId, 2, 10);
It took about 10h and created hundreds of millions of dead tuples, causing 
another couple of hours of vacuum.

This was done during a maintenance window, and that table is read-only except 
when we ETL data to it on a weekly basis, and so I was just wondering why I 
should pay the "bloat" penalty for this type of transaction. Is there a trick 
that could be used here?

More generally, I suspect that the MVCC architecture is so deep that something 
like LOCK TABLE, which would guarantee that there won't be contention, couldn't 
be used as a heuristic to avoid creating dead tuples? That would make quite a 
performance improvement for this type of work though.


Thank you,
Laurent.


Re: Performance

2018-02-23 Thread Andreas Kretschmer



On 23.02.2018 at 20:29, Daulat Ram wrote:

  We have the following requirements, to be met in a single query or any proper 
solution. Please help with this.
How many sessions are currently open.

ask pg_stat_activity, via select * from pg_stat_activity
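
For instance, a minimal sketch (pid, usename, state, and query are standard 
pg_stat_activity columns):

select count(*) as open_sessions from pg_stat_activity;
select pid, usename, state, query from pg_stat_activity;  -- one row per session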



  -and if open, then how many queries have been executed in that session.


What? There isn't a counter for that, AFAIK.



  -and also we have to trace how much time each query is taking.


You can use auto_explain for that
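
A minimal sketch for postgresql.conf (the 250ms threshold is just an 
illustrative value):

shared_preload_libraries = 'auto_explain'
auto_explain.log_min_duration = '250ms'  # log plans of statements slower than this
auto_explain.log_analyze = on            # include actual timings (adds overhead)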


  -and also we have to find the cost of each query.


The same: auto_explain.

Please keep in mind: costs are expressed in arbitrary planner units, not time.



Regards, Andreas

--
2ndQuadrant - The PostgreSQL Support Company.
www.2ndQuadrant.com




Please help

2018-02-23 Thread Daulat Ram

Hello team,

I need help with how and what we can monitor in the Postgres database via Nagios.

I came to know about the check_postgres.pl script, but we are using the freeware 
option of Postgres. If it's OK with freeware, then please let me know the steps 
for how I can implement it in our environment.

Regards,
Daulat



Performance

2018-02-23 Thread Daulat Ram
 Hello experts,
 
 We have the following requirements, to be met in a single query or any proper 
solution. Please help with this.
How many sessions are currently open.
 -and if open, then how many queries have been executed in that session.
 -and also we have to trace how much time each query is taking.
 -and also we have to find the cost of each query.
 
 Regards,
 Daulat
 



Re: effective_io_concurrency on EBS/gp2

2018-02-23 Thread Vitaliy Garnashevich
> I noticed that the recent round of tests being discussed never 
> mentioned the file system used.  Was it XFS?  Does changing the 
> agcount change the behaviour?


It was ext4.

Regards,
Vitaliy




Re: need advice to tune postgresql

2018-02-23 Thread Laurenz Albe
Darius Pėža wrote:
> I have an issue where update queries are slow, and I need some advice on how 
> to improve the speed. I don't have much control to change the queries, but I 
> can change the postgresql server configuration
> 
> query example:
> 
> UPDATE "project_work" SET "left" = ("project_work"."left" + 2) WHERE 
> ("project_work"."left" >= 8366)
> 
> sometimes the number of updated rows is up to 10k
> 
> postgresql version 9.3
> 
> postgresl.conf
> max_connections = 100 
> shared_buffers = 6GB  # min 128kB
> work_mem = 100MB  # min 64kB
> 
> all other values are default
> 
> server hardware
> Intel(R) Xeon(R) CPU E5-2637 v4 @ 3.50GHz
> 16GB RAM
> disk is HDD
> 
> I can dedicate about half of the machine's resources to the postgresql server.

If the number of updated rows is that big, you should try to
get HOT updates as much as possible.

For that, make sure that there is *no* index on the column,
and that the fillfactor for the table is suitably low (perhaps 50).
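
A minimal sketch (the index name is hypothetical, and 50 is just the suggested 
starting point):

DROP INDEX IF EXISTS project_work_left_idx;  -- any index on the updated column
ALTER TABLE project_work SET (fillfactor = 50);
VACUUM FULL project_work;  -- rewrite so existing pages honor the new fillfactor
                           -- (takes an exclusive lock)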

During a HOT update, when the new row version fits into the same
page as the old one, the indexes don't have to be updated.
That will speed up the UPDATE considerably.
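
Whether HOT is actually happening can be checked in pg_stat_user_tables, e.g.:

select n_tup_upd, n_tup_hot_upd from pg_stat_user_tables
where relname = 'project_work';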

On the other hand, an UPDATE like yours would then always use a
sequential scan, but that may still be a net win.

Other than that, setting checkpoint_segments high enough that
you don't get too many checkpoints can help.
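
For example, in postgresql.conf (32 is just a starting point to experiment 
with; the 9.3 default is 3):

checkpoint_segments = 32  # more WAL between checkpoints, fewer forced checkpoints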

Of course, more RAM and fast storage are always good.

Yours,
Laurenz Albe
-- 
Cybertec | https://www.cybertec-postgresql.com



Re: effective_io_concurrency on EBS/gp2

2018-02-23 Thread Rick Otten
On Thu, Feb 8, 2018 at 11:40 AM, Vitaliy Garnashevich <
vgarnashev...@gmail.com> wrote:

> Anyway, there are still some strange things happening when
> effective_io_concurrency is non-zero.
>
> ...
>
> Vitaliy
I was researching whether I could optimize a concatenated lvm2 volume when I
have disks of different speeds (concatenated, not striped; I think I can if I
concatenate them in the right order, but I'm still testing that), when I came
across this article from a few years ago:
http://www.techforce.com.br/content/lvm-raid-xfs-and-ext3-file-systems-tuning-small-files-massive-heavy-load-concurrent-parallel

In the article he talks about the performance of parallel I/O on different
file systems.

Since I am already running XFS that led me to this tunable:
http://xfs.org/docs/xfsdocs-xml-dev/XFS_Filesystem_Structure/tmp/en-US/html/Allocation_Groups.html

Which brought me back to this discussion about effective_io_concurrency
from a couple of weeks ago.  I noticed that the recent round of tests being
discussed never mentioned the file system used.  Was it XFS?  Does changing
the agcount change the behaviour?


Re: need advice to tune postgresql

2018-02-23 Thread MichaelDBA
What caught my eye is that the update count can be up to 10K.  That means if 
autovacuum is not keeping up with this table, bloat may be increasing at a 
high pace, leading to more page I/O, which causes degraded performance.  
If the table has become bloated, you need to do a blocking VACUUM FULL 
on it or a non-blocking rewrite using pg_repack.  Then tune autovacuum so 
that it can keep up with the updates to this table, or add a manual vacuum 
analyze of this table at certain times via a cron job.  Manual vacuums 
(user-initiated) will not be canceled the way autovacuum runs can be 
when they conflict with user activity.
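
A minimal sketch of per-table autovacuum tuning (the thresholds are 
illustrative assumptions, not recommendations):

ALTER TABLE project_work SET (
    autovacuum_vacuum_scale_factor = 0.01,  -- vacuum once ~1% of rows are dead
    autovacuum_vacuum_threshold = 1000
);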


Regards,
Michael Vitale



Darius Pėža 
Friday, February 23, 2018 9:42 AM
 I have an issue where update queries are slow, and I need some advice on how 
to improve the speed. I don't have much control to change the queries, but I 
can change the postgresql server configuration


query example:

UPDATE "project_work" SET "left" = ("project_work"."left" + 2) WHERE 
("project_work"."left" >= 8366)


sometimes the number of updated rows is up to 10k

postgresql version 9.3

postgresl.conf
max_connections = 100
shared_buffers = 6GB  # min 128kB
work_mem = 100MB  # min 64kB

all other values are default

server hardware
Intel(R) Xeon(R) CPU E5-2637 v4 @ 3.50GHz
16GB RAM
disk is HDD

I can dedicate about half of the machine's resources to the postgresql server.





need advice to tune postgresql

2018-02-23 Thread Darius Pėža
 I have an issue where update queries are slow, and I need some advice on how
to improve the speed. I don't have much control to change the queries, but I
can change the postgresql server configuration

query example:

UPDATE "project_work" SET "left" = ("project_work"."left" + 2) WHERE
("project_work"."left" >= 8366)

sometimes the number of updated rows is up to 10k

postgresql version 9.3

postgresl.conf
max_connections = 100
shared_buffers = 6GB # min 128kB
work_mem = 100MB # min 64kB

all other values are default

server hardware
Intel(R) Xeon(R) CPU E5-2637 v4 @ 3.50GHz
16GB RAM
disk is HDD

I can dedicate about half of the machine's resources to the postgresql server.