Re: [GENERAL] Is it OK to create a directory in PGDATA dir

2017-10-19 Thread Tomas Vondra
Hi,

On 10/19/2017 09:58 PM, rakeshkumar464 wrote:
> In the container world, sometimes the only persistent storage path
> (that is, storage outside the container) is PGDATA.

I don't want to be the "You're doing it wrong!" guy, but you're doing it
wrong. If a container only gives you a single persistent storage path,
then don't use it as PGDATA directly, but create PGDATA as one of the
directories in that path. And create a directory for other stuff next to
it, not in the data directory.

>
> Is it fine to create a subdir inside PGDATA and store our stuff
> there, or will PG freak out seeing a foreign object.
> 

PostgreSQL certainly does not check for unknown directories in the data
directory, and it will not crash and burn. But putting extra files there
causes all sorts of problems, and it increases the probability of human error.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [GENERAL] core system is getting unresponsive because over 300 cpu load

2017-10-10 Thread Tomas Vondra


On 10/11/2017 02:26 AM, pinker wrote:
> Tomas Vondra-4 wrote
>> I'm probably a bit dumb (after all, it's 1AM over here), but can you
>> explain the CPU chart? I'd understand percentages (say, 75% CPU used)
>> but what do the seconds / fractions mean? E.g. when the system time
>> reaches 5 seconds, what does that mean?
> 
> hehe, no you've just spotted a mistake, it's supposed to be 50 cores :)
> out of 80 in total
> 

Ah, so it should say '50 cores' instead of '5s'? Well, that's a busy
system, I guess.

> 
> Tomas Vondra-4 wrote
>> Have you tried profiling using perf? That usually identifies hot spots
>> pretty quickly - either in PostgreSQL code or in the kernel.
> 
> I was always afraid because of overhead, but maybe it's time to start ...
> 

I don't follow. If you're not in trouble, a little bit of additional
overhead is not an issue (but then you generally don't need profiling at
that moment). If you're already in trouble, then spending a bit of CPU
time on a basic CPU profile is certainly worth it.

> 
> Tomas Vondra-4 wrote
>> What I meant is that if the system evicts this amount of buffers all the
>> time (i.e. there doesn't seem to be any sudden spike), then it's
>> unlikely to be the cause (or related to it).
> 
> I was actually thinking about a scenario where different sessions
> want to read/write many different relfilenodes at the same time -
> could that cause page swapping between shared buffers and the OS cache?

Perhaps. If the sessions only do reads, that would not be visible in
buffers_backend I believe (not sure ATM, would have to check the source).
But it'd be visible in buffers_alloc and certainly in blks_read.
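
If you want to watch that, here is a minimal sketch of the counters I have
in mind - sample them periodically and look at the deltas (buffers_alloc
and buffers_backend live in pg_stat_bgwriter, blks_read/blks_hit in
pg_stat_database):

  SELECT buffers_alloc, buffers_backend, buffers_clean, buffers_checkpoint
    FROM pg_stat_bgwriter;

  SELECT sum(blks_read) AS blks_read, sum(blks_hit) AS blks_hit
    FROM pg_stat_database;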

> we see that context switches on cpu are increasing as well. kernel 
> documentation says that using page tables instead of Translation
> Lookaside Buffer (TLB) is very costly and on some blogs have seen
> recommendations that using huge pages (so more addresses can fit in
> TLB) will help here but postgresql, unlike oracle, cannot use it for
> anything else than page buffering (so 16gb) ... so process memory
> still needs to use 4k pages.
>

The context switches are likely due to a large number of runnable
processes competing for the CPU.

Also, memory bandwidth is increasingly an issue on big boxes ...

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [GENERAL] core system is getting unresponsive because over 300 cpu load

2017-10-10 Thread Tomas Vondra


On 10/11/2017 12:28 AM, pinker wrote:
> Tomas Vondra-4 wrote
>> What is "CPU load"? Perhaps you mean "load average"?
> 
> Yes, I wasn't exact: I mean system cpu usage, it can be seen here - it's the
> graph from yesterday's failure (after 6p.m.):
> <http://www.postgresql-archive.org/file/t342733/cpu.png> 
> So as one can see connections spikes follow cpu spikes...
> 

I'm probably a bit dumb (after all, it's 1AM over here), but can you
explain the CPU chart? I'd understand percentages (say, 75% CPU used)
but what do the seconds / fractions mean? E.g. when the system time
reaches 5 seconds, what does that mean?

> 
> Tomas Vondra-4 wrote
>> Also, what are the basic system parameters (number of cores, RAM), it's
>> difficult to help without knowing that.
> 
> I have actually written everything in the first post:
> 80 CPU and 4 sockets
> over 500GB RAM
> 

Apologies, missed that bit.

> 
> Tomas Vondra-4 wrote
>> Well, 3M transactions over ~2h period is just ~450tps, so nothing
>> extreme. Not sure how large the transactions are, of course.
> 
> It's quite a lot going on. Most of them are complicated stored procedures.
> 

OK.

> 
> Tomas Vondra-4 wrote
>> Something gets executed on the database. We have no idea what it is, but
>> it should be in the system logs. And you should see the process in 'top'
>> with large amounts of virtual memory ...
> 
> Yes, it would be much easier if it would be just single query from the top,
> but the most cpu is eaten by the system itself and I'm not sure why. I
> suppose because of page tables size and anon pages is NUMA related.
> 

Have you tried profiling using perf? That usually identifies hot spots
pretty quickly - either in PostgreSQL code or in the kernel.

> 
> Tomas Vondra-4 wrote
>> Another possibility is a run-away query that consumes a lot of work_mem.
> 
> It was exactly my first guess. work_mem is set to ~ 350MB and I see a lot of
> stored procedures with unnecessary WITH clauses (i.e. materialization) and
> right after it IN query with results of that (hash).
> 

Depends on how much data is in the CTEs. We don't really allocate all of
work_mem at once, but bit by bit.

> 
> Tomas Vondra-4 wrote
>> Measure cache hit ratio (see pg_stat_database.blks_hit and blks_read),
>> and then you can decide.
> 
> Thank you for the tip. I always do it but haven't here,  so the result is
> 0.992969610990056 - so increasing it is rather pointless.
> 

Yeah.

> 
> Tomas Vondra-4 wrote
>> You may also make the bgwriter more aggressive - that won't really
>> improve the hit ratio, it will only make enough room for the backends.
> 
> yes i probably will
> 

On the other hand, the numbers are rather low. I mean, the backends
seem to be evicting ~15k buffers over a 5-minute period, which is pretty
much nothing (~400kB/s). I wouldn't bother with tuning this.

> 
> Tomas Vondra-4 wrote
>> But I don't quite see how this could cause the severe problems you have,
>> as I assume this is kinda regular behavior on that system. Hard to say
>> without more data.
> 
> I can provide you with any data you need :)
> 

What I meant is that if the system evicts this amount of buffers all the
time (i.e. there doesn't seem to be any sudden spike), then it's
unlikely to be the cause (or related to it).

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [GENERAL] core system is getting unresponsive because over 300 cpu load

2017-10-10 Thread Tomas Vondra


On 10/10/2017 10:40 PM, pinker wrote:
> Hi to all!
> 
> We've got problem with a very serious repetitive incident on our core
> system. Namely, cpu load spikes to 300-400 and the whole db becomes

What is "CPU load"? Perhaps you mean "load average"?

Also, what are the basic system parameters (number of cores, RAM), it's
difficult to help without knowing that.

> unresponsive. From db point of view nothing special is happening, memory
> looks fine, disks io's are ok and the only problem is huge cpu load. Kernel
> parameters that are increasing with load are always the same:
> 
> * page tables size
> * Committed_AS
> * Active anon
> 
> <http://www.postgresql-archive.org/file/t342733/pagetables.png> 
> 
> and the total number of connections are increasing very fast (but I suppose
> it's the symptom not the root cause of cpu load) and exceed max_connections
> (1000).
> 

I doubt you have 1000 cores in your system, so 1000 connections active
at the same time is guaranteed to cause issues. What we see quite often
is a minor hiccup (occasional slow query) snowballing into much more
serious trouble exactly because of this.

Queries get slower for some reason, the application starts opening more
connections (through a built-in connection pool) to run more queries,
which further increases the pressure, slowing the queries even more, ...

As Scott suggested, you should consider using a connection pool.
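
To see how many of those connections are actually doing something at any
given moment, a quick sketch (pg_stat_activity has the 'state' column
since 9.2):

  SELECT state, count(*)
    FROM pg_stat_activity
   GROUP BY state
   ORDER BY count(*) DESC;

If most of them are 'idle' or 'idle in transaction', a pooler like
pgbouncer will help a lot.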

> System:
> * CentOS Linux release 7.2.1511 (Core) 
> * Linux 3.10.0-327.36.3.el7.x86_64 #1 SMP Mon Oct 24 16:09:20 UTC 2016
> x86_64 x86_64 x86_64 GNU/Linux
> * postgresql95-9.5.5-1PGDG.rhel7.x86_64
> * postgresql95-contrib-9.5.5-1PGDG.rhel7.x86_64
> * postgresql95-docs-9.5.5-1PGDG.rhel7.x86_64
> * postgresql95-libs-9.5.5-1PGDG.rhel7.x86_64
> * postgresql95-server-9.5.5-1PGDG.rhel7.x86_64
> 
> * 4 sockets/80 cores
> * vm.dirty_background_bytes = 0
> * vm.dirty_background_ratio = 2
> * vm.dirty_bytes = 0
> * vm.dirty_expire_centisecs = 3000
> * vm.dirty_ratio = 20
> * vm.dirty_writeback_centisecs = 500
> 
> after the first incident we have changed:
> * increased shared_buffers to 16GB (completely on huge pages. previously
> 2GB)
> * adjusted vm.nr_hugepages to 8000 (we've got 2mb pages)
> * changed vm.overcommit_memory = 2 and vm.overcommit_ratio = 99
> * disabled transparent huge pages (they were set before unfortunately to
> 'always')
> 
> 
> It's a highly transactional db. Today I've run:
> select now(), txid_current();
> and the results:
> 3 339 351 transactions between 2017-10-10 14:42 and 2017-10-10 16:24
> 

Well, 3M transactions over ~2h period is just ~450tps, so nothing
extreme. Not sure how large the transactions are, of course.

> ... snip ...
> 
> So, from the kernel stats we know that the failure happens when db is trying
> to allocate some huge amount of pages (page tables size, anons, committed_as).
> But what is triggering this situation? 
>

Something gets executed on the database. We have no idea what it is, but
it should be in the system logs. And you should see the process in 'top'
with large amounts of virtual memory ...
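
From within the database, a rough way to spot such a statement is to
list the longest-running active queries (a sketch, column names as of 9.5):

  SELECT pid, now() - query_start AS runtime, state, left(query, 60) AS query
    FROM pg_stat_activity
   WHERE state <> 'idle'
   ORDER BY runtime DESC
   LIMIT 10;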

> I suppose it could be lazy autovacuum (just standard settings). So
> autovacuum had to read whole 369gb yesterday to clean xids. today did the
> same on some other tables.

Possible, but it shouldn't allocate more than maintenance_work_mem. So I
don't see why it would allocate so much virtual memory.

Another possibility is a run-away query that consumes a lot of work_mem.

> Another idea is too small shared buffers setting. 

... snip ...

> bgwriter stats:
> <http://www.postgresql-archive.org/file/t342733/bgwriter.png> 
> 

Yes, this suggests you probably have shared_buffers set too low, but
it's impossible to say whether increasing the size will help - perhaps
your active set (the part of the DB you regularly access) is way too big.

Measure cache hit ratio (see pg_stat_database.blks_hit and blks_read),
and then you can decide.
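
For example something like this (a sketch, per database):

  SELECT datname, blks_hit, blks_read,
         round(blks_hit / nullif(blks_hit + blks_read, 0)::numeric, 4) AS hit_ratio
    FROM pg_stat_database
   ORDER BY blks_hit + blks_read DESC;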

You may also make the bgwriter more aggressive - that won't really
improve the hit ratio, it will only make enough room for the backends.
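
If you decide to try that anyway, the knobs are bgwriter_delay,
bgwriter_lru_maxpages and bgwriter_lru_multiplier - the values below are
just an illustration, not a recommendation:

  ALTER SYSTEM SET bgwriter_delay = '100ms';
  ALTER SYSTEM SET bgwriter_lru_maxpages = 400;
  ALTER SYSTEM SET bgwriter_lru_multiplier = 4.0;
  SELECT pg_reload_conf();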

But I don't quite see how this could cause the severe problems you have,
as I assume this is kinda regular behavior on that system. Hard to say
without more data.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [GENERAL] Function to return per-column counts?

2017-09-28 Thread Tomas Vondra


On 09/28/2017 04:34 PM, Seamus Abshere wrote:
> hey,
> 
> Does anybody have a function lying around (preferably pl/pgsql) that
> takes a table name and returns coverage counts?
> 

What is "coverage count"?

cheers

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [GENERAL] Speed of conversion from int to bigint

2017-09-27 Thread Tomas Vondra


On 09/27/2017 10:08 AM, Jonathan Moules wrote:
> Hi,
> (Postgres 9.5 and 9.6)
> We have a table of about 650million rows. It's a partitioned table, with
> two "child" tables. We want to change its primary key type from int to
> bigint while retaining the current values.
> 
> We're using this:
> 
> ALTER TABLE dta.my_table ALTER column table_id TYPE bigint;
> 
> But it's taking a very long time, and locking the database. We're going
> to need to do this in production as well, so a long-term table-lock
> isn't workable.

It's taking a very long time because it does a full-table rewrite while
holding an AccessExclusiveLock on it, which is the strongest lock mode.

> Is there anything we can do to speed things up? How long is this likely
> to take?
> 

What you can do is roughly this:

---
ALTER TABLE my_table ADD COLUMN new_table_id bigint;

-- do this in batches, so that a single transaction does not update
-- all the rows (see the sketch below)
UPDATE my_table SET new_table_id = table_id;

-- build a unique index on the new column (concurrently, to avoid
-- blocking writes)
CREATE UNIQUE INDEX CONCURRENTLY my_table_2_pkey ON my_table (new_table_id);

BEGIN;
-- switch the PK constraint to the new column/index
ALTER TABLE my_table DROP CONSTRAINT my_table_pkey;
ALTER TABLE my_table ADD CONSTRAINT my_table_pkey PRIMARY KEY
  USING INDEX my_table_2_pkey;

-- drop the old column, rename the new one
ALTER TABLE my_table DROP COLUMN table_id;
ALTER TABLE my_table RENAME COLUMN new_table_id TO table_id;
COMMIT;

You may need to tweak this to handle the inheritance tree.
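
For the batched UPDATE step, a sketch of a single batch (assuming the
table_id values are reasonably dense integers; drive the loop from a
script, so each range runs in its own transaction):

  -- one batch; repeat with the next range until max(table_id) is reached
  UPDATE my_table
     SET new_table_id = table_id
   WHERE table_id > 0
     AND table_id <= 1000000
     AND new_table_id IS NULL;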

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [GENERAL] shared_buffers smaller than max_wal_size

2017-09-25 Thread Tomas Vondra


On 09/25/2017 11:10 AM, Vladimir Mihailenco wrote:
> Thanks again - for some reason I thought that each page should be
> fsynced separately...
> 
> I am running ZFS and going to try following config on 32gb server:
> 
> shared_buffers = 512mb (previously was 6gb)
> max_wal_size = 8gb
> zfs_arc_max = 24gb
> 
> i.e. run with minimal shared buffers and do all the caching in ZFS. As I
> understand it now such config can provide better results since data will
> be cached once in ZFS.
> 

Maybe, or maybe not. It really depends on the nature of your workload.
The thing is, if the shared buffers are too small, dirty pages may be
evicted repeatedly, i.e.

1) postgres needs to modify a page, reads it from ARC to shared buffers
and modifies it

2) there's not enough free space in shared buffers, so the page gets
evicted from shared buffers to ARC (and eventually to disk)

3) postgres needs the page again, and reads it from ARC (or disk) back
to shared buffers

4) not enough free space in shared buffers - page gets written out

5) rinse and repeat

I don't recall off-hand how exactly the ARC is managed, but I suppose
there's some sort of expiration period, so a single page may end up
being written to disk repeatedly.

With sufficiently large shared buffers the page would be modified
repeatedly in shared buffers, and written out just once by the
checkpoint process.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [GENERAL] shared_buffers smaller than max_wal_size

2017-09-24 Thread Tomas Vondra
On 09/24/2017 11:03 AM, Vladimir Mihailenco wrote:
> Thanks for your response. As I understand it now, the difference is
> that checkpoints are synchronous but dirty pages eviction from shared
> buffers are asynchronous, correct? How then Postgres ensures that OS
> writes data to the disk so WAL can be deleted?
> 

The last step in a checkpoint is an fsync() on the files. Without that,
the checkpoint is considered incomplete and the database won't rely on it.


regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [GENERAL] shared_buffers smaller than max_wal_size

2017-09-23 Thread Tomas Vondra
Hi,

On 09/23/2017 08:18 AM, Vladimir Mihailenco wrote:
> Hi,
> 
> I wonder what is the point of setting max WAL size bigger than shared
> buffers, e.g.
> 
> shared_buffers = 512mb
> max_wal_size = 2gb
> 
> As I understand it, a checkpoint happens after 2gb of data were modified
> (written to WAL), but shared buffers can contain at most 512mb of dirty
> pages to be flushed to the disk. Is it still a win or I am missing
> something?

Those are mostly unrelated things.

max_wal_size determines how often you'll do checkpoints. So with a lot
of writes you probably need high max_wal_size, otherwise you'll do
checkpoints very often. Choose reasonable checkpoint_timeout and set
max_wal_size based on that.
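
A simple sanity check is pg_stat_bgwriter - if most checkpoints are
"requested" (triggered by running out of max_wal_size) rather than
"timed", max_wal_size is too small for your write rate:

  SELECT checkpoints_timed, checkpoints_req
    FROM pg_stat_bgwriter;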

Shared buffers are mostly about caching data accessed by queries. If you
can squeeze the frequently accessed data into shared buffers (high cache
hit ratio), great.

Moreover, there's very little relation between max_wal_size and
shared_buffers, for a number of reasons:

1) You can modify the same 8kB page repeatedly - it will still be just
8kB of dirty data in shared buffers, but each update will generate a
little bit of WAL data. In an extreme case a single 8kB page might be
responsible for most of the 2GB of WAL data.

2) When changing the data page, we only really write the minimum amount
of data describing the change into WAL. So it's not 1:1.

3) When a page is evicted from shared buffers, we don't fsync it to disk
immediately. We write it out to the page cache, and leave the eviction to
the OS (with some exceptions), so it's asynchronous. WAL writes are
asynchronous.

4) Shared buffers are not just about dirty data, it's also about caching
reads. No one knows what is the read:write ratio, what part of the
database will receive writes, etc.


So there's nothing inherently wrong with (shared_buffers > max_wal_size)
or (shared_buffers < max_wal_size) - it depends on your workload.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [GENERAL] random row from a subset

2017-09-20 Thread Tomas Vondra


On 09/20/2017 01:28 PM, bluefrog wrote:
> 
> thanks, interestingly your method works in both Oracle and PostgreSQL,
> albeit with a different random function call.
> It does not work in SQL Anywhere though.
> 

You will have to ask SQL Anywhere people, I guess.

cheers

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [GENERAL] random row from a subset

2017-09-20 Thread Tomas Vondra


On 09/20/2017 02:33 AM, Peter Koukoulis wrote:
> 
> I'm attempting to get a random number, based on a range that spans 1 to
> the maximum number of rows in a subset.
> I run the query in Oracle sucessfully and get a different number each
> time and only a single number, which is what I am expecting,
> 
> but when I run the same query, albeit the random function is different,
> I either observe no result, a single row or two rows, 
> for example:
> 
> ft_node=# select c_id
> from    (
>          select c_id, row_number() over (order by c_d_id) as rn
>               ,  count(*) over() max_rn
>          from customer where c_d_id=5
>         ) t
> where rn = (select floor(random()*(max_rn))+1);

The problem here is that random() is volatile, so it's executed for each
row. So essentially if the subselect has 100 rows, you'll get 100 random
values - you're "rolling the dice" for every row independently.
Sometimes one row matches, sometimes none, sometimes more than one.

You need to do either this:

with rand as (select random() as r)
select c_id
from (
    select c_id, row_number() over (order by c_d_id) as rn,
           count(*) over () as max_rn
    from customer where c_d_id = 5
) t
where rn = (select floor(r * max_rn) + 1 from rand);

or define an immutable wrapper for random():

CREATE FUNCTION random_stable() RETURNS DOUBLE PRECISION
AS 'SELECT random()'
LANGUAGE SQL
IMMUTABLE;

and use that instead.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [GENERAL] looking for a globally unique row ID

2017-09-16 Thread Tomas Vondra
On 09/16/2017 11:57 AM, Rafal Pietrak wrote:
> 
> 
>   I'M NOT A POSTGRES HACKER. THIS IS JUST AN INTUITION.
> 
> 
> On 15.09.2017 at 21:30, Christopher Browne wrote:
>> On 15 September 2017 at 14:45, Adam Brusselback
>> <adambrusselb...@gmail.com> wrote:
> 
> [---]
>>
>> b) Referencing which index entries can be dropped (e.g. - vacuumed out)
>> is a fair bit more complex because the index entries depend on multiple
>> tables.  This adds, um, concurrency complications, if data is being deleted
>> from multiple tables concurrently.  (Over-simplifying question:  "When
>> a table that participates in the sharing is vacuumed, does the shared
>> index get vacuumed?  What if two such tables are vacuumed concurrently?")
> 
> This is probably postgresql-hackers knowledge, but I'd appreciate it if
> you could elaborate: why is "concurrent vacuum" of two tables with a
> common index such a problem?
> 

Because it means vacuums on partitions will clash on the shared (global)
index, unlike today with only local indexes. Vacuum happens in about
three basic phases:

1) Remove dead tuples from tables, reduce them to item pointers and
stash the row IDs in a list.

2) Scan all indexes on the table, remove items with IDs on the list.

3) Scan the table again, finally removing the item pointers.

Currently this happens on partition level, so the individual vacuums
don't interfere by modifying the same index. With global indexes (at
least when implemented as a single object) that would no longer be the
case, and the consequences are a bit terrifying ...

> 1. why cannot it be "serialized on demand" in such a case/exception
> (i.e. the case of tables being bound by a common index)? In other
> words, can the initial concurrency be turned into serialized
> commands?
> 

It can. But serialization is pretty much the arch enemy of scalability.
Amdahl's law and all that ...

What's particularly bad about serializing vacuum is that it does not
serialize client requests but maintenance tasks. That's quite dangerous
IMNSHO, as it may easily result in a bloat explosion.

> 2. why common index cannot be implemented as "split files" - locking 
> with their respective tables the usual way? The problem of
> concurrent locking would vanish at the expense of performance hit ...
> but if someone desperately needs such "global index", I'd bet he/she
> will live with performance hit. I would.
> 

It can be implemented by split files. But that once again introduces
massive locking (imagine hundreds or thousands of partitions), and thus
overhead.

It's a bit foolish to say you would live with the performance hit when
you don't know how serious it would be. It may be fine, but it may also
be much worse than you expected.

I ran into a number of people who used global indexes on other DBs, and
their position is mostly "I'll never use that, because it pretty much
eliminates all benefits of partitioning." So we need to tread pretty
carefully here, so as not to run into the same limitations.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [GENERAL] [PERFORM] Questionaire: Common WAL write rates on busy servers.

2017-04-28 Thread Tomas Vondra

On 04/28/2017 01:34 AM, Andres Freund wrote:

On 2017-04-28 01:29:14 +0200, Tomas Vondra wrote:

I can confirm this observation. I bought the Intel 750 NVMe SSD last year,
the device has 1GB DDR3 cache on it (power-loss protected), can do ~1GB/s of
sustained O_DIRECT sequential writes. But when running pgbench, I can't push
more than ~300MB/s of WAL to it, no matter what I do because of
WALWriteLock.


Hm, interesting.  Even if you up wal_buffers to 128MB, use
synchronous_commit = off, and play with wal_writer_delay/flush_after?



I think I've tried things like that, but let me do some proper testing. 
I'll report the numbers in a few days.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [GENERAL] [PERFORM] Questionaire: Common WAL write rates on busy servers.

2017-04-27 Thread Tomas Vondra

On 04/27/2017 07:35 PM, Andres Freund wrote:

On 2017-04-27 10:29:48 -0700, Joshua D. Drake wrote:

On 04/27/2017 09:34 AM, Andres Freund wrote:

On 2017-04-27 09:31:34 -0700, Joshua D. Drake wrote:

On 04/27/2017 08:59 AM, Andres Freund wrote:




I would agree it isn't yet a widespread issue.


I'm not yet sure about that actually. I suspect a large
percentage of people with such workloads aren't lingering lots on
the lists.


That would probably be true. I was thinking of it more as "most new
users are in the cloud", and in the cloud it is going to be rare that a
user is able to hit that level of writes (at least not without spending
LOTS of money).


You can get pretty decent NVMe SSD drives on several cloud
providers these days, without them immediately bankrupting you. Sure, it's
instance storage, but with a decent replication and archival setup,
that's not necessarily an issue.

It's not that hard to get to the point where postgres can't keep up
with storage, at least for some workloads.



I can confirm this observation. I bought the Intel 750 NVMe SSD last 
year, the device has 1GB DDR3 cache on it (power-loss protected), can do 
~1GB/s of sustained O_DIRECT sequential writes. But when running 
pgbench, I can't push more than ~300MB/s of WAL to it, no matter what I 
do because of WALWriteLock.
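
For anyone wondering about their own WAL write rate, a rough sketch
using the 9.x function names - take two samples and divide the
difference by the elapsed time:

  -- current WAL position as a byte offset; sample twice and subtract
  SELECT pg_xlog_location_diff(pg_current_xlog_location(), '0/0') AS wal_bytes;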


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [GENERAL] [PERFORM] Questionaire: Common WAL write rates on busy servers.

2017-04-27 Thread Tomas Vondra

On 04/27/2017 06:34 PM, Andres Freund wrote:

On 2017-04-27 09:31:34 -0700, Joshua D. Drake wrote:

On 04/27/2017 08:59 AM, Andres Freund wrote:



Ok, based on the, few, answers I've got so far, my experience is
indeed skewed. A number of the PG users I interacted with over
the last couple years had WAL write rates somewhere in the range
of 500MB/s to 2.2GB/s (max I've seen). At that point WAL insertion
became a major bottleneck, even if storage was more than fast
enough to keep up. To address these we'd need some changes, but
the feedback so far suggests that it's not yet a widespread
issue...


I would agree it isn't yet a widespread issue.


I'm not yet sure about that actually. I suspect a large percentage
of people with such workloads aren't lingering lots on the lists.



To a certain extent, this is a self-fulfilling prophecy. If you know 
you'll have such a busy system, you probably do some research and 
testing first, before choosing the database. If we don't perform well 
enough, you pick something else. Which removes the data point.


Obviously, there are systems that start small and get busier and busier 
over time. And those are the ones we see.


cheers

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [GENERAL] Unexpected WAL-archive restore behaviour

2017-02-18 Thread Tomas Vondra

On 02/18/2017 10:01 PM, Nikolay Petrov wrote:

Hot standby server was started with new recovery.conf, reached
consistent recovery state and resumed to restore WAL segments from
archive. When couple of next wal segments became unavailable slave
server just "skipped" them, instead to start streaming WAL from
primary:


What do you mean by "became unavailable"? The restore_command may be 
called for segments that do not exist - that's expected, and I suspect 
the "gzip: stdin: unexpected end of file" error messages are caused by that.


Then, on the next try, that WAL segment probably existed, so it got
restored. It's difficult to say what actually happened, because you've
only posted a doctored log - for example, nowhere in the log is a message
saying that the standby started streaming from the primary, i.e. something
like this:


LOG:  started streaming WAL from primary at 18/DE00 on timeline 1

But maybe not - it's possible this is happening because the archive and
restore commands are not atomic. E.g. let's say the restore_command gets
executed while the WAL segment is still being archived. In that case
it'll find an incomplete gzip archive, will try to decompress it, and
will fail with exactly the same ('unexpected end of file') error.


So I think you should read the docs about the archive/restore command
requirements [1] and maybe also the wiki page about setting up a warm
standby [2]:


[1] https://www.postgresql.org/docs/9.6/static/continuous-archiving.html

[2] https://wiki.postgresql.org/wiki/Warm_Standby

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [GENERAL] using hstore to store documents

2017-01-26 Thread Tomas Vondra

On 01/26/2017 09:08 AM, Rita wrote:

Is it possible to use hstore to store xml document?

I would like to have

"status" => "..."::xml



Keys/values in hstore are simply text strings, so you can store anything 
you want formatted as a string.
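
A trivial sketch (assuming the hstore extension is installed):

  CREATE EXTENSION IF NOT EXISTS hstore;

  SELECT ('status => "<status><code>200</code></status>"'::hstore) -> 'status';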


Not sure if that's what you're looking for though, you'll probably need 
to explain what you're trying to achieve to get better answers.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [GENERAL] How does Postgres estimate the memory needed for sorting/aggregating

2017-01-25 Thread Tomas Vondra

On 01/25/2017 10:47 PM, John R Pierce wrote:

On 1/25/2017 12:59 PM, Thomas Kellerer wrote:

So here is my question: how does Postgres estimate/know the memory
needed for the aggregation? Or does it dynamically resize the memory
if the initial assumption was wrong?


my understanding is it fits as much as it can into a work_mem sized
allocation, and if thats not enough uses temporary files and multiple
passes.



That only works for hash joins, not for hash aggregates. Hash aggregate 
is about the only operation in PostgreSQL that can cause OOM because of 
under-estimation.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [GENERAL] How does Postgres estimate the memory needed for sorting/aggregating

2017-01-25 Thread Tomas Vondra

On 01/25/2017 09:59 PM, Thomas Kellerer wrote:

There was a question on dba.stackexchange recently:

   http://dba.stackexchange.com/a/162117/1822

That question (and the answer) deals with performance difference of a
query caused by the _declared_ length of a VARCHAR column in SQL Server
(everything else being equal - especially the actual data length)

For the curious: it does make a (big) difference in performance if you
declare varchar(100) or varchar(2000) in SQL Server - something that
really surprised me.

The difference in performance in SQL Servers seems to be caused by SQL
Server's optimizer that uses the _declared_ length of a column to
estimate the memory needed for the aggregation (or sorting).

Now, we all know that there is no performance difference whatsoever for
varchar columns regardless of the declared length.

In one of the comments, to that answer the question was asked how
Postgres knows how much memory it needs to allocate to do the aggregation.

I guess this is based on the column statistics stored in pg_stats, but I
am not sure:



It is based on the average length of values in that column, yes.

We estimate the number of distinct groups produced by the aggregation, 
and multiply it by average length of the key(s). The declared maximum 
length of a column does not matter.


So if the grouping is expected to produce 1000 groups, and each key 
column is 100B on average, 100kB should be enough - but only for the 
keys. The estimate also has to include the aggregate states, which is a 
different thing.
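
The per-column input for that estimate is visible in pg_stats (the table
and column names here are made up):

  SELECT tablename, attname, avg_width, n_distinct
    FROM pg_stats
   WHERE tablename = 'orders' AND attname = 'customer_id';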


>

So here is my question: how does Postgres estimate/know the memory
needed for the aggregation? Or does it dynamically resize the memory if
the initial assumption was wrong?



I'm not sure what you mean by 'dynamically resize'. The above decision
is pretty much how the planner decides whether to use a hash aggregate or
a group aggregate. If we estimate that the hash aggregate will fit into
work_mem, the planner will consider both possibilities. If the estimate
says the hash aggregate would not fit into work_mem, we'll only consider
the group aggregate, because that can work with very little memory.
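
You can watch that decision flip by changing work_mem and re-running
EXPLAIN (a sketch with made-up table/column names; the exact threshold
depends on the estimates):

  SET work_mem = '64kB';
  EXPLAIN SELECT customer_id, count(*) FROM orders GROUP BY customer_id;
  -- likely a GroupAggregate (with a Sort below it)

  SET work_mem = '256MB';
  EXPLAIN SELECT customer_id, count(*) FROM orders GROUP BY customer_id;
  -- likely a HashAggregate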


At execution time we'll only use as much memory as actually needed. The 
trouble is that if we under-estimated the amount of memory, there's no 
way back.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [GENERAL] Queries are taking way longer in 9.6 than 9.5

2017-01-18 Thread Tomas Vondra

On 01/19/2017 06:21 AM, Merlin Moncure wrote:
>
...


yes.  Either way, I would like to very much understand how server is
preferring 3m cost plan to 2.6k cost plan or is otherwise unable to
access the cheap plan.



Perhaps it doesn't even see the 2.6k cost - it may not generate that plan
for some reason, or it arrives at different estimates.


The OP was repeatedly asked for details of the configuration, I've even 
sent a query for doing that a few days back. It's difficult to help when 
such basic info is not provided, though.


It'd also be interesting to see the statistics for the tables involved, 
so something like


   SELECT * FROM pg_stats WHERE tablename IN (... tables ...);

and

   SELECT relname, relpages, reltuples, relallvisible
     FROM pg_class WHERE relname IN (... tables ...);

might shed some light on what the planner assumes about the data.

Of course, there are other things we might need to know. For example if 
there are foreign keys between the tables, 9.6 is using that for the 
estimates (unlike 9.5). And so on.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [GENERAL] Queries are taking way longer in 9.6 than 9.5

2017-01-17 Thread Tomas Vondra

Hi,

after looking at the explain plans again, I very much doubt those come 
from the same query. The reason is the 9.5 plan contains this part:


->  HashAggregate  (cost=67.54..68.12 rows=192 width=4)
  Group Key: d.vip_patient_id
  ->  Nested Loop  (cost=0.17..67.44 rows=192 width=4)
->  Index Scan using unq_user_name on tblcnaccounts ...
  Index Cond: ((user_name)::text = 'dd'::text)
->  Index Only Scan using idx_tblcndoctorpatientmap ...
  Index Cond: (master_user_id = a.master_user_id)

while the 9.6 plan does not include anything like that, i.e. there's 
only a single aggregate at the top level, without any group keys. Also, 
the SQL query you've provided does not include any GROUP BY clause, so I 
claim that those plans are from two different queries.


regards
Tomas

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [GENERAL] Queries are taking way longer in 9.6 than 9.5

2017-01-17 Thread Tomas Vondra

On 01/17/2017 08:31 PM, Ravi Tammineni wrote:

Hi,

We have recently upgraded to 9.6 and a few queries are performing very
poorly. Query execution time has increased more than 4 or 5 times in 9.6.
The query execution plan has also completely changed. I am not sure whether
it's because of the parallel query feature or a bug in 9.6. There are a few
similar kinds of queries and all of them are taking more time in 9.6.

Following query is taking 70ms in 9.5 and the same query is taking 2
minutes in 9.6. Execution plan is totally different in 9.6 and seems
like there is a major flaw while generating the execution plan. Instead
of filtering the low cardinality, its filtering from biggest table.
Somehow nested loop joins are screwed up.

I really appreciate your help.

explain analyze
SELECT count(*)
...


You say "explain analyze" but have only provided EXPLAIN output. We need 
to see the EXPLAIN ANALYZE to see how accurate the estimates are.


Others already recommended doing ANALYZE to collect statistics on the 
upgraded database - if Heroku doesn't do that automatically, I'd 
recommend reporting that as a bug to them.


If that does not resolve the issue, run this on both machines and send 
the results to the mailing list:


select name, setting from pg_settings where source != 'default';

That'll tell us how was the configuration modified.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




[GENERAL] Re: Are new connection/security features in order, given connection pooling?

2017-01-13 Thread Tomas Vondra

On 01/12/2017 03:12 AM, Karl Czajkowski wrote:

I can relate to the original plea from my own exploration of this
topic.

Before I get into that, I will mention as an aside that to date we
have found RLS to be really slow for web client authorization, given
that we have to use session parameters to store web client context
and there doesn't seem to be a way to get the query planner to treat
these settings as constants nor to lift static parts of RLS policy
out of per-row loops. It's so slow that I wonder whether future
releases will improve it before we are forced to move authorization
back into the client-side application logic, making our generated SQL
much more complex but better tuned to the query planner's
capabilities.



I'm no expert in RLS, but I know others have been making the planner 
smarter when it comes to RLS. It would be helpful to compile this into a 
bunch of examples with queries/explain plans, and report it to 
pgsql-performance. It'd provide practical feedback from real-world RLS 
use, and maybe there already is a reasonable way to improve the plans.


I assume you use the current_setting() function to access the parameters?
I see that while it's marked as stable, it's not marked as leak-proof,
which might be why the planner can't apply some of the optimizations
when it is used in an RLS policy.
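
For reference, the pattern I have in mind looks roughly like this (the
table, column and parameter names are made up):

  -- RLS must be enabled on the table for the policy to apply
  CREATE POLICY client_isolation ON documents
      USING (owner_id = current_setting('app.client_id')::int);

  -- the web service sets the context once per request / transaction
  SET app.client_id = '42';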


>

As background, our web applications essentially compile new SQL
queries on the fly and send them through to PostgreSQL. There is a
lot of metaprogramming involved in generating SQL statements to
embody the work we want done on behalf of our clients, and this
doesn't fit well into traditional prepared statements etc.



Sure, a lot of systems generate queries on the fly. Also, if the main 
problem is poor plan choice due to RLS, I'm not sure how prepared 
statements could help with that.


>

The web service needs to assert the client context and the
client-level statements we generate and execute should not be able
to override this. The service logic to determine and assert client
context (once per web request) is dramatically simpler than the
service logic producing all the client-level SQL statements, and it
would be nice to have a restricted execution context to isolate the
two kinds. We also like the idea that policy enforcement mechanisms
could be applied orthogonally to the generated client-level SQL
statements, as it feels like a better defense-in-depth architecture
and is also easier to reason about. To do so in our service logic
would mean replicating more and more of the query parsing and engine
to do general query rewriting.

So, it would be nice to have a tunneling mechanism where I can
distinguish the control-plane operations I am performing from the
application operations I am translating and pushing down to
PostgreSQL on behalf of the web client. In my case, I might want to
perform a mixture of service-level and client-level statements within
the same transaction. The client-level statements cannot control
transactions.

I could imagine something like a two-level feature set. At the
top-level in the connection, we can statefully manipulate our
security contexts, set configurable privilege masks for the
second-level execution context, set effective roles (reversibly, as
today), manage transactions, etc. With some kind of nested execution
block, we could submit less trusted statements to run within the
second-level execution context:

   EXECUTE RESTRICTED $guard$ app_query... $guard$ ;

This would take the guarded query string, restart a safe parser on
it, and only on successful parse go forward with planning and
executing it in the restricted mode that has been configured for the
connection.

Eventually, I could see wanting more resource management controls on
this restricted context too, i.e. setting limits of CPU/IO/RAM
consumption or execution time. Both planning time limits (reject
based on estimates) and runtime (abort query if limit is reached).



That might be an interesting feature, but it's also significantly more 
complex than the topic of implementing a safe context for secrets, 
making RLS less problematic with connection pools.



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [GENERAL] Are new connection/security features in order, given connection pooling?

2017-01-11 Thread Tomas Vondra

On 01/11/2017 03:10 PM, Stephen Frost wrote:

* Tom Lane (t...@sss.pgh.pa.us) wrote:

>
> ...



If you end up having to dedicate each connection to a particular
unprivileged userID, then you can just open the connection as that
user to start with; a magic one-way privilege switch doesn't really
help.


Well, to that I'd say "it depends." There are certainly scenarios
where you have multiple userIDs and a connection pooler like
pgbouncer which handles the different connections to the database and
it's a lot better than making new connections because new connections
are so painful and slow to create. This doesn't work great if you
have thousands of concurrently connected independent users, of
course.


We've discussed this problem repeatedly (you might want to search
the archives a bit) and never found a solution that was both fully
secure and did much of anything for connection-pooling scenarios.




Not sure which discussions you had in mind, but I found these two that 
seem relevant:


[RFC: Non-user-resettable SET SESSION AUTHORISATION]

https://www.postgresql.org/message-id/flat/CAMsr%2BYHUiukYYxtvc1UahF4yM5Jc1bZAN%2Byt86WXsSVm69XXGg%40mail.gmail.com

[A mechanism securing web applications in DBMS]
https://www.postgresql.org/message-id/CA%2B0EDdCNwJvvb3aHVT4A8ywSwO40JeHj8_CYUx2SBb9%3DR6xHew%40mail.gmail.com

>

I don't agree that this is unsolvable, but it would require things
like protocol-level changes which no one has had the gumption to work
through and propose.



Perhaps it's a mistake to make this work with roles, at least for the 
RLS use case. I'd argue roles are kinda orthogonal to the privilege 
system we have, and the fact that RLS policies may use current_user does 
not necessarily mean the solution needs to be based on roles.


Not only were roles designed long before RLS, but having to create
a role for each user is quite limiting, and who says the role name is the
only aspect useful for policies?


Which is why I think a protected vault-like thingy is a more promising
approach. This is why Oracle based VPD (Virtual Private Database,
essentially what we call RLS) on 'application contexts', which are set by
a 'trusted' procedure, usually called in a LOGON trigger. That of course
does not work with connection pooling, but perhaps making it possible
to re-initialize the context would be easier than protecting SET ROLE.


Admittedly, the solution described in the blog post is not perfect, but 
while some protocol-level support would be nice I don't think that's a 
requirement as long as the application knows how to initialize the 
context, and we reset it on RESET ALL.


>

In short, I agree with Guyren, there are features needed here that
we don't have and it would be a great deal better if we did.



Yeah.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [GENERAL] Slow index scan - Pgsql 9.2

2017-01-10 Thread Tomas Vondra

On 01/10/2017 04:05 AM, Patrick B wrote:

​3,581​ individual pokes into the heap to confirm tuple visibility
and apply the deleted filter - that could indeed take a while.
David J.


I see.. The deleted column is:

deleted boolean

Should I create an index for that? How could I improve this query?


Does it execute as slowly when you run it for a 2nd time?


No, it doesn't. I think it's because of cache?


I would think because of the NOT "deleted" clause. Which is
interesting, because that's a column which you conveniently didn't
include in the definition below.


My mistake.


Would an Index be sufficient to solve the problem?



Not a separate index - the query probably would not benefit from two 
separate indexes. But you can amend the existing index, to allow 
index-only scans, i.e. creating an index like this:


  CREATE INDEX ON <table> (clientid, is_demo, deleted, id, job, job_share_mode);

This will make the index larger, but it should allow index-only scans.

The other thing you could try is partial index, i.e.

  CREATE INDEX ON <table> (clientid) WHERE NOT is_demo AND NOT deleted;

You can also combine those approaches, but you'll have to include all 
columns into the index, even those in the index predicate:


  CREATE INDEX ON <table> (clientid, is_demo, deleted, id, job, job_share_mode)
  WHERE NOT is_demo AND NOT deleted;

I'd bet all of those will outperform the current plan.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [GENERAL] Postgres 9.6.1 big slowdown by upgrading 8.4.22

2017-01-07 Thread Tomas Vondra

On 01/08/2017 01:12 AM, Alban Hertroys wrote:



On 7 Jan 2017, at 15:44, Job <j...@colliniconsulting.it> wrote:

...

This is what it happens:

Postgres 8.4.22
Medium average load 1.5/2.0
Further queries respond very quickly

Postgres 9.6.1
Medium average load 18.0/20.0 !!
Further queries are really very slow
There is a bottle neck


I see.



This behavior is typical when a resource gets saturated. You have
probably run out of CPU time or I/O, resulting in growing latencies.
Thus more processes are running (or waiting for a CPU) at the same time,
which is what the load average is based on.


What is the CPU and I/O usage in those cases?

FWIW you still haven't explained how the upgrade was performed. That
might be a very important piece of information, because the 8.4 cluster
might have hint bits set and/or the data may be mostly frozen, but the
9.6 cluster may not have that yet, resulting in higher CPU usage.



By removing *only* this condition in the query function:

>>

"exists ( select 1 from gruorari where
gruorari.idgrucate=grucategorie.id and ( (('{'||gg_sett||'}')::int[] &&
array[EXTRACT(DOW FROM NOW())::int])='t' and now()::time between
gruorari.dalle::time and gruorari.alle::time) )"


Then most likely the slow-down you're experiencing is indeed in the
above subquery. It could also be the addition of the exists though,
let's not rule that out!

>

Note that I'm not on either of the versions involved (9.3.15 here),
so  I can't easily observe what you're seeing.

A general observation; I think now() calls gettimeofday() each time,

the performance of which can differ significantly depending on which
hardware clock is being used by your OS (there are often multiple
options). On the contrary, CURRENT_TIMESTAMP, CURRENT_TIME and friends
are only updated at the start of the transaction, requiring but a single
call to gettimeofday().

Judging from your queries, you don't actually seem to need the

accuracy that NOW() provides…




No. now() calls GetCurrentTransactionStartTimestamp(), so it does not 
call gettimeofday() and so the clock source overhead is pretty much 
irrelevant. Moreover it's marked as 'stable' which makes repeated calls 
unnecessary.
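
An easy way to see that for yourself - now() stays fixed for the whole
transaction, while clock_timestamp() actually reads the clock each time:

  BEGIN;
  SELECT now(), clock_timestamp();
  SELECT pg_sleep(1);
  SELECT now(), clock_timestamp();  -- now() unchanged, clock_timestamp() moved
  COMMIT;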


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services




Re: [GENERAL] Postgres 9.6.1 big slowdown by upgrading 8.4.22

2017-01-07 Thread Tomas Vondra



On 01/07/2017 04:43 AM, Venkata B Nagothi wrote:


On Sat, Jan 7, 2017 at 2:56 AM, Job wrote:


Hi guys,
really much appreciated your replies.
  >> You might want to include the query plans for each server
We use a function; the explain analyze output is quite similar:

POSTGRESQL 8.4.22:

explain analyze select 'record.com' where 'record.com' like '%.%'
  and function_cloud_view_orari('53', '192.168.10.234', 'record.com') != ''
  limit 1;

                              QUERY PLAN
--------------------------------------------------------------------------
 Limit  (cost=0.03..0.04 rows=1 width=0) (actual time=1.488..1.488 rows=0 loops=1)
   ->  Result  (cost=0.03..0.04 rows=1 width=0) (actual time=1.485..1.485 rows=0 loops=1)
         One-Time Filter: ((function_cloud_view_orari('53'::character varying, '192.168.10.234'::character varying, 'record.com'::character varying))::text <> ''::text)
 Total runtime: 1.531 ms

POSTGRES 9.6.1:

explain analyze select 'record.com' where 'record.com' like '%.%'
  and function_cloud_view_orari('53', '192.168.10.234', 'record.com') != ''
  limit 1;

                              QUERY PLAN
--------------------------------------------------------------------------
 Limit  (cost=0.03..0.04 rows=1 width=32) (actual time=4.216..4.216 rows=0 loops=1)
   ->  Result  (cost=0.03..0.04 rows=1 width=32) (actual time=4.215..4.215 rows=0 loops=1)
         One-Time Filter: ((function_cloud_view_orari('53'::character varying, '192.168.10.234'::character varying, 'record.com'::character varying))::text <> ''::text)
 Planning time: 0.046 ms
 Execution time: 4.230 ms
There is only one condition that, when removed, makes the query in the
new 9.6.1 PostgreSQL server very fast again, also in massive benchmark tests.
The condition is this:
"exists ( select 1 from gruorari where
gruorari.idgrucate=grucategorie.id  and (
(('{'||gg_sett||'}')::int[] && array[EXTRACT(DOW FROM
NOW())::int])='t' and  now()::time between gruorari.dalle::time and
gruorari.alle::time) )"
We have a table of "weekly events", as example:
- monday from 12 to 14
- tuesday from 18 to 20
...





As already mentioned by others, i do not see a major performance
problem (atleast based on the information you gave) due to upgrading
to 9.6.1. Do you have latest statistics in place ? What about data ?



There almost certainly *is* a performance problem, even though the queries
only take a few milliseconds. The timings show an almost 3x slowdown, and
if the application executes a lot of those queries, that can easily mean a
3x increase in system load.


The question is what's causing it. I wonder whether this might be caused 
by different data layout, or data not being frozen on 9.6 yet, or 
something like that.


Job, can you explain how you performed the upgrade (pg_upgrade or
pg_dump), and how you switched back to 8.4?


All this is based on the assumption the difference is consistent, and 
not just random fluke.


>

If you can notice in the EXPLAIN output, there is a difference in
the *width*. In 9.6.1 width is 32, any idea why ?



I'd guess Result was not filling the width field in 8.4, or something 
like that. In any case, the plans are exactly the same in both versions.


regards




Re: [GENERAL] Performance degradation when using auto_explain

2017-01-04 Thread Tomas Vondra

On 01/04/2017 08:54 PM, Kisung Kim wrote:



On Wed, Jan 4, 2017 at 1:21 AM, Andreas Kretschmer
<akretsch...@spamfence.net> wrote:

Kisung Kim <ks...@bitnine.net> wrote:

> And finally I found that auto_explain is the cause of the problem.

real hardware or virtual hardware? On virtual there are sometimes
problems with exact timings, please read:

https://www.postgresql.org/docs/current/static/pgtesttiming.html


Thank you for your reply.
I use real hardware.
I am curious whether timing functions have contention when multiple threads call them.



It's not so much about contention between threads/processes, but mostly 
about the cost of actually reading data from the clock source. So even 
if you run on physical hardware, the clock source may be slow. Actually, 
there may be multiple clock sources available, differing in precision 
and overhead.


See:

/sys/devices/system/clocksource/*/available_clocksource

On my machine I see 3 different sources ("tsc hpet acpi_pm"), and after
running the pg_test_timing tool linked by Andreas, I get this for the
'tsc' clock source:


Testing timing overhead for 3 seconds.
Per loop time including overhead: 29.87 nsec
Histogram of timing durations:
  < usec   % of total      count
       1     97.09159   97499400
       2      2.90085    2913031
       4      0.00193       1936
       8      0.00089        891
      16      0.00140       1405
      32      0.00335       3366
      64      0.00000          2

suggesting that 97% of calls took less than 1 usec, which is quite good. 
For comparison, using 'hpet' gives me this:


Testing timing overhead for 3 seconds.
Per loop time including overhead: 766.92 ns
Histogram of timing durations:
  < usec   % of total      count
       1     27.69558    1083389
       2     71.28437    2788485
       4      0.75748      29631
       8      0.02886       1129
      16      0.06578       2573
      32      0.16755       6554
      64      0.00033         13
     128      0.00003          1
     256      0.00003          1

Which is clearly much worse (it increased the per-loop cost from 30ns to 
767ns, which is ~25x more).


So check which clock source you have selected, and test how expensive 
that is. But even with a fast clock source, the additional timing 
overhead may make EXPLAIN ANALYZE considerably slower. There's not much 
we can do about it :-(


And to make matters worse, it may affect different plans differently 
(e.g. nested loop joins do many more gettimeofday calls than other join 
types, amplifying the timing costs).


But the question is whether you actually need the timing - the total 
duration + row counts are far more important in my experience, so by setting


auto_explain.log_analyze = on
auto_explain.log_timing = off

you may significantly reduce the impact on query performance, while 
retaining the most valuable information.
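
To get a feel for the overhead before enabling it globally, you can try it 
in a single session first - a minimal sketch, assuming the auto_explain 
module is installed (the threshold is just an example value):

    LOAD 'auto_explain';
    SET auto_explain.log_min_duration = 0;   -- log every statement, for the test only
    SET auto_explain.log_analyze = on;
    SET auto_explain.log_timing = off;       -- keep row counts, skip per-node timing
    SELECT count(*) FROM pg_class;           -- any query; its plan ends up in the server log

Comparing the runtimes with and without log_timing shows how much the 
instrumentation itself costs on your clock source.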


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Index impact on update?

2017-01-04 Thread Tomas Vondra

On 01/04/2017 05:59 PM, Israel Brewster wrote:

Short version:
Do indexes impact the speed of an UPDATE, even when the indexed columns
aren't changing?



They shouldn't, as long as the new row version fits on the same page (8kB 
chunk of data) as the old one. In that case we can do a HOT update for the 
row, without updating the index(es).


But as you're updating the whole table, that would require about 50% of 
all pages to be free, which is unlikely to be true. So perhaps some 
updates can proceed without touching indexes, but most can't.
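
If you plan to do this kind of full-table update regularly, one way to leave 
that free space on each page is the fillfactor storage parameter - a rough 
sketch (the 50% value just mirrors the estimate above, and note the rewrite 
roughly doubles the on-disk size):

    ALTER TABLE data SET (fillfactor = 50);
    VACUUM FULL data;   -- rewrite the table so existing pages get the free space too

That only helps future updates, though - it does nothing for the one that's 
already running.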



Details:
I have a table containing geographical data (Latitude, longitude, and
elevation) with 406,833,705 records. The Latitude and Longitude columns
are indexed. In order to better utilize the data, I've been looking into
PostGIS, and decided I wanted to add a "Location" column with PostGIS
type "GEOGRAPHY(point)". I then tried to populate it from the existing
latitude/longitude data using the following query:

UPDATE data SET
location=ST_GeographyFromText('SRID=4326;POINT('||lng::text||'
'||lat::text||')');

I expected this update to take quite a while, since it has 406 million
rows to update, but at this point it's been over 19 hours since I
started the query, and it still hasn't completed.

I'm wondering if the presence of the indexes could be slowing things
down even though the indexed columns aren't being updated? Would I be
better off canceling the update query, dropping the indexes, and trying
again? Or is it more likely that the update query is "almost" done, and it
would be better to just let it run its course? Or is there an even
better option, such as perhaps exporting the data, adding the additional
column in a text editor, and re-importing the data with a COPY command?



As explained above, it's likely that such a full-table update has to 
modify the indexes anyway, making it much more expensive. Without 
additional information it's however impossible to confirm that's what's 
causing the long update in this case - there may be other bits slowing 
it down - e.g. foreign key checks, triggers.


CREATE TABLE AS SELECT would not pay any of those costs, of course. 
Also, if you're running with wal_level=minimal, it would not have to 
write the changes into WAL, while the regular UPDATE has to do that.
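
For illustration, a sketch of that approach (only lat/lng are shown - you'd 
list all the other columns of "data" as well, and recreate the indexes on 
the new table before swapping the names):

    CREATE TABLE data_new AS
    SELECT lat, lng,
           ST_GeographyFromText('SRID=4326;POINT(' || lng::text || ' '
                                || lat::text || ')') AS location
    FROM data;

    -- CREATE INDEX ... ON data_new (lat); etc., then
    -- DROP TABLE data; ALTER TABLE data_new RENAME TO data;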



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Special index for "like"-based query

2016-12-29 Thread Tomas Vondra

On 12/30/2016 12:46 AM, David G. Johnston wrote:

On Thu, Dec 29, 2016 at 4:38 PM, Tomas Vondra
<tomas.von...@2ndquadrant.com> wrote:

On 12/30/2016 12:33 AM, David G. Johnston wrote:

On Thu, Dec 29, 2016 at 4:21 PM, Job <j...@colliniconsulting.it> wrote:

Hello,

in Postgresql 9.6 we have a query running on a very large table
based, in some cases, on a like statement:

... WHERE FIELD='CODE' OR FIELD LIKE 'CODE_%'

Which type of index can i create to speed to the search when the
"like" case happens?


​GIST​

https://www.postgresql.org/docs/9.6/static/pgtrgm.html

https://www.postgresql.org/docs/9.6/static/btree-gist.html


For prefix queries, it's also possible to use simple btree index
with varchar_pattern_ops.

https://www.postgresql.org/docs/9.6/static/indexes-opclass.html


Even knowing that this feature exists I don't know that I could have
found it within a reasonable amount of time in its present location.  A
few cross-references from elsewhere (probably at least the functions
part of the documentation) would make learning about the capability a
lot easier.



Well, it's referenced right from the "Indexes" part of the documentation 
(right at the beginning of "Index Types"):


https://www.postgresql.org/docs/9.6/static/indexes.html

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Special index for "like"-based query

2016-12-29 Thread Tomas Vondra

On 12/30/2016 12:33 AM, David G. Johnston wrote:

On Thu, Dec 29, 2016 at 4:21 PM, Job <j...@colliniconsulting.it> wrote:

Hello,

in Postgresql 9.6 we have a query running on a very large table
based, in some cases, on a like statement:

... WHERE FIELD='CODE' OR FIELD LIKE 'CODE_%'

Which type of index can i create to speed to the search when the
"like" case happens?


​GIST​

https://www.postgresql.org/docs/9.6/static/pgtrgm.html

​https://www.postgresql.org/docs/9.6/static/btree-gist.html
​


For prefix queries, it's also possible to use simple btree index with 
varchar_pattern_ops.


https://www.postgresql.org/docs/9.6/static/indexes-opclass.html
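
For illustration, a minimal sketch ("bigtable" stands in for the real table 
name; if FIELD is of type text rather than varchar, use text_pattern_ops):

    CREATE INDEX bigtable_field_pattern_idx
        ON bigtable (field varchar_pattern_ops);

The planner can then turn the left-anchored LIKE 'CODE_%' into an index 
range scan. (Keep in mind that in LIKE patterns "_" matches any single 
character - write 'CODE\_%' if you mean a literal underscore.)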


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] INSERT - UPDATE throughput oscillating and SSD activity after stopping the client

2016-12-03 Thread Tomas Vondra
On Fri, 2016-12-02 at 13:45 -0800, Adrian Klaver wrote:
> 
> On 12/02/2016 09:40 AM, Tom DalPozzo wrote:
> > 
> > 
> > Hi,
> > I've two tables, t1 and t2, both with one bigint id indexed field
> > and
> > one 256 char data field; t1 has always got 1 row, while t2 is
> > increasing as explained in the following.
> > 
> > My libpq client continuously updates one row in t1 (every time
> > targeting a different row) and inserts a new row in t2. All this in
> > blocks of 1000 update-insert per commit, in order to get better
> > performance.
> > Wal_method is fsync, fsync is on, attached my conf file.
> > I've a 3.8ghz laptop with evo SSD.
> > 
> > Performance is  measured every two executed blocks and related to
> > these
> > blocks.
> > 
> > Over the first few minutes performance is around 10Krow/s then it
> > slowly
> > drops, over next few minutes to 4Krow/s, then it slowly returns
> > high and
> > so on, like a wave.
> > I don't understand this behaviour. Is it normal? What does it
> > depend on?
> Have you looked at the Postgres log entries that cover these
> episodes?
> 
> Is there anything of interest there?
> 
In particular look at checkpoints. In the config file you've changed
checkpoint_timeout, but you haven't changed max_wal_size, so my guess
is the checkpoints happen every few minutes, and run for about 1/2 the
time (thanks to checkpoint_completion_target=0.5). That would be consistent
with the pattern of good/bad performance.
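
Something like this should make the checkpoints less frequent and easier to 
see in the log (the values are illustrative only, and ALTER SYSTEM assumes a 
9.5+ server; otherwise edit postgresql.conf directly):

    ALTER SYSTEM SET max_wal_size = '8GB';
    ALTER SYSTEM SET checkpoint_completion_target = 0.9;
    ALTER SYSTEM SET log_checkpoints = on;
    SELECT pg_reload_conf();

With log_checkpoints on you can then correlate the slow periods with the 
checkpoint start/end messages in the log.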

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Strange activity of prepared queries

2016-12-03 Thread Tomas Vondra
On Fri, 2016-12-02 at 07:04 -0700, pinker wrote:
> 
> Hi!
> I have a pgBadger report with strange data about prepared queries that I
> cannot interpret by myself. If anybody could help me with the
> interpretation, it would be greatly appreciated.
> In the first half of the day pgBadger shows there are no prepare
> statements and all queries are bound - how is that possible if the
> average duration of a session is only 4m16s, not 15 hours?
You haven't told us what logging parameters you use, which is a crucial
piece of information. For example it's possible that most of the time the
planning is fast enough not to be logged, so it's invisible to pgBadger.
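
For example, with something like this every parse/bind/execute gets logged 
with its duration, so pgBadger can attribute time to each phase (0 logs 
every single statement, which may be a lot of log volume; ALTER SYSTEM 
needs 9.4+):

    ALTER SYSTEM SET log_min_duration_statement = 0;
    SELECT pg_reload_conf();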

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services
-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Avoid sorting when doing an array_agg

2016-12-03 Thread Tomas Vondra
On Sat, 2016-12-03 at 13:08 -0500, Kiriakos Georgiou wrote:
> The array_agg() has nothing to do with it.  It’s the group by.
> Without knowing what you are conceptually trying to accomplish, I
> can’t say much.

It *IS* caused by array_agg(). PostgreSQL can only do HashAggregate
when everything fits into memory, and in this case has to deal with
aggregate states of unknown size, so assumes each state is 1kB IIRC.

Per the plan the group by is expected to produce ~27k groups, so needs
about 30MB for the HashAggregate.  
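
So one thing you can try is giving a single session enough memory and 
checking whether the plan switches - a sketch (the table/column names here 
are made up, use your actual query):

    SET work_mem = '64MB';
    EXPLAIN (ANALYZE, BUFFERS)
    SELECT first_name, last_name, category, array_agg(user_id)
      FROM users
     GROUP BY first_name, last_name, category;

If it still picks Sort + GroupAggregate, increase work_mem a bit more.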

> On my test 9.4.10 db, a similar example does a HashAggregate, so no
> sorting (google HashAggregate vs GroupAggregate).  But still it’s an
> expensive query because of all the I/O.

The query does almost no I/O, actually. The bitmap heap scan takes only
~230ms, which is not bad considering it produces ~1M rows. The
expensive part here seems to be the sort, but I doubt it's because of
I/O because it only produces temporary files that likely stay in RAM
anyway.

So the sort is probably slow because of CPU, as it compares strings. In
some locales that may be very expensive - not sure which locale is used
in this case, as it was not mentioned. 

> If I wanted to instantly have the user ids for a specific first, last
> name and category combo, I’d maintain a summary table via an insert
> trigger on the users table.
>  

Maybe. The question is whether it'll be a net win - maintenance of the
summary table will not be for free, especially with arrays of ids.

regards

-- 
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] full_page_writes on SSD?

2015-11-25 Thread Tomas Vondra

On 11/24/2015 08:14 PM, Andres Freund wrote:

On 2015-11-24 13:09:58 -0600, Kevin Grittner wrote:

On Tue, Nov 24, 2015 at 12:48 PM, Marcin Mańk <marcin.m...@gmail.com> wrote:


if SSDs have 4kB/8kB sectors, and we'd make the Postgres page
size equal to the SSD page size, do we still need
full_page_writes?


If an OS write of the PostgreSQL page size has no chance of being
partially persisted (a/k/a torn), I don't think full page writes
are needed. That seems likely to be true if pg page size matches
SSD sector size.


At the very least it also needs to match the page size used by the
OS (4KB on x86).


Right. I find this possibility (when the OS and SSD page sizes match) 
interesting, exactly because it might make the storage resilient to torn 
pages.




But be generally wary of turning off fpw's if you use replication.
Not having them often turns an asynchronously batched write workload
into one containing a lot of synchronous, single threaded, reads.
Even with SSDs that can very quickly lead to not being able to keep
up with replay anymore.



I don't immediately see why that would happen? Can you elaborate?

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] full_page_writes on SSD?

2015-11-25 Thread Tomas Vondra

On 11/24/2015 08:40 PM, John R Pierce wrote:

On 11/24/2015 10:48 AM, Marcin Mańk wrote:

I saw this:
http://blog.pgaddict.com/posts/postgresql-on-ssd-4kb-or-8kB-pages

It made me wonder: if SSDs have 4kB/8kB sectors, and we'd make the
Postgres page size equal to the SSD page size, do we still need
full_page_writes?



an SSD's actual write block is much much larger than that. they
emulate 512 or 4k sectors, but they are not actually written in
sector order, rather new writes are accumulated in a buffer on the
drive, then written out to a whole block, and a sector mapping table
is maintained by the drive.


I don't see how that's related to full_page_writes?

It's true that SSDs optimize the writes in various ways, generally along 
the lines you described, because they do work with "erase blocks" 
(generally 256kB - 1MB chunks) and such.


But the internal structure of SSD has very little to do with FPW because 
what matters is whether the on-drive write cache is volatile or not (SSD 
can't really work without it).


What matters (when it comes to resiliency to torn pages) is the page 
size at the OS level, because that's what's being handed over to the SSD.


Of course, there might be other benefits of further lowering page sizes 
at the OS/database level (and AFAIK there are SSD drives that use pages 
smaller than 4kB).


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] RAID and SSD configuration question

2015-10-20 Thread Tomas Vondra

Hi,

On 10/20/2015 04:33 PM, Scott Marlowe wrote:


We're running LSI MegaRAIDs at work with 10 SSD RAID-5 arrays, and we
can get ~5k to 7k tps on a -s 1 pgbench with the write cache on.

When we turn the write cache off, we get 15k to 20k tps. This is on
a 120GB pgbench db that fits in memory, so it's all writes.


I'm not really surprised that the performance increased so much, as the 
SSDs have large amounts of DRAM on them - with 10 devices it may easily 
be 10GB (compared to 1 or 2GB, which is common on RAID controllers). So 
the write cache on the controller may be a bottleneck.


But the question is how disabling the write cache (on the controller) 
affects reliability of the whole RAID array.


The write cache is there not only because it improves performance, but 
also because it protects against some failure modes - you mentioned 
RAID-5, which is vulnerable to the "write hole" problem.



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] RAID and SSD configuration question

2015-10-20 Thread Tomas Vondra

Hi,

On 10/20/2015 03:30 PM, Merlin Moncure wrote:

On Tue, Oct 20, 2015 at 3:14 AM, Birta Levente <blevi.li...@gmail.com> wrote:

Hi

I have a supermicro SYS-1028R-MCTR, LSI3108 integrated with SuperCap module
(BTR-TFM8G-LSICVM02)
- 2x300GB 10k spin drive, as raid 1 (OS)
- 2x300GB 15k spin drive, as raid 1 (for xlog)
- 2x200GB Intel DC S3710 SSD (for DB), as raid 1

So how is better for the SSDs: mdraid or controller's raid?


I personally always prefer mdraid if given a choice, especially when
you have a dedicated boot drive.  It's better in DR scenarios and for
hardware migrations.  Personally I find dedicated RAID controllers to
be baroque.  Flash SSDs (at least the good ones) are basically big
RAID 0s with their own dedicated cache, supercap, and controller
optimized to the underlying storage peculiarities.


I don't know - I've always treated mdraid with a bit of suspicion as it 
does not have any "global" write cache, which might be allowing failure 
modes akin to the RAID5 write hole (similar issues exist for non-parity 
RAID levels like RAID-1 or RAID-10).


I don't think the write cache on the devices prevents this, as it does 
not prevent problems with interruption between the writes to the two drives.





What's the difference between Write Back and Always Write Back
with supercap module?


No clue. With spinning drives simple performance tests would make
the caching behavior obvious but with SSD that's not always the case.
I'm guessing(!) 'Always Write Back' allows the controller to buffer
writes beyond what the devices do.


AFAIK there's no difference. It's an option that disables write cache in 
case the battery on BBU dies for some reason (so the write cache would 
become volatile). With capacitors this is not really applicable.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Postgresql 9.4 and ZFS?

2015-09-30 Thread Tomas Vondra



On 09/30/2015 07:33 PM, Benjamin Smith wrote:

On Wednesday, September 30, 2015 02:22:31 PM Tomas Vondra wrote:

I think this really depends on the workload - if you have a lot of
random writes, CoW filesystems will perform significantly worse than
e.g. EXT4 or XFS, even on SSD.


I'd be curious about the information you have that leads you to this
conclusion. As with many (most?) "rules of thumb", the devil is
quiteoften the details.


A lot of testing done recently, and also experience with other CoW 
filesystems (e.g. BTRFS explicitly warns about workloads with a lot of 
random writes).



We've been running both on ZFS/CentOS 6 with excellent results, and
are considering putting the two together. In particular, the CoW
nature (and subsequent fragmentation/thrashing) of ZFS becomes
largely irrelevant on SSDs; the very act of wear leveling on an SSD
is itself a form of intentional thrashing that doesn't affect
performance since SSDs have no meaningful seek time.


I don't think that's entirely true. Sure, SSD drives handle random I/O
much better than rotational storage, but it's not entirely free and
sequential I/O is still measurably faster.

It's true that the drives do internal wear leveling, but it probably
uses tricks that are impossible to do at the filesystem level (which is
oblivious to internal details of the SSD). CoW also increases the amount
of blocks that need to be reclaimed.

In the benchmarks I've recently done on SSD, EXT4 / XFS are ~2x
faster than ZFS. But of course, if the ZFS features are interesting
for you, maybe it's a reasonable price.


Again, the details would be highly interesting to me. What memory
optimization was done? Status of snapshots? Was the pool RAIDZ or
mirrored vdevs? How many vdevs? Was compression enabled? What ZFS
release was this? Was this on Linux,Free/Open/Net BSD, Solaris, or
something else?


I'm not sure what you mean by "memory optimization" so the answer is 
probably "no".


FWIW I don't have much experience with ZFS in production, all I have is 
data from benchmarks I've recently done exactly with the goal to educate 
myself on the differences of current filesystems.


The tests were done on Linux, with kernel 4.0.4 / zfs 0.6.4. So fairly 
recent versions, IMHO.


My goal was to test the file systems under the same conditions and used 
a single device (Intel S3700 SSD). I'm aware that this is not a perfect 
test and ZFS offers interesting options (e.g. moving ZIL to a separate 
device). I plan to benchmark some additional configurations with more 
devices and such.




A 2x performance difference is almost inconsequential in my
experience, where growth is exponential. 2x performance change
generally means 1 to 2 years of advancement or deferment against the
progression of hardware; our current, relatively beefy DB servers
are already older than that, and have an anticipated life cycle of at
leastanother couple years.


I'm not sure I understand what you suggest here. What I'm saying is that 
when I do a stress test on the same hardware, I do get ~2x the 
throughput with EXT4/XFS, compared to ZFS.



// Our situation // Lots of RAM for the workload: 128 GB of ECC RAM
with an on-disk DB size of ~ 150 GB. Pretty much, everything runs
straight out of RAM cache, with only writes hitting disk. Smart
reports 4/96 read/write ratio.


So your active set fits into RAM? I'd guess all your writes are then WAL 
+ checkpoints, which probably makes them rather sequential.


If that's the case, CoW filesystems may perform quite well - I was 
mostly referring to workloads with a lot of random writes to the device.



Query load: Constant, heavy writes and heavy use of temp tables in
order to assemble very complex queries. Pretty much the "worst case"
mix of reads and writes, average daily peak of about 200-250
queries/second.

I'm not sure how much random I/O that actually translates to. According 
to the numbers I've posted to this thread a few hours ago, a tuned ZFS on 
a single SSD device handles ~2.5k tps (with dataset ~2x the RAM). But 
those are OLTP queries - your queries may write much more data. OTOH it 
really does not matter that much if your active set fits into RAM, 
because then it's mostly about writing to ZIL.




16 Core XEON servers, 32 HT "cores".

SAS 3 Gbps

CentOS 6 is our O/S of choice.

Currently, we're running Intel 710 SSDs in a software RAID1 without
trim enabled and generally happy with the reliability and performance
we see. We're planning to upgrade storage soon (since we're over 50%
utilization) and in the process, bring the magic goodness of
snapshots/clones from ZFS.


I presume by "software RAID1" you mean "mirrored vdev zpool", correct?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Postgresql 9.4 and ZFS?

2015-09-30 Thread Tomas Vondra



On 09/30/2015 03:45 PM, Patric Bechtel wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hi Tomas,

Tomas Vondra schrieb am 30.09.2015 um 14:01:

Hi,


...


I've also done a few runs with compression, but that reduces the performance a 
bit
(understandably).


I'm somewhat surprised by the influence of the rsize value. I will recheck 
that. In my case, the
compression actually improved throughput quite a bit, but that might change 
depending on CPU speed
vs IO speed. Our CPU's are quite powerful, but the SSD are just SATA 
Samsung/OCZ models at least
18 months old. Also, I measured the write performance over several hours, to 
push the internal gc
of the SSD to its limits. We had some problems in the past with (e.g. Intel) 
SSD's and their
behaviour (<1MB/s), so that's why I put some emphasis on that.


I think it really depends on how random the workload is. If the workload is 
random (as for example the workload simulated by pgbench), the 
recordsize seems to matter a lot - perhaps your workload is not as 
random? The same goes for compression.


I'm no ZFS expert, but I assume ARC tracks those record chunks, so 
using 128kB records (default) means you have 16x fewer slots than with 
8kB records. You may still cache the same amount of data, but the cache 
may not adapt that well to your active set.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Postgresql 9.4 and ZFS?

2015-09-30 Thread Tomas Vondra

Hi,

On 09/30/2015 12:21 AM, Patric Bechtel wrote:

-BEGIN PGP SIGNED MESSAGE-
Hash: SHA1

Hi Benjamin,

if you're using compression, forget about that. You need to
synchronize the ashift value to the internal rowsize of you SSD,
that's it. Make sure your SSD doesn't lie to you regarding writing
blocks and their respective order. In that case you might even choose
to set sync=disabled. Also, set atime=off and relatime=on. For faster
snapshot transfers, you might like to set the checksum algo to
SHA256.


What is "SSD rowsize"? Do you mean the size of the internal pages?

FWIW I've been doing extensive benchmarking of ZFS (on Linux), including 
tests of different ashift values, and I see pretty much no difference 
between ashift=12 and ashift=13 (4k vs 8k).


To show some numbers, these are pgbench results with 16 clients:

   type   scale     ashift=12   ashift=13   rsize=8k   logbias
   ------------------------------------------------------------
   ro     small         53097       53159      53696      53221
   ro     medium        42869       43112      47039      46952
   ro     large          3127        3108      27736      28027
   rw     small          6593        6301       6384       6753
   rw     medium         1902        1890       4639       5034
   rw     large           561         554       2168       2585

small=150MB, medium=2GB, large=16GB (on a machine with 8GB of RAM)

The tests are "adding" the features, i.e. the columns are actually:

* ashift=12
* ashift=13
* ashift=13 + recordsize=8kB
* ashift=13 + recordsize=8kB + logbias=throughput

I've also done a few runs with compression, but that reduces the 
performance a bit (understandably).




As always, put zfs.conf into /etc/modprobe.d with

options spl spl_kmem_cache_slab_limit=16384
options zfs zfs_arc_max=8589934592

you might want to adjust the zfs_arc_max value to your liking. Don't
set it to more than 1/3 of your RAM, just saying.


Why? My understanding is that the ARC cache is ~ page cache, although 
implemented differently and not as tightly integrated with the kernel, 
but it should release the memory when needed and such. Perhaps not 
letting it use all the RAM is a good idea, but 1/3 seems a bit too 
aggressive?



regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Postgresql 9.4 and ZFS?

2015-09-30 Thread Tomas Vondra

Hi,

On 09/29/2015 07:01 PM, Benjamin Smith wrote:

Does anybody here have any recommendations for using PostgreSQL 9.4
(latest) with ZFS?


I think this really depends on the workload - if you have a lot of 
random writes, CoW filesystems will perform significantly worse than 
e.g. EXT4 or XFS, even on SSD.




We've been running both on ZFS/CentOS 6 with excellent results, and
are considering putting the two together. In particular, the CoW
nature (and subsequent fragmentation/thrashing) of ZFS becomes
largely irrelevant on SSDs; the very act of wear leveling on an SSD
is itself a form of intentional thrashing that doesn't affect
performance since SSDs have no meaningful seek time.


I don't think that's entirely true. Sure, SSD drives handle random I/O 
much better than rotational storage, but it's not entirely free and 
sequential I/O is still measurably faster.


It's true that the drives do internal wear leveling, but it probably 
uses tricks that are impossible to do at the filesystem level (which is 
oblivious to internal details of the SSD). CoW also increases the amount 
of blocks that need to be reclaimed.


In the benchmarks I've recently done on SSD, EXT4 / XFS are ~2x faster 
than ZFS. But of course, if the ZFS features are interesting for you, 
maybe it's a reasonable price.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Forced external sort?

2015-09-13 Thread Tomas Vondra



On 09/13/2015 01:53 AM, Jeff Janes wrote:

On Fri, Sep 11, 2015 at 11:45 AM, <r...@winmutt.com> wrote:

I've got a poorly indexed query and was attempting a quick work
around in production by increasing work_mem when it was called.
EXPLAIN ANALYZE is telling me this:

Sort Method: external sort  Disk: 1253824kB


So I set the work_mem to 2gb, still going to disk. I read Tom's
suggestion here
(http://www.postgresql.org/message-id/1936.1298394...@sss.pgh.pa.us)
and went all the way up to 92G or work_mem.

Did more googling and found this reference

(http://www.postgresql.org/message-id/CAMkU=1w2y87njueqwn8-hk2kdb4uoihfajxpo1nz3ekhzvb...@mail.gmail.com)
to a 1G sort limit that is going to be removed in 9.3.

Am I possibly experiencing the same issue even though I am on 9.3.1?
Was it really not fixed until 9.4?
(http://www.postgresql.org/message-id/557c7213.8000...@joeconway.com)



Only the rounding issue was removed in 9.3 so that it could use the full
1GB, (rather than getting to 512MB plus a few bytes, and then deciding
it didn't have room to double).  The 1G limit itself was not removed
until 9.4.

Note that these limits were not on the total amount of data being
sorted, but on the size of the array of row headers, and so limits the
number of rows, regardless of the size of the rows.


An additional thing to consider is that the two sort methods (in-memory and 
on-disk) use different representations of the data, and the on-disk one is 
much more compact. It's not uncommon to see a 1:3 ratio, i.e. when an 
external (on-disk) sort needs 100MB, the in-memory sort would need 300MB.


So when the on-disk sort needs 1253824kB, you'll probably need ~4GB 
work_mem to actually do that in memory.


regards

--
Tomas Vondra   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Turn off streaming replication - leaving Master running

2015-06-29 Thread Tomas Vondra

Hi,

On 06/29/2015 09:27 AM, Andy Erskine wrote:

Thanks Jeff,

I don't want a clone - I want to temporarily turn off replication
(and therefore failover) and load a different db into the secondary,
which is now writable, and run some tests. Then I will remove this db
and run a basebackup to reinstate a copy of the master and turn on
replication again.


So you want a replica and you don't want a replica at the same time?

Is there any reason why you simply don't want to either create a new 
snapshot using pg_basebackup, or just shut down the standby, 
create a copy of the data directory, remove the recovery.conf and start 
it again as a standalone database?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] pg_start_backup does not actually allow for consistent, file-level backup

2015-06-08 Thread Tomas Vondra

On 06/08/15 14:45, otheus uibk wrote:

The manual and in this mailing list, the claim is made that consistent,
file-level backups may be made by bracketing the file-copy operation
with the postgresql pg_start_backup and pg_stop_backup operations.  Many
people including myself have found that in some circumstances, using
tar to copy these files will result in an error if one of the data
files changes during the tar operation. The responses to those queries
on this mailing list are unsatisfactory (everything is fine, trust us).


I don't really see what you find unsatisfactory about those responses? 
While performing the backup, the database is running and either 
modifying or even deleting the files. Tar does not expect that, and thus 
emits warnings/errors. That's clearly explained in the docs, and 
actually linked in one of the comments you quoted. And pg_start_backup 
takes care of that.


If you don't like that, you have multiple options - stop the database 
while performing the backup, perform file system level backup (e.g. lvm 
snapshot) or use tools like pg_basebackup.


regards

--
Tomas Vondra   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] advocating LTS release and feature-train release cycles

2015-06-02 Thread Tomas Vondra



On 06/02/15 04:27, Adrian Klaver wrote:

On 06/01/2015 07:11 PM, Arthur Silva wrote:

In my opinion, FWIW, that really does not change anything. Whether
you are dealing with 20 new features over a year or 10 over half a
year the same constraints apply, writing the code and getting it
reviewed over a given time period. Add in the extra overhead costs of
more frequent releases and I see no gain.


I disagree. The fact that we have 1 release per year means there's one 
deadline, and if you miss it you have to wait another year for the 
feature to be available in an official release. That's a lot of pressure 
and frustration for developers. With more frequent releases, this issue 
gets less serious. Of course, it's not a silver bullet (e.g. does not 
change review capacity).



Maybe this should be reposted to the hackers list?


Yes. And there already are threads dealing with this topic.

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Help me recovery databases.

2015-05-31 Thread Tomas Vondra
base is where all the data files are located, so the answer is most 
likely 'no'.


On 05/31/15 15:11, Evi-M wrote:

Good day, Anyone.
I lost folders with /base.
pg_xlog and pg_clog are mounted on another hard disk (500GB).
This is PostgreSQL 9.1, Ubuntu 12.04.
Could I restore the databases without /base?
I have the archive_status folder.
--
Best regards, Генералов Юрий


--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Planner cost adjustments

2015-05-29 Thread Tomas Vondra

Hi,

On 05/29/15 17:22, Melvin Davidson wrote:

I have found that setting

enable_seqscan = off

will remedy that situation. Basically, it forces the planner to
choose the index. However, if no correct index is available, it will
use sequential scan anyway. The only time it will have a negative
effect is if the seqscan is actually faster, which doesn't happen
very often.


Actually, this is quite a poor choice, exactly because it forces the 
planner to use an index scan even if a seqscan would be much faster. The 
correct way to fix this is by tweaking the cost variables, for example by 
setting random_page_cost = 2 (instead of the default 4).
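
You can try that safely in a single session first, for example:

    SET random_page_cost = 2;    -- affects only this session
    -- re-run one of the queries that picked a seqscan under EXPLAIN ANALYZE
    -- and compare the plan and runtime
    RESET random_page_cost;

and only put it into postgresql.conf once you've confirmed the plans improve.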


Sadly, the original post provides very little information about the 
context, e.g. database size, storage system, explain plans etc.


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Planner cost adjustments

2015-05-29 Thread Tomas Vondra

Hi,

On 05/29/15 22:56, Daniel Begin wrote:

Omg! I was not expecting such a step-by-step procedure, thanks!
I'll follow the guide :-)

Since I was about to provide a bit of context as asked by Tomas, here it is
for those who are interested...
Best regards,
Daniel

A bit of the required context...
I am running all this on my personal PC:  Windows 64b, i7 chip, 16GB ram.
The PostgreSQL 9.3 cluster is spread over 3X3TB external drives with write
caching. Most tables are static (no insert).

My largest table looks like this...
Records composed of:  3 bigint, 2 boolean, 1 timestamp and 1 geography type.
Number of records: 387013
Table size: 369GB
Indexes size: 425GB
  - btree(primary key): 125GB
  - btree(another field): 86GB
  - gist(geography): 241GB



Huh, I hadn't really expected that. Especially on a Windows laptop with 
external drives (I assume 7.2k SATA drives connected using USB or maybe 
eSATA?). Write cache is the on-drive write cache? Not really a good idea 
to leave that enabled (volatile cache, so a risk of data loss or data 
corruption).


Also, what do you mean by 'spread over'? Are you using tablespaces or 
some sort of RAID?


 Overall, 40% of my table and 30% of indexes do not fit in cache
 (effective_cache_size=10GB) but looking at mostly used tables and
 indexes, more than 90% of what I use doesn't fit.

I don't really understand how you compute the 40% and 30%? You have 
~800GB of data+indexes, and only 16GB of RAM, so that's more like 2% of 
the database size. Or do you measure the hit ratios somehow?



On one hand, according to the documentation
(http://www.postgresql.org/docs/9.3/static/runtime-config-query.html),
with a cache rate like mine, I should probably increase random_page_cost to
better reflect the true cost of random storage reads.


I don't follow. Haven't you said in the first post that the database 
often chooses sequential scans while index scans are way faster? 
Increasing random_page_cost will only push it further towards sequential 
scans, making it worse.



On the other hand however, I found that...
(https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server)
This is not where you should start to search for plan problems.
That random_page_cost is pretty far down this list (at the end in
fact). If you are getting bad plans, this shouldn't be the first
thing you look at, even though lowering this value may be effective.
Instead, you should start by making sure autovacuum is working
properly, that you are collecting enough statistics, and that you
have correctly sized the memory parameters for your server--all the
things gone over above. After you've done all those much more
important things, if you're still getting bad plans then
you should see if lowering random_page_cost is still useful.


Well, so maybe you're at the point where tuning random_page_cost is the 
right next step ... but sadly you haven't provided any example queries, 
so it's hard to say. Can you choose a few queries and run EXPLAIN 
ANALYZE on them (and post it to explain.depesz.com, and only put the 
links here)?




Please find below some the database config's parameters that might
be of interest...



Best regards,
Daniel

General config parameters I have modified
temp_buffers = 512MB


Why are you tuning temp_buffers? Shouldn't you tune shared_buffers 
instead? I'm not very familiar with Windows, and I vaguely remember 
issues with larger shared_buffers values, but AFAIK that improved in the 
recent releases.



work_mem = 16MB
maintenance_work_mem = 256MB
checkpoint_segments = 64
checkpoint_completion_target = 0.8
effective_cache_size = 10GB
logging_collector = on
track_counts = on
autovacuum = on


Otherwise, I don't see anything terribly misconfigured.

regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Block Corruption issue..

2015-05-08 Thread Tomas Vondra

Hi,

On 05/07/15 05:49, Sachin Srivastava wrote:

Dear Concner,

Could you help us solve the below error, which we are getting while
taking pg_dump?

pg_dump: SQL command failed

pg_dump: Error message from server: ERROR:  invalid page header in block
14521215 of relation pg_tblspc/18140340/PG_9.1_201105231/18140346/18140757

pg_dump: The command was: COPY evidence.partline (feature_id, version,
base_type_id, domain_class_id, domain_subclass_id, status_id, deleted,
modification_time, has_children, is_evidence, data_immediate, data_lazy,
parent_line_id, start_point_id, end_point_id, start_offset, end_offset,
geometry) TO stdout;


Well, it seems the data file got corrupted, for some reason. There's a 
number of questions that need to be answered first:


1) ISTM you're running PostgreSQL 9.1 - correct? Which version exactly?
   What operating system?

2) Which relation uses that particular filenode? Look for pg_class rows
   with relfilenode 18140757.

3) Can you reconstruct the relation from other sources?

4) How did the corruption happen? Did the machine crash in the past, or
   something like that?

5) How many cases of corruption are there? Is that a single corrupted
   block, or are there more? I'd assume this is not the only corrupted
   block, and other blocks may be corrupted in different ways.

   You may use for example pageinspect to check that - for example
   use page_header / heap_page_items to walk the pages, and take note
   of pages that fail. It's not 100% solution, but it might give you a
   better idea of how much corruption you're facing.

   The easiest way to get rid of the error is to zero the page(s) that
   cause the pg_dump errors. That page will then be considered empty,
   so you'll lose that data (and you'll have to reconstruct it somehow -
   from a previous backup, for example).
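
   One way to do the zeroing from SQL is the zero_damaged_pages developer
   option - a rough sketch (superuser only, and it silently discards
   whatever was in the broken pages, so take a file-level copy of the data
   directory first):

       SET zero_damaged_pages = on;
       COPY evidence.partline TO '/tmp/partline.copy';  -- bad pages are zeroed in memory as they are read
       RESET zero_damaged_pages;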



Regards,

Sachin Srivastava




DISCLAIMER:

This email may contain confidential information and is intended only for
the use of the specific individual(s) to which it is addressed. If you
are not the intended recipient of this email, you are hereby notified
that any unauthorized use, dissemination or copying of this email or the
information contained in it or attached to it is strictly prohibited. If
you received this message in error, please immediately notify the sender
at Cyient and delete the original message.


Sure ...

--
Tomas Vondra   http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: R: [GENERAL] DB on mSATA SSD

2015-04-23 Thread Tomas Vondra



On 04/23/15 15:36, Job wrote:

Hello, thank you first of all for your wonder help!

Tomas, regarding:


There are ways to make the writes less frequent, both at the database
and OS level. We don't know what's your PostgreSQL config, but making
the checkpoints less frequent and tuning the kernel/mount options may
help a lot.


We can make checkpoints less frequent; at the kernel side, which options do you
suggest tuning to minimize disk writes?


You may make the pdflush configuration less aggressive, but that may not 
help with your workload. You should use TRIM (or run fstrim regularly), move 
/tmp into a tmpfs and don't put swap on the SSD.



We have a table, about 500MB, that is updated and rewritten every day.
When a machine updates, the table is truncated and then re-populated with
pg_bulk. But I think we write heavily when importing the new data tables..


In that case the checkpoint optimizations or kernel tuning probably 
won't help much. But if you can easily recreate the database, and it 
fits into RAM, then you can just place it into a tmpfs.



Here is why we thought of putting some tables into a ramdrive...


Well, technically you can do that, but don't expect the database to work 
after a crash or a reboot. You might keep a snapshot of the database 
(e.g. using pg_basebackup), and use it to 'seed' the database after a 
server restart.


But don't expect the database to start without a tablespace that got 
lost because of being placed in a tmpfs or something.



--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] BDR Across Distributed Nodes

2015-04-23 Thread Tomas Vondra

Hi!

On 04/23/15 20:42, billythebomber wrote:

I'm starting to test BDR, and I've followed the quickstart included in
the documentation successfully.

The problem I'm encountering is when two servers are on different hosts,
which is not covered in the documentation. Node1 is 10.0.0.1, node2 is
10.0.0.2, but when I try to connect from node2:

demo=# SELECT bdr.bdr_group_join(
   local_node_name := 'node2',
   node_external_dsn := 'port=5432 dbname=demo',
   join_using_dsn := 'port=5432 dbname=demo host=10.0.0.1'

ERROR:  node identity for node_external_dsn does not match current node
when connecting back via remote
DETAIL: The dsn '' connects to a node with identity
(6140654556124456820,1,16385) but the local node is
(6140654709151998583,1,16385)
HINT:  The 'node_external_dsn' parameter must refer to the node you're
running this function from, from the perspective of the node pointed to
by join_using_dsn


The problem apparently is that 'node2' connects to 'node1', using the 
join_using_dsn and passes it the node_external_dsn. node1 takes that, 
and attempts to connect to that. Sadly, the DSN does not contain any 
host, so node1 uses localhost and connects to itself. And finds that the 
identity does not match the expected one (because it expects *583, 
assigned to node2, but gets *820, which is probably node1).




If I add node2 host entry:

demo=# SELECT bdr.bdr_group_join(
   local_node_name := 'node2',
   node_external_dsn := 'port=5432 dbname=demo host=10.0.0.2',
   join_using_dsn := 'port=5432 dbname=demo host=10.0.0.1'
);
FATAL:  could not connect to the server in non-replication mode: could
not connect to server: Connection refused
 Is the server running on host 10.0.0.2 and accepting
 TCP/IP connections on port 5432?


This seems correct, but apparently node1 can't connect to node2, using 
the external_dsn. Try connecting using psql from 10.0.0.1 to 10.0.0.2 
and then from 10.0.0.2 to 10.0.0.1. So something like this:


# from 10.0.0.1
psql -h 10.0.0.2 -p 5432 demo

# from 10.0.0.2
psql -h 10.0.0.1 -p 5432 demo

If that does not work, you probably need to investigate - firewall 
settings, pg_hba.conf and postgresql.conf (maybe it's not listening on 
this address)?


regards

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] DB on mSATA SSD

2015-04-23 Thread Tomas Vondra

Hi,


On 04/23/15 14:33, John McKown wrote:


​That's a really old release. But I finally found some doc on it. And
8.4 does appear to have TABLESPACEs in it.

http://www.postgresql.org/docs/8.4/static/manage-ag-tablespaces.html

quote

To define a tablespace, use the CREATE TABLESPACE command
(http://www.postgresql.org/docs/8.4/static/sql-createtablespace.html),
for example:

CREATE TABLESPACE fastspace LOCATION '/mnt/sda1/postgresql/data';

The location must be an existing, empty directory that is owned by the
PostgreSQL system user. All objects subsequently created within the
tablespace will be stored in files underneath this directory.


I think you should read this:

http://thebuild.com/blog/2013/03/10/you-cannot-recover-from-the-loss-of-a-tablespace/

in other words, using tablespaces for placing some of the data into a 
RAM filesystem (which may disappear) is a rather terrible idea. In case 
of crash you won't be able to even start the database, because it will 
try to recover the tablespace.



--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] DB on mSATA SSD

2015-04-23 Thread Tomas Vondra

On 04/23/15 13:07, Job wrote:

Dear Postgresql mailing list,

we use Postgresql 8.4.x on our Linux firewall distribution.
Actually, we are moving from standard SATA disk to mSATA SSD solid
drive, and we noticed that the DB, using lots of indexes, is writing a lot.

In some months, two test machines got broken SSDs, and we are
studying how to reduce the write impact of the DB.

Are there some suggestions with SSD drives?


There are ways to make the writes less frequent, both at the database 
and OS level. We don't know what's your PostgreSQL config, but making 
the checkpoints less frequent and tuning the kernel/mount options may 
help a lot.



Putting the DB into RAM and backing up periodically to disk is a
valid solution?


Well, that depends on your requirements. You may lose the changes since 
the last backup.




Or, is storing indexes on a ram drive possible?


No, not really.

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] DB on mSATA SSD

2015-04-23 Thread Tomas Vondra

Hi

On 04/23/15 14:50, Chris Mair wrote:

Dear Postgresql mailing list,

we use Postgresql 8.4.x on our Linux firewall distribution.
Actually, we are moving from standard SATA disk to mSATA SSD solid

drive, and we noticed that the DB, using lots of indexes, is writing a lot.


In some months, two test machines got broken SSDs, and we are
studying how to reduce the write impact of the DB.

Are there some suggestions with SSD drives?
Putting the DB into RAM and backing up periodically to disk is a
valid solutions?

Or, is storing indexes on a ram drive possible?

Thank you in advance for your appreciated interest!

Best regards,
Francesco


Hi,

I don't think that today's SSDs - and certainly not the server-grade
ones -  will break due to write intensive loads.


Exactly. If you want an SSD for a write-intensive database, you need a 
reasonably good SSD. Sadly, the OP mentioned they're going for a mSATA 
drive, and those suck when used for this purpose.


Theoretically it's possible to improve the lifetime by only allocating 
part of the SSD and leaving some additional free space for the wear 
leveling - the manufacturer already does that, but allocates a small 
amount of space for the cheaper SSDs (say ~10% while the server-grade 
SSDs may have ~25% of unallocated space for this purpose).


So by allocating only 75% for a filesystem, it may last longer.


Have a look at the SMART data for your drives, there should be some
metrics called wear level count or similar that gives some
indications. I wouldn't be surprised if you find that your broken
drives had failures  not related to wear level.


My experience with mSATA drives is rather bad - the SMART data is rather 
unreliable, and most of them don't even have power-loss protection 
(which you need for a database, although a UPS may help a bit here). But 
maybe that changed recently.




If you're on Linux use smartctl.

Also, as others have pointed out 8.4 is out of support, so consider
upgrading.


+1 to this

--
Tomas Vondra  http://www.2ndQuadrant.com
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Slow query with join

2015-03-16 Thread Tomas Vondra
On 16.3.2015 19:50, Marc Watson wrote:
 Hello all,
 I am using PostgreSQL 9.4.1, compiled by Visual C++ build 1800,
64-bit, as downloaded from EnterpriseDB, and is running on my dev system
under Win 7 64-bit.
 I hope someone can help me with a problem I'm having when joining a
view with a table. The view is somewhat involved, but I can provide the
details if necessary
 A query on the table is quick (16 ms):

 explain analyze select ir_actor_id from f_intervenant_ref where ir_dos_id = 
 '5226' order by ir_actor_id;
 
 Sort  (cost=17.28..17.30 rows=8 width=4) (actual time=0.032..0.033 rows=8 
 loops=1)
   Sort Key: ir_actor_id
   Sort Method: quicksort  Memory: 25kB
   ->  Index Scan using ir_dos_id_idx on f_intervenant_ref  (cost=0.28..17.16 
 rows=8 width=4) (actual time=0.019..0.024 rows=8 loops=1)
 Index Cond: ((ir_dos_id)::text = '5226'::text)
 Planning time: 0.180 ms
 Execution time: 0.049 ms
 ..

ISTM the database is applying the IN() condition last, i.e. it executes

   SELECT * FROM v_actor

and then proceeds to filter the result. I'd bet if you measure time for
that (SELECT * FROM v_actor) you'll get ~7 seconds.

First, get rid of the ORDER BY clauses in the subselects - they are
completely pointless, and might prevent proper optimization (e.g.
replacing the IN() with optimized joins).

I.e. try this:

 SELECT * FROM v_actor
  WHERE v_actor.actor_id IN (SELECT ir_actor_id FROM f_intervenant_ref
  WHERE ir_dos_id = '5226');

I'd also try replacing this with EXISTS

 SELECT * FROM v_actor
  WHERE EXISTS (SELECT 1 FROM f_intervenant_ref
 WHERE (actor_id = ir_actor_id)
   AND (ir_dos_id = '5226'));

or even an explicit join

 SELECT v_actor.* FROM v_actor JOIN f_intervenant_ref
ON (actor_id = ir_actor_id)
 WHERE ir_dos_id = '5226';

That might behave a bit differently if there are multiple
f_intervenant_ref rows matching the actor. If that's the case, a simple
DISTINCT should fix that.

-- 
Tomas Vondrahttp://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training  Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Benchmarking partitioning triggers and rules

2015-03-12 Thread Tomas Vondra
On 12.3.2015 04:57, Tim Uckun wrote:
 I am using postgres 9.4, the default install with brew install
 postgres, no tuning at all.  BTW if I use postgres.app application the
 benchmarks run twice as slow!

I have no idea what brew or postgres.app is. But I strongly recommend
you do some tuning.

  https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

 
 Why do you think there is such dramatic difference between
 
 EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||  ' SELECT
 ($1).*' USING NEW ;
 
 and
 
  EXECUTE  'INSERT INTO ' ||  quote_ident(partition_name) ||  ' VALUES(
 ($1).*)' USING NEW ;
 
 One is thirty percent faster than the other.  Also is there an even
 better way that I don't know about.

Because processing dynamic SQL commands (i.e. EXECUTE '...') is simply
more expensive, as it needs to do more stuff (on every execution). There
are reasons for that, but you may think of it as regular queries vs.
prepared statements.

Prepared statements are parsed and planned once, regular query needs to
be parsed and planned over and over again.


-- 
Tomas Vondra                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Benchmarking partitioning triggers and rules

2015-03-11 Thread Tomas Vondra
On 11.3.2015 21:43, Tim Uckun wrote:
 Hey I hate to bump my post but I would really appreciate some input
 on this benchmark. I am very alarmed that adding a very simple
 partitioning trigger slows the insert speed by an order of magnitude.
 Is there any way to speed this up?

I think to actually give you a meaningful response, we really need more
details about your configuration - what PostgreSQL version are you
using, what configuration have you changed from the defaults and such.

Anyway, you're right that triggers are not cheap. The numbers I get on
the development version with a bit of tuning look like this:

INSERT (direct)     1.5 sec
INSERT              4.0 sec
INSERT (EXECUTE)   11.5 sec

So it's ~ what you've measured. Rules have the lowest overhead, but there
are also a lot of tricky places.

There's not much you can do, except for inserting the data directly into
the right partition (without any triggers).
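
That is, instead of inserting into the parent table and letting the trigger
reroute the rows, target the partition directly (names are placeholders):

  INSERT INTO measurements_2015_03 (ts, value) VALUES (now(), 1.0);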


-- 
Tomas Vondra                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] : :Full text search query ::

2015-02-25 Thread Tomas Vondra
Hi,

On 25.2.2015 12:50, JD wrote:
 Hi All,
 
 please find herewith the following  query
 
 1. select * from partdetails where scode=118 and 
 (to_tsvector('english'::regconfig, part_number::text)) @@ to_tsquery('104')
 
 it is showing only 1 record as output, it is expected to give 17 records
 as output.
 
 2. select * from partdetails where scode=118 and 
 (to_tsvector('english'::regconfig, part_number::text)) @@ to_tsquery('104/')
 
 it is showing only 1 record as output, it is expected to give 17 records
 as output.
 
 3. select * from partdetails where scode=118 and 
 (to_tsvector('english'::regconfig, part_number::text)) @@
 to_tsquery('104/1')
 
 it is showing  17 records as output.
 
 In our search case we are passing parameter value as 104 and expected to
 get 17 records.
 
 
 Kindly some one guide here.

You need to post 'to_tsvector('english', part_number)' for the 16 rows
that you think should be returned but aren't.

Fulltext works so that it transforms the source (part_number in this
case) as defined in the text search configuration ('english'), and
compares this with the tsquery.

My bet is that the transformation keeps the whole string ('104/1')  in
this case, so that it does not match the tsquery.
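
It's easy to verify - look at what the parser produces for one of the rows
that you expect to match (the value here is just an example):

  SELECT to_tsvector('english', '104/1');
  SELECT to_tsvector('english', '104/1') @@ to_tsquery('english', '104');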

ISTM you're trying to do a prefix search on the part_number. In that
case fulltext may not be the right solution, because it's fuzzy by
nature. If you have two-part part numbers (i.e. it's always A/B) then
maybe split that into two fields, and use simple equality conditions on
each field.

So instead of column 'part_number' containing value '104/1', use two
columns part_number_a and part_number_b, containing values '104' and
'1', and simple equality queries

WHERE part_number_a = '104' and part_number_b = '1'

or (if you want to match just the first part)

   WHERE part_number_a = '104'

Another option is to use an index with a 'varchar_pattern_ops' opclass,
which allows you to do prefix LIKE queries [1]

   CREATE INDEX custom_partnum_idx
 ON partdetails (part_number varchar_pattern_ops);


   SELECT ... FROM partdetails WHERE part_number LIKE '104/%'


[1] http://www.postgresql.org/docs/9.2/static/indexes-opclass.html

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] newbie how to access the information scheme

2015-02-24 Thread Tomas Vondra
On 24.2.2015 19:58, John McKown wrote:
 I normally do the command (in psql)
 
 \d+
 
 But if you want an actual SQL statement, the above invokes:
 
 SELECT n.nspname as Schema,
   c.relname as Name,
   CASE c.relkind WHEN 'r' THEN 'table' WHEN 'v' THEN 'view' WHEN 'm'
 THEN 'materialized view' WHEN 'i' THEN 'index' WHEN 'S' THEN
 'sequence' WHEN 's' THEN 'special' WHEN 'f' THEN 'foreign table' END
 as Type,
   pg_catalog.pg_get_userbyid(c.relowner) as Owner,
   pg_catalog.pg_size_pretty(pg_catalog.pg_table_size(c.oid)) as Size,
   pg_catalog.obj_description(c.oid, 'pg_class') as Description
 FROM pg_catalog.pg_class c
  LEFT JOIN pg_catalog.pg_namespace n ON n.oid = c.relnamespace
 WHERE c.relkind IN ('r','v','m','S','f','')
   AND n.nspname <> 'pg_catalog'
   AND n.nspname <> 'information_schema'
   AND n.nspname !~ '^pg_toast'
   AND pg_catalog.pg_table_is_visible(c.oid)
 ORDER BY 1,2;
 
 I found the above by using the command: psql -E

Or just use the  information_schema like this:

   select table_schema, table_name from information_schema.tables;

It's also possible to get mostly the same info using pg_class catalog:

   select relname from pg_class where relkind = 'r';

but that may require a bit more work, if you want schema names too for
example (as the query executed by psql illustrates).
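
For example something along these lines:

   select n.nspname as table_schema, c.relname as table_name
     from pg_class c
     join pg_namespace n on n.oid = c.relnamespace
    where c.relkind = 'r'
      and n.nspname not in ('pg_catalog', 'information_schema');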

regards


-- 
Tomas Vondra                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Fwd: Data corruption after restarting replica

2015-02-18 Thread Tomas Vondra
On 16.2.2015 11:44, Novák, Petr wrote:
 Hello,
 
 sorry for posting to second list, but as I've received  no reply
 there, I'm trying my luck here.
 
 Thanks
 Petr
 
 
 -- Forwarded message --
 From: Novák, Petr nov...@avast.com
 Date: Tue, Feb 10, 2015 at 12:49 PM
 Subject: Data corruption after restarting replica
 To: pgsql-b...@postgresql.org
 
 
 Hi all,
 
 we're experiencing data corruption after switching streamed replica 
 to primary.This is not the first time I've encountered this issue,
 so I'l try to describe it in more detail.
 
 For this particular cluster we have 6 servers in two datacenters (3
 in each). There are two instances running on each server, each with
 its own port and datadir. On the first two servers in each datacenter
 one instance is primary and the other is replica for the primary from
 the other server. Third server holds two offsite replicas from the
 other datacenter (for DR purposes)

So essentially you have three servers in each datacenter, configured
like this:

server A

instance A1 (primary)
instance A2 (standby for B1)

server B

instance B1 (primary)
instance B2 (standby for A1)

server C (in the other datacenter)
--
instance C1 (standby for A1)
instance C2 (standby for B1)

So if A or B fails, you can switch to the replica on the other machine in
the same datacenter, and if the whole datacenter fails then you have C in
the other one.

Correct?

 Each replica was set up by taking pg_basebackup from primary 
 (pg_basebackup -h hostname -p 5430 -D /data2/basebackup -P -v -U 
 user -x -c fast). Then directories from initdb were replaced with 
 the ones from basebackup (only the configuration files remained) and 
 the replica started and was successfully connected to primary. It
 was running with no problem keeping up with the primary. We were 
 experiencing some connection problem between the two datacenters,
 but replication didn't break.

OK, that sounds about right. How long was the replication running before
the failover? Have you tried running some queries on the standby?

BTW are you running async or sync replication (I guess async)?

 Then we needed to take one datacenter offline due to hardware 
 maintenance. So I've switched the applications down, verified that
 no more clients were connected to primary, then shut the primary down
 and restarted replica without recovery.conf and the application were 
 started using the new db with no problem. Other replica even 
 successfully reconnected to this new primary.

Other replica? If I understand your deployment right, and you've
switched to the other datacenter, all the related replicas (original
master + first standby) were in the other datacenter (now offline). So
what replica reconnected to the newly promoted primary?

Restarting after removing recovery.conf is a bit cumbersome because of the
restart itself (pg_ctl promote does not need one IIRC), but it's a legal way
to do the promote.

 Few hours from the switch lines appeared in the server log (which 
 didn't appear before), indicating a corruption:
 
 ERROR: index account_username_key contains unexpected zero page at 
 block 1112135 ERROR: right sibling's left-link doesn't match: block
 476354 links to 1062443 instead of expected 250322 in index
 account_pkey
 
 ..and many more reporting corruption in several other indexes.
 
 The issue was resolved by creating new indexes and dropping the 
 affected ones, although there were already some duplicities in the 
 data, that has to be resolved, as some of the indexes were unique.

So the machine was running for a few hours just fine, and then something
happened and these errors started occurring?

Can you check whether the table was vacuumed recently (before the errors
started to appear) or what else interesting happened?
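
Something like this should show it (assuming the underlying table is called
'account', judging by the index names):

  SELECT relname, last_vacuum, last_autovacuum, last_analyze, last_autoanalyze
    FROM pg_stat_user_tables
   WHERE relname = 'account';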

 This particular case uses Postgres 9.1.14 on both primary and 
 replica. But I've experienced similar behavior on 9.2.9. OS Centos 
 6.6 in all cases. This may mean, that there can be something wrong 
 with our configuration or the replication setup steps, but I've set 
 up another instance using the same steps with no problem.

How different are the systems? I see you're running 6.6 on both ends,
but maybe there's some tiny difference? One common issue we see from
time to time is minor locale differences, affecting the indexes.
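
A quick sanity check is to compare at least these on both instances (and the
glibc version at the OS level):

  SELECT name, setting
    FROM pg_settings
   WHERE name IN ('lc_collate', 'lc_ctype', 'server_encoding');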

 Fsync related setting are at their defaults. Data directories are on 
 RAID10 arrays, with BBUs. Filesystem is ext4 mounted with nobarrier 
 option.

Good ;-)

 Database is fairly large ~120GB with several 50mil+ tables, lots of 
 indexes and FK constraints. It is mostly queried, 
 updates/inserts/deletes are only several rows/s.

So it was queried for a few hours? Any idea if it was using the indexes,
or if it was choosing different execution plans?

-- 
Tomas Vondra                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

Re: [GENERAL] How to analyze a slowdown in 9.3.5?

2015-01-09 Thread Tomas Vondra
On 9.1.2015 23:14, Michael Nolan wrote:
 I'm running 9.3.5 on a virtual machine with 5 cores and 24 GB of
 memory.  Disk is on a SAN.
 
 I have a task that runs weekly that processes possibly as many as
 120 months worth of data, one month at a time. Since moving to 9.3.5 
 (from 8.2!!) the average time for a month has been 3 minutes or less.

Congrats to migrating to a supported version!

A comparison of the configuration used on 8.2 and 9.3.5 would be
helpful (i.e. how you've updated the config for the new version).

 However, when this job ran this Tuesday, it ran fine for a number of 
 months, but then started slowing down dramatically, 300 minutes for 
 one month and then 167 minutes for the next. I stopped and restarted 
 postgresql, the next block also ran really slow (157 minutes.) I
 then rebooted the server and the remaining blocks ran at the usual
 fast speed again, so restarting postgresql didn't fix the problem
 but rebooting the server did.

What amounts of data are we talking about? Gigabytes? Tens of gigabytes?

 
 Looking at the logs, I see queries with a function call that would 
 normally take no more than 100-200 milliseconds, usually far less, 
 that were taking 100 seconds or longer. This function gets called 
 thousands of times for each month, so that appears to be one source
 of the slowdown.

But why are the functions taking so much longer? Are they eating CPU or
I/O, or are they generally waiting for something (e.g. locks)?
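
One thing worth checking while it's slow (on 9.3 the 'waiting' flag shows
lock waits):

  SELECT pid, waiting, state, now() - query_start AS runtime, query
    FROM pg_stat_activity
   WHERE state <> 'idle'
   ORDER BY runtime DESC;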

 
 I don't suspect a memory leak in the calling program (in php),
 because since moving to this server in December this weekly task has
 run several times over the same range of months, making pretty much
 the same function calls each time. I also ran the entire range
 several times during testing.
 
 One change made to the server since the previous week's run was that
 I moved up to the latest Centos kernel (Linux version 
 3.10.0-123.13.2.el7.x86_64).

And what was the previous kernel version?

However, if it worked fine after rebooting the server, it may not be a
kernel issue (unless it somehow depends on uptime). Is there something
in the /var/log/messages?

 As far as I can tell, the other virtual servers weren't being slowed
 down, so I don't suspect problems with the virtual server or the SAN.
 
 If this happens again, what sorts of settings in postgresq.conf or
 other tools should I be using to try to track down what's causing
 this?

Well, we don't know what the function is doing, so it'd be nice to get
some basic description first. Is it querying the database? Is it
inserting or updating large amounts of data? Or is it something more
complex?

Aside from that, some basic system stats would be really helpful, so
that we can identify the bottleneck (is that a CPU, I/O, locking, ...).
A few lines from these commands should give us some basic idea:

  iostat -x -k 5
  vmstat -w 5

regards

-- 
Tomas Vondra                  http://www.2ndQuadrant.com/
PostgreSQL Development, 24x7 Support, Remote DBA, Training & Services


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] invalid memory alloc request size

2014-12-10 Thread Tomas Vondra
On 10.12.2014 17:07, Gabriel Sánchez Martínez wrote:
 Hi all,
 
 I am running PostgreSQL 9.3.5 on Ubuntu Server 14.04 64 bit with 64 GB
 of RAM.  When running pg_dump on a specific table, I get the following
 error:
 
 pg_dump: Dumping the contents of table x_2013 failed:
 PQgetResult() failed.
 pg_dump: Error message from server: ERROR:  invalid memory alloc request
 size 18446744073709551613
 pg_dump: The command was: COPY public.x_2013 (...) TO stdout;
 pg_dump: [parallel archiver] a worker process died unexpectedly
 
 If I run a COPY TO file from psql I get the same error.
 
 Is this an indication of corrupted data?  What steps should I take?

In my experience, issues like this are caused by a corrupted varlena
header (i.e. corruption in text/varchar/... columns).

How exactly that corruption happened is difficult to say - it might be
faulty hardware (RAM, controller, storage), it might be a bug (e.g.
piece of memory gets overwritten by random data). Or it might be a
consequence of incorrect hardware configuration (e.g. leaving the
on-disk write cache enabled).

If you have a backup of the data, use that instead of recovering the
data from the current database - it's faster and safer.

However, it might be worth spending some time analyzing the corruption
to identify the cause, so that you can prevent it next time.

There are tools that might help you with that - the pageinspect extension
is a way to look at the data files at a low level. It may be quite tedious,
though, and it may not work with badly broken data.
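
For example, to look at the tuple headers on a suspect block (the table name
is the one from your error message, the block number is just an example):

  CREATE EXTENSION pageinspect;

  SELECT lp, lp_off, lp_len, t_xmin, t_xmax, t_ctid, t_infomask
    FROM heap_page_items(get_raw_page('public.x_2013', 0));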

Another option is pg_check - an extension I wrote a few years back. It
analyzes the data file and prints info on all corruption occurrences.
It's available at https://github.com/tvondra/pg_check and I just pushed
some minor fixes to make it 9.3-compatible.

regards
Tomas


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Performance problem on 2 PG versions on same query

2014-11-05 Thread Tomas Vondra
Hi,

Dne 5 Listopad 2014, 17:31, Rémy-Christophe Schermesser napsal(a):
 Hi,

 We have 2 instances of PG, one in 9.1.1 and 9.1.14. They have the same
 data, schema, PG configuration, and are almost identical machines, same
 number of cores and memory, but different cloud provider. The data was
 transferred with a pg_dump/pg_restore. We ran VACUUM ANALYSE, ANALYSE, and
 REINDEX on both machines.

What do you mean by cloud provider? Are you installing and configuring
the machine on your own, or is this set-up by the provider? BTW you should
do basic benchmarking first - numbers reported by the providers are just
random numbers until you verify them.

 One query take ~11 seconds on 9.1.1 and ~25 minutes on 9.1.14.

Well, the first observation is that the queries produce different results:

Limit  (cost=100414.92..107502.31 rows=1000 width=279) (actual
time=6200.302..11650.567 rows=1000 loops=1)

Limit  (cost=20.64..73294.62 rows=1000 width=279) (actual
time=1419311.904..1419400.785 rows=3 loops=1)

So while on 9.1.1 it produces 1000 rows very quickly, on 9.1.14 it only
ever finds 3 rows (so the query needs to scan all the data, and the abort
early does not trigger).

There are other differences, though. For example on 9.1.1 the nested loop
returns ~8k rows:

Nested Loop  (cost=88.78..7785.80 rows=2655 width=279) (actual
time=190.009..9470.460 rows=7951 loops=1)

while on 9.1.14 it produces ~120k rows:

Nested Loop  (cost=20.64..8045.28 rows=2694 width=279) (actual
time=13.230..555.366 rows=121063 loops=1)

This may be one of the reasons why the database decided to use a different
join method.

Are there any differences in settings between the two machines (e.g.
work_mem)?
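
An easy way to compare the configurations is to diff everything that is not
at its default value on both servers:

  SELECT name, setting, source
    FROM pg_settings
   WHERE source <> 'default'
   ORDER BY name;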

regards
Tomas



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Performance problem on 2 PG versions on same query

2014-11-05 Thread Tomas Vondra
Dne 5 Listopad 2014, 18:10, Tom Lane napsal(a):
 Tomas Vondra t...@fuzzy.cz writes:
 Dne 5 Listopad 2014, 17:31, R??my-Christophe Schermesser napsal(a):
 We have 2 instances of PG, one in 9.1.1 and 9.1.14. They have the same
 data, schema, PG configuration, and are almost identical machines, same
 number of cores and memory, but different cloud provider. The data was
 transferred with a pg_dump/pg_restore. We ran VACUUM ANALYSE, ANALYSE,
 and
 REINDEX on both machines.

 Well, the first observation is that the queries produce different
 results:

 Yeah.  Another reason to not believe that the databases contain identical
 data is here:

->  Seq Scan on andalertsmatch am  (cost=0.00..71287.87 rows=1064987 width=52) (actual time=0.000..1680.077 rows=1064987 loops=1)

 ->  Index Scan using andalertsmatch_a_mid_idx on andalertsmatch am  (cost=0.00..180798.61 rows=1173762 width=52) (actual time=0.015..875294.427 rows=1826118122 loops=1)

 For some reason there's over 1000 times more rows in andalertsmatch in
 the 9.1.14 installation.  I'm betting on a foulup somewhere in the data
 dump/restore process.

I'd bet that's mostly due to rescans in the merge join ...

Tomas



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] question about memory usage

2014-07-23 Thread Tomas Vondra
On 23 Červenec 2014, 15:56, klo uo wrote:
 Bill, thanks for your reply.

 shared_buffers is set to 128MB.

 Now that you mention config file, the only thing I did change there, and
 was suggested to me while I made some on my databases was
 max_locks_per_transaction = 5 (which has default value 1).

 After resetting max_locks_per_transaction to default value and
 restarting
 the server, memory occupied in working set reduced linearly to around 200
 MB.

 I guess this is it.

The default value for max_locks_per_transaction is 64, not 1. Values
this high are quite insane, and suggest that either you don't know what
the value means (and increased it just in case, because more is always
better) or that the application does something wrong (eventually
requiring so many locks).

You really need to check this (notice how the amount of shared memory
depends on max_locks_per_transaction):

http://www.postgresql.org/docs/9.0/static/kernel-resources.html#SHARED-MEMORY-PARAMETERS

and this (which explains what max_locks_per_transaction does):

http://www.databasesoup.com/2012/06/postgresqlconf-maxlockspertransaction.html
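
You can also compare how many locks are actually held with the size of the
lock table implied by the settings:

  SELECT (SELECT count(*) FROM pg_locks) AS locks_held,
         current_setting('max_locks_per_transaction')::int *
         (current_setting('max_connections')::int +
          current_setting('max_prepared_transactions')::int) AS lock_table_slots;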

regards
Tomas



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] SQL query runs fine on one platform (FreeBSD), but hangs on another (Windows)

2014-04-30 Thread Tomas Vondra
On 30 Duben 2014, 10:46, David Noel wrote:
 Very strange. I ran the query and it seemed slow so I rewrote it with
 a join instead. Using join it finished in 800ms. The query using the
 lateral finished in more than a minute. I guess I need to do some
 analysis on those queries to figure out why there was such a vast
 difference in performance. %10, %20, %50, even %100 differences in
 performance are huge, but for something to take nearly 100x -- %1
 longer to complete? Something just doesn't seem right.

That is not strange at all.

In an ideal world the database would be able to understand the semantics of
the query perfectly, and rewrite it to the best plan possible (returning
the desired result). In practice that is not the case, sadly - the planner
has limited knowledge and while it can do many clever tweaks, the way you
write a query somehow limits the options. So when you use LATERAL in the
query, it may or may not be able to rewrite it to the better plan.

To really understand what's going on here we need to see the explain plans
of the queries.

Tomas



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] SQL query runs fine on one platform (FreeBSD), but hangs on another (Windows)

2014-04-29 Thread Tomas Vondra
On 29 Duben 2014, 8:59, David Noel wrote:
 The query I'm running is:

 select page.*, coalesce((select COUNT(*) from sentence where
 sentence.PageURL = page.URL group by page.URL), 0) as
 NoOfSentences from page WHERE Classification LIKE CASE WHEN 
 THEN  ELSE '%' END ORDER BY PublishDate DESC Offset 0 LIMIT 100

 I can post the table definitions if that would be helpful but I don't
 have them on hand at the moment.

 The gist of it though is that page and sentence are two tables.
 page.URL maps to sentence.PageURL. The page table has the columns
 Classification, and PublishDate. URL, PageURL, and Classification
 are strings. PublishDate is a timestamp with timezone.

 Both queries are run from a Java project using the latest JDBC driver.
 The PostgreSQL Server versions it's being run on are 9.2 and 9.3. The
 query executes and returns just fine when run on a FreeBSD-based
 platform, but executes forever when run under Windows.

Are both the server and client running on FreeBSD or Windows, or are you
switching only part of the stack?

 Does anyone have any idea why this might be happening? Are there
 platform/syntax compatibility issues I'm triggering here that I'm
 unaware of? Is there something wrong with the query?

It shouldn't get stuck. It might be slower on some platforms, but it
shouldn't really get stuck, so it might be a bug.

On Linux I'd recommend perf/strace/... to investigate the issue, but I'm
not familiar with similar tools on Windows.

Is the query eating a lot of CPU, or is it just sitting there idle, doing
nothing? Or is there some other process doing a lot of CPU (e.g. Java)?

Can you try running the query through 'psql' directly, to rule out JDBC
issues etc.? Try to collect explain plans for the query (maybe there's
something wrong with it).

Tomas



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] aggregate returning anyarray and 'cannot determine result data type'

2014-04-22 Thread Tomas Vondra
Hi all,

I needed to implement an aggregate producing a random sample, with an
upper bound on the number of items. I.e. not the usual 5% of values
but up to 1000 values.

So my plan was to do something like this:

  sample_append(internal, anyelement, int) - internal
  sample_final(internal) - anyarray

  CREATE AGGREGATE sample_agg(anyelement, int) (
  SFUNC = sample_append,
  STYPE = internal,
  FINALFUNC = sample_final
  );

where 'internal' represents a pointer to a structure with all the info
(limit, currently accumulated sample, ...).

However this leads to

  ERROR:  cannot determine result data type
  DETAIL:  A function returning a polymorphic type must have at least
   one polymorphic argument

because 'sample_final' produces anyarray but has no polymorphic
argument. Sadly, the 'natural' solution of using anyarray instead of the
internal structure does not work because I have no way to pass the other
parameters to the final function (the last int in sfunc).

Any ideas how to solve this?

regards
Tomas


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Arduino SQL Connector

2014-04-18 Thread Tomas Vondra
Steve, please stop top-posting, especially if others bottom-post. It turns
the messages into a hard-to-follow mess. I took the liberty of reshuffling
the parts a bit.

On 18 Duben 2014, 6:09, Steve Spence wrote:

 On Fri, Apr 18, 2014 at 12:02 AM, Jan Wieck j...@wi3ck.info wrote:
 On 04/17/14 20:01, Steve Spence wrote:

 I'm trying to port my application from mysql to postgres (for the GIS
 supoort). I had hoped there would be wild enthsuiasm for such a
 project, as the Arduino community is extremely large, and this would
 push postgres into a wider audience. I expected it would take multiple
 people with varying backgrounds to accomplish the task. If you can do
 it on your own, great. I'll be happy to lend what knowledge I have of
 the Arduino and using the mysql connector -


 This is a pattern I've seen for over a decade now. People think that
 randomly sprinkling MySQL into a discussion will generate wild
 enthusiasm for supporting their attempt to port *their* appli ...
 wait
 ... oh ... didn't work again?

Yeah. "MySQL does X" is not an argument for us to do X too.

 You know what? Fine, it doesn't matter that much to me. I'm happy to
 continue using MySQL. It works with the Arduino quite nicely. Postgres
 doesn't work. That's Postgres loss not mine. I really thought the
 postgres team would be interested in providing support for a very
 popular microcontroller system. If not, they are missing out on a
 larger segment of users.

Please, enough of this Fine with me, if you're not interested! attitude.

What exactly did you expect to happen, given the (minimal) amount of
info you provided initially? Had you wanted to get us excited, you'd have
posted an explanation of how exciting the project is, what it does etc.

 The MySQL connector is copyrighted and GPL'd by Oracle, it would be

It's written by a person who happens to work for Oracle and apparently is
an arduino hacker too. That's exactly what Jan Wieck explained above.
And it's also very different from "Oracle decided to support arduino".

 nice if Postgres would do something similar for those that prefer
 using Postgres instead of forcing people to use MySQL. My mistake in
 thinking this might be the case. Or Maybe Jan doesn't speak for
 Postgres .

We're not forcing anyone to use anything. But we're not responsible for
supporting every use-case that is out there either, that's not how this
community works IMHO.

But actually we already did a lot of work to support even your use case.
For example we have great documentation of how the protocol works [1],
with a detailed explanation of how to send simple queries [2].

If all you need is a simple authentication (trust, plaintext, md5) and
simple queries (insert as a single string), it's pretty trivial.

[1] http://www.postgresql.org/docs/9.1/static/protocol.html
[2] http://www.postgresql.org/docs/9.1/static/protocol-flow.html#AEN91596

Feel free to ask if you run into some issues here.

regards
Tomas

PS: I do share the view that sending the data to a simple proxy (e.g. a
    simple process waiting for data sent over UDP) that forwards the data
    to PostgreSQL would be simpler/cleaner/...




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Arduino SQL Connector

2014-04-18 Thread Tomas Vondra
On 18 Duben 2014, 18:11, Edson Richter wrote:
 Is Arduino capable of running Java apps?
 If so, you can use the excellent high quality PgSQL JDBC driver.
 Java interfacing with native libraries is not difficult, JNI are pretty
 well know and documented (and long lived already).

 Just my 2c (if you disregard, please just ignore me :-) ),

Arduino is a tiny computer, with just a few kB of memory. That's hardly
sufficient for a Java environment (especially if even the libpq library is
considered too large).

regards
Tomas



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Arduino SQL Connector

2014-04-18 Thread Tomas Vondra
On 18 Duben 2014, 17:01, Adrian Klaver wrote:
 On 04/18/2014 07:53 AM, Jan Wieck wrote:
 On 04/18/14 10:31, Steve Spence wrote:
 Not a thing in that document about the Arduino. Just how to install
 Postgres on a Raspberry Pi. My Postgres is on a hosted server at a ISP.

 You intend to have thousands of Arduino devices, incapable of doing any
 sort of encryption or other means of secure IP connections, directly
 connect to a database, that is hosted on a publicly accessible VPS?

 Maybe it is just me, but to me that design has DISASTER written in bold,
 red, 120pt font all over it.

 Jan,

 It is already established you do not like any part of this idea. Beating
 the dead horse really does not accomplish anything.

I don't think pointing out weaknesses of a proposed solution is equal to
beating a dead horse.

I see two potential issues here - security and excessive number of
connections. Security, because while you can reasonably authenticate the
client (e.g. using MD5 authentication), there's no way of encrypting the
traffic. But if the data is not sensitive, this might be sufficient.

Excessive number of connections, because if you keep one connection from
each arduino device, and you have 1000s of devices ... you get the idea.
But this might be resolved using pgbouncer + transaction pooling or so.

regards
Tomas



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Arduino SQL Connector

2014-04-17 Thread Tomas Vondra
On 17.4.2014 16:51, Adrian Klaver wrote:
 On 04/17/2014 07:44 AM, David Rysdam wrote:
 Bruce Momjian br...@momjian.us writes:
 On Thu, Apr 17, 2014 at 09:39:55AM -0400, Steve Spence wrote:
 So, who wants to work on this with me? I'm a fair arduino programmer,
 but know nothing about postgres.

 I would look at the MySQL one as a first step to see how that was done.
 You are basically going to need to duplicate libpq, which is a major
 undertaking.

 Maybe I'm being naive, but isn't libpq already being compiled for ARM by
 Debian? As long as it fits, you should be good. If it doesn't, you'll
 need to strip some stuff out.

 
 If you get the MySQL connector from here:
 
 https://launchpad.net/mysql-arduino/+milestone/release-1.0.2-beta
 
 you will see they are doing the equivalent, using the MySQL libmysqlclient:
 
 http://dev.mysql.com/downloads/connector/c/

Really? I see no reference to libmysqlclient there. It seems they're
simply reimplementing only a (small) subset of the library.

Tomas


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Arduino SQL Connector

2014-04-17 Thread Tomas Vondra
On 17.4.2014 19:43, Steve Spence wrote:
 Oracle thought it was a good idea to put out a MySQL version, I
 figure there should be some effort to counter that here .

Really? I found no information about this on oracle.com or mysql.com,
except for a section in the discussion forum with ~20 posts. It's true
that the author is apparently a Senior Software Engineer at Oracle, but
this seems like a his own project. So much regarding the Oracle
thought part ...

BTW I doubt the "Oracle does X => you should do X too" approach will be
fruitful here.

 Need a Team lead on this, and I'll collaborate as much as I can on the
 Arduino / Networking side.

I'm a bit confused. What do you mean by team lead? I was thinking that
maybe this would be a fun project for the holidays, refreshing my
Arduino-fu. But no way I'm interested in being a TL of anything.

I was messing with the PostgreSQL protocol [1] recently, and writing a
code implementing something like what the mysql connector does would be
quite trivial. A day of work, maybe two if you know networking but not
the protocol.

regards
Tomas

[1] http://www.postgresql.org/docs/9.3/interactive/protocol.html


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] encrypting data stored in PostgreSQL

2014-04-09 Thread Tomas Vondra
On 9.4.2014 23:28, Martijn van Oosterhout wrote:
 On Wed, Apr 09, 2014 at 02:16:34PM -0700, Ken Tanzer wrote:
 Any thoughts on how to pull this off for PostgreSQL stored data?

 I looked at this a while ago because I have clients who might
 require this in the future. ISTM you should be able to have your PG
 data directory stored on an encrypted filesystem. I believe this
 will decrease performance, but I have no idea by how much.
 
 FWIW, I have several databases running on encrypted filesystems. The 
 performance difference is negligable *if* you have hardware 
 acceleration for your encryption, which most modern processors have.
 
 Essentially, the processor can encrypt/decrypt data so much faster
 than the cost of reading/writing to disk, you don't notice the
 difference. There's surely a difference, but if this means you meet
 your requirements it's an excellent solution.

We're running a number of rather busy PostgreSQL boxes with encryption
at the filesystem level (or more precisely dm-crypt/LUKS with LVM, IIRC).

Support for encryption acceleration (AES-NI [1]) is an absolute must.

The other thing that is essential for good performance is reasonably
recent kernel. 2.6.x kernels have a single-threaded kcryptd, which means
you can't get more than ~150 MB/s AES-256 (per partition). With other
algorithms it's not much better (say, 170MB/s with AES-128, IIRC).

Somewhere in 3.x (or maybe very late 2.6.x) kcryptd was improved to use
multiple threads - that's a significant improvement, both for throughput
and latencies.

Clearly, it's going to eat (part of) your CPUs, but that's expected. The
encryption still has an impact on latencies, but with the multi-threaded
kcryptd it's pretty much negligible.


regards
Tomas

[1] http://en.wikipedia.org/wiki/AES_instruction_set


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Linux OOM-Killer

2014-03-17 Thread Tomas Vondra
Hi,

On 17 Březen 2014, 11:45, basti wrote:
 Hello,

 we have a database master Version:
 PostgreSQL 9.1.6 on x86_64-unknown-linux-gnu, compiled by gcc (Debian
 4.7.2-2) 4.7.2, 64-bit
 and a WAL-Replication Slave with hot-standby version:
 PostgreSQL 9.1.9 on x86_64-unknown-linux-gnu, compiled by gcc (Debian
 4.7.2-5) 4.7.2, 64-bit.

You're missing 18 months of fixes on the master (slightly less on the
slave).

 Since a few days we had problems with the Linux OOM-Killer.
 Some simple query that normally take around 6-7 minutes now takes 5 hours.
 We did not change any configuration values the last days.

 First of all I have set

 vm.overcommit_memory=2
 vm.overcommit_ratio=80

 Here are some values of my DB-Master config, the Server has 32GB RAM and
 is only for database, no other service.
 Did anybody see some mistakes?

How much swap do you have?

 I'am not shure if work_mem, maintenance_work_mem and
 effective_cache_size is set correct.

That's hard to say. I don't see any immediate issue there, but it really
depends on your application. For example 200 connections with
work_mem=192MB may be dangerous if many connections are active at the same
time.


 /etc/postgresql/9.1/main/postgresql.conf

 max_connections = 200
 ssl = true
 shared_buffers = 6GB  # min 128kB
 work_mem = 192MB  # min 64kB
 maintenance_work_mem = 1GB# min 1MB
 wal_level = hot_standby
 synchronous_commit = off
 wal_buffers = 16MB
 checkpoint_segments = 16
 checkpoint_completion_target = 0.9
 archive_mode = on
 archive_command = 'rsync -a %p -e ssh -i
 /var/lib/postgresql/.ssh/id_rsa
 postgres@my_postgres_slave:/var/lib/postgresql/9.1/wals/dolly_main/%f
 /dev/null'
 max_wal_senders = 1
 wal_keep_segments = 32
 random_page_cost = 2.0
 effective_cache_size = 22GB
 default_statistics_target = 100
 constraint_exclusion = off
 join_collapse_limit = 1
 logging_collector = on
 log_directory = 'pg_log'
 log_filename = 'postgresql-%Y-%m-%d.log'
 log_min_duration_statement = 4
 log_lock_waits = on
 track_counts = on
 autovacuum = on
 log_autovacuum_min_duration = 5000
 autovacuum_max_workers = 4
 datestyle = 'iso, dmy'
 deadlock_timeout = 1s

So what does the query do? Show us the explain plan (explain analyze would
be nice, but may not be practical if it's running so slow).

Which kernel is this? When the OOM strikes, it should print detailed info
into the log - what does it say?

When you look at top output, which processes consume most memory? Are
there multiple backends consuming a lot of memory? What queries are they
running?
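
On 9.1 something like this shows who is doing what (note the old column
names, procpid / current_query):

  SELECT procpid, usename, waiting, query_start, current_query
    FROM pg_stat_activity
   WHERE current_query <> '<IDLE>'
   ORDER BY query_start;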

Assuming you have a monitoring system in place, collecting memory stats
(you should have that), what does it say about history? Is there a sudden
increase in consumed memory or something suspicious?

regards
Tomas



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] cannot delete corrupted rows after DB corruption: tuple concurrently updated

2014-02-26 Thread Tomas Vondra
On 26 Únor 2014, 8:45, john gale wrote:

 Does anybody have any ideas about this.

 We restarted the postmaster and the issue persists.  So previously in
 9.0.4 where we could clean corruption, it seems in 9.3.2 we can no longer
 clean corruption.  I'm assuming this because our data insert environment
 has not changed, so we shouldn't be hitting any different transaction
 concurrency / isolation problems than we did before.

 Is there a way to force deletion of a row, ignoring concurrency, similar
 to concurrent updates.  It looks like changing
 default_transaction_isolation did not affect this:

 munin2=# delete from testruns where ctid = '(37069305,4)';
 ERROR:  tuple concurrently updated

AFAIK this error is raised when a before trigger modifies the row that is
being deleted. Imagine a trigger that does this

   UPDATE testruns SET mycolumn = 1 WHERE id = OLD.id;
   RETURN OLD;

Given the way MVCC in postgres works (copying row when updating), the
error makes sense. In 9.0 this worked by silently skipping the DELETE
(incidentally, I had a few reports about tables that can't be deleted
because of this in the past month).

Anyway, do you have any triggers on the table? If yes, try to disable
them. I suspect the data are corrupted in a way that causes update on the
deleted row - either directly, or maybe because of a cascading effect.

I'm wondering if it might be caused by RI triggers - maybe yes, but I'm
not aware of any RI trigger doing updates.

That being said, I think that what you're doing is wrong. If you think you
have a corrupted database, I'd strongly suggest doing dump/restore. Or how
do you know there's no other corruption lurking in the files, slowly
spreading to other parts of the database?

Tomas



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Question about memory usage

2014-01-10 Thread Tomas Vondra
On 10 Leden 2014, 19:19, Tom Lane wrote:
 Preston Hagar prest...@gmail.com writes:
 tl;dr: Moved from 8.3 to 9.3 and are now getting out of memory errors
 despite the server now having 32 GB instead of 4 GB of RAM and the
 workload
 and number of clients remaining the same.

 Here are a couple of examples from the incident we had this morning:
 2014-01-10 06:14:40 CST  30176LOG:  could not fork new process for
 connection: Cannot allocate memory
 2014-01-10 06:14:40 CST  30176LOG:  could not fork new process for
 connection: Cannot allocate memory

 That's odd ... ENOMEM from fork() suggests that you're under system-wide
 memory pressure.

 [ memory map dump showing no remarkable use of memory at all ]
 2014-01-10 06:18:46 CST 10.1.1.6 16669 [unknown] production
  10.1.1.6(36680)ERROR:  out of memory
 2014-01-10 06:18:46 CST 10.1.1.6 16669 [unknown] production
  10.1.1.6(36680)DETAIL:  Failed on request of size 500.

 I think that what you've got here isn't really a Postgres issue, but
 a system-level configuration issue: the kernel is being unreasonably
 stingy about giving out memory, and it's not clear why.

 It might be worth double-checking that the postmaster is not being
 started under restrictive ulimit settings; though offhand I don't
 see how that theory could account for fork-time failures, since
 the ulimit memory limits are per-process.

 Other than that, you need to burrow around in the kernel settings
 and see if you can find something there that's limiting how much
 memory it will give to Postgres.  It might also be worth watching
 the kernel log when one of these problems starts.  Plain old top
 might also be informative as to how much memory is being used.

My bet is on overcommit - what are

vm.overcommit_memory
vm.overcommit_ratio

set to? Do you have swap or not? I've repeatedly run into very similar
OOM issues on machines with overcommit disabled (overcommit_memory=2) and
with no swap. There was plenty of RAM available (either free or in page
cache) but in case of sudden peak the allocations failed. Also
vm.swappiness seems to play a role in this.

 The weird thing is that our old server had 1/8th the RAM, was set to
 max_connections = 600 and had the same clients connecting in the same
 way
 to the same databases and we never saw any errors like this in the
 several
 years we have been using it.

Chances are the old machine had swap, overcommit and/or higher swappiness,
so it was not running into these issues with overcommit.

Anyway, I see you've mentioned shmmax/shmall in one of your previous
messages. I'm pretty sure that's irrelevant to the problem, because that
only affects allocation of shared buffers (i.e. shared memory). But if the
database starts OK, the cause is somewhere else.

kind regards
Tomas Vondra



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] ERROR: out of memory DETAIL: Failed on request of size ???

2013-11-27 Thread Tomas Vondra
On 27 Listopad 2013, 22:39, Brian Wong wrote:
 Date: Fri, 22 Nov 2013 20:11:47 +0100
 Subject: Re: [GENERAL] ERROR: out of memory DETAIL: Failed on request of
 size ???
 From: t...@fuzzy.cz
 To: bwon...@hotmail.com
 CC: brick...@gmail.com; pgsql-general@postgresql.org

 On 19 Listopad 2013, 5:30, Brian Wong wrote:
  I've tried any work_mem value from 1gb all the way up to 40gb, with no
  effect on the error.  I'd like to think of this problem as a server
  process memory (not the server's buffers) or client process memory
 issue,
  primarily because when we tested the error there was no other load
  whatsoever.  Unfortunately,  the error doesn't say what kinda memory
 ran
  out.

 Hi Brian,

 first of all, please don't top-post, especially if the previous response
 used bottom-post. Thank you ;-)

 Regarding the issue you're seeing:

 * Increasing work_mem in hope that it will make the issue go away is
 pointless. In case work_mem is too low, PostgreSQL will automatically
 spill the data to disk (e.g. it won't do a sort in memory, but will do an
 on-disk merge sort). It will never fail, and the "failed on request
 of size" message actually comes from malloc, when requesting another
 chunk of memory from the OS. So you're hitting an OS-level memory limit.

 After changing the shared_buffers setting to 200MB, the developer has
 confirmed that the Out Of Memory error no longer happens.  So thanks
 folks.

 Playing with work_mem was out of desperation.  Postgresql simply giving
 the Out of memory error wasn't informative enough about the problem.
 For example, is it the server buffer, the server process, or the client
 process that's having a problem?

PostgreSQL is unable to give you more detailed information about the
cause because it simply does not have it. You're hitting some limit set at
the kernel level, so PostgreSQL calls malloc() and the kernel responds with
NULL. What details would you expect from PostgreSQL in that case other than
the simple statement 'we tried to allocate X bytes and it failed'?

Also, this kind of error tends to be 'random', i.e. the allocation
error may happen at different places every time you run the query,
depending on how many backends are running etc.

The fact that decreasing shared buffers to 200 MB made the problem go away
for now only suggests this is a problem with some kernel limit (ulimit,
overcommit, ...). It also means you haven't solved it and it will likely
happen again in the future, e.g. when the amount of data grows and you
hit the limit again. Find and fix the actual issue.

Have you checked the ulimit / overcommit as I asked you to?

 Note: AFAIK the only operation that does not spill to disk, and may fail
 with OOM-like errors is hash aggregate. But by increasing the work_mem
 you're actually encouraging PostgreSQL to make this planning error.

 I see the query you're running is doing MAX() so it might be hitting
 this
 issue. How much data are you dealing with? How many groups are in the
 result?

 * Setting shared buffers to 18GB is almost certainly a bad choice. I'm
 yet
 to see a real-world database where shared_buffers over 8GB actually make
 a
 measurable difference. More is not always better, and you're actually
 reserving memory that can't be used for work_mem (so you're making the
 issue worse). Please, decrease shared_buffers to e.g. 4GB, then try to
 increase it and measure the performance difference.

 If you look at the documentation about how to configure shared_buffers, it
 is very unclear to me how I can maximize performance by allocating as much
 memory to the buffer as possible.  On one hand, the documentation says I
 shouldn't go high on the shared_buffers setting.  On the other hand, the
 more memory you allocate to the buffers, the better the performance is
 supposedly.  So at least as of 9.1, this is annoying.  I heard that
 starting from 9.2, this behavior changed dramatically?

I'm not sure I understand what you're trying to say. It's true the
documentation does not give perfectly clear instructions on how to set
optimal shared_buffers. The reason is very simple - there's no such value,
optimal for all workloads, operating systems and PostgreSQL versions. The
assumption that "more is better" is incorrect for several reasons.

There's a more detailed wiki page about this:

https://wiki.postgresql.org/wiki/Tuning_Your_PostgreSQL_Server

The 200MB you've set seems way too low in your case, and aside from making
it slower than necessary, you most probably haven't fixed the actual
issue.

regards
Tomas



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] ERROR: out of memory DETAIL: Failed on request of size ???

2013-11-23 Thread Tomas Vondra
Hi,

On 22 Listopad 2013, 20:09, Edson Richter wrote:

 Excuse me (or just ignore me) if it is a stupid question, but have you
 configured sysctl.conf accordingly?
 For instance, to use larget memory settings, I had to configure my EL as
 follows:

 # Controls the maximum shared segment size, in bytes
 kernel.shmmax = 68719476736

 # Controls the maximum number of shared memory segments, in pages
 kernel.shmall = 4294967296

This only influences shared memory, which is used for shared buffers (not
for memory allocated during query execution, as that's translated to plain
malloc). With sysctl.conf misconfigured, the database would not start at
all and you'd get a reasonable error message in the log, suggesting pretty
clearly where/how to fix it.

Tomas



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] ERROR: out of memory DETAIL: Failed on request of size ???

2013-11-22 Thread Tomas Vondra
On 19 Listopad 2013, 5:30, Brian Wong wrote:
 I've tried any work_mem value from 1gb all the way up to 40gb, with no
 effect on the error.  I'd like to think of this problem as a server
 process memory (not the server's buffers) or client process memory issue,
 primarily because when we tested the error there was no other load
 whatsoever.  Unfortunately,  the error doesn't say what kinda memory ran
 out.

Hi Brian,

first of all, please don't top-post, especially if the previous response
used bottom-post. Thank you ;-)

Regarding the issue you're seeing:

* Increasing work_mem in hope that it will make the issue go away is
pointless. In case work_mem is too low, PostgreSQL will automatically
spill the data to disk (e.g. it won't do a sort in memory, but will do a
on-disk merge sort). It will never fail, and messages failed on request
of size is actually coming from malloc, when requesting another chunk of
memory from the OS. So you're hitting a OS-level memory limit.

Note: AFAIK the only operation that does not spill to disk, and may fail
with OOM-like errors is hash aggregate. But by increasing the work_mem
you're actually encouraging PostgreSQL to make this planning error.

I see the query you're running is doing MAX() so it might be hitting this
issue. How much data are you dealing with? How many groups are in the
result?
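
It's easy to check what the planner does with it - look for a HashAggregate
node and its estimated row count (the names here are just placeholders, use
your actual query):

  EXPLAIN SELECT some_key, MAX(some_value)
    FROM some_table
   GROUP BY some_key;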

* Setting shared buffers to 18GB is almost certainly a bad choice. I'm yet
to see a real-world database where shared_buffers over 8GB actually make a
measurable difference. More is not always better, and you're actually
reserving memory that can't be used for work_mem (so you're making the
issue worse). Please, decrease shared_buffers to e.g. 4GB, then try to
increase it and measure the performance difference.

* So how much memory does the query allocate? Can you watch it over
top/free to get an idea if it e.g. allocates all available memory, or if
it allocates only 1GB and then fail, or something?

* I believe you're hitting some sort of limit, imposed by the operating
system. Please check ulimit and overcommit settings.

* BTW the SO post you mentioned as a perfect match was talking about query
executed over dblink - are you doing the same? If so, try to move the
aggregation into the query (so that the aggregation happens on the other
end).

regards
Tomas


 --- Original Message ---

 From: bricklen brick...@gmail.com
 Sent: November 18, 2013 7:25 PM
 To: Brian Wong bwon...@hotmail.com
 Cc: pgsql-general@postgresql.org
 Subject: Re: [GENERAL] ERROR: out of memory DETAIL: Failed on request of
 size ???

 On Mon, Nov 18, 2013 at 12:40 PM, Brian Wong bwon...@hotmail.com wrote:

 We'd like to seek out your expertise on postgresql regarding this error
 that we're getting in an analytical database.

 Some specs:
 proc: Intel Xeon X5650 @ 2.67Ghz dual procs 6-core, hyperthreading on.
 memory: 48GB
 OS: Oracle Enterprise Linux 6.3
 postgresql version: 9.1.9
 shared_buffers: 18GB

 After doing a lot of googling, I've tried setting FETCH_COUNT on psql
 AND/OR setting work_mem.  I'm just not able to work around this issue,
 unless if I take most of the MAX() functions out but just one.


 What is your work_mem set to?
 Did testing show that shared_buffers set to 18GB was effective? That seems
 about 2 to 3 times beyond what you probably want.





-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Tuning 9.3 for 32 GB RAM

2013-11-14 Thread Tomas Vondra
On 14 Listopad 2013, 16:09, Alexander Farber wrote:
 pgtune has produced the following for my server (the specs:
 http://www.hetzner.de/en/hosting/produkte_rootserver/px60ssd ):

 default_statistics_target = 50
 maintenance_work_mem = 1GB
 constraint_exclusion = on
 checkpoint_completion_target = 0.9
 effective_cache_size = 22GB
 work_mem = 192MB
 wal_buffers = 8MB
 checkpoint_segments = 16
 shared_buffers = 7680MB
 max_connections = 80

 Is it really okay? Isn't 22GB too high?

effective_cache_size is a hint on how much memory is available for the
filesystem cache, so that it can be considered when planning queries.
PostgreSQL relies on the filesystem cache, so this is needed to answer
questions like 'how probable is it that the block is in cache and won't
actually cause any I/O?'

It does not allocate anything. You do have 32GB of RAM in total, so 22GB
for caches seems about right unless you're running other memory-intensive
applications on the same box (leaving less memory available for the
filesystem cache).
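
It can also be changed without a restart, e.g. per session when
experimenting with different values:

  SHOW effective_cache_size;
  SET effective_cache_size = '22GB';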

Tomas



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Clang 3.3 Analyzer Results

2013-11-10 Thread Tomas Vondra
Hi,

On 11 Listopad 2013, 7:33, Jeffrey Walton wrote:
 I've been tasked with a quick acceptance check of Postgres for an
 upcoming project. It's a quick check, so its limited to Clang's
 analyzer and sanitizers.

 The analyzer is reporting some findings, and some of the findings look
 legitimate.

 For example, it looks like there's a double `free` occurring in
 streamutil.c (around line 115). Here's a screen capture of it under
 scan-view: http://postimg.org/image/3ph4hkyav/. From the capture, it
 looks like `password` should be set to NULL after `free` because Clang
 found a path to get back to the top of the loop (which will free
 `password` again`).

Probably. From a quick glance at streamutil.c, it seems to have other
issues too, not just the double free. For example it does a free on the
password, but then uses the value for dbpassword (not sure if that code
path actually is possible - maybe it always gets into the branch with
password prompt).

 There's some others of interest, too. For example, Divide by Zero and
 Buffer Overflows. Here's the index.html from the scan-view report:
 http://postimg.org/image/tn2ovjout/.

 The scan-view tar ball is a 5.5 megabytes in size (its HTML based with
 a lot of mouse over markup to help understand flows), and I'm not sure
 the bug reporter will take it. Plus the developers may not want it
 added to the bug reporter.

 Would someone know the best way to get this to the right folks?

 Thanks in advance. (And sorry reporting to pgsql-general - the
 developer list states emails must go elsewhere first).

IMHO pgsql-hackers is the right audience for reports like this. The 'must
ask somewhere else first' is meant for regular questions that are not that
closely related to postgresql development, and are likely to be answered
in the generic mailing lists.

Please, upload the HTML report somewhere and post a link. If it's easier
to redo the clang analysis, maybe post instructions on how to do that.

regards
Tomas



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Increasing CPU usage of PostgreSQL

2013-10-26 Thread Tomas Vondra
On 25.10.2013 19:04, Scott Marlowe wrote:
 On Fri, Oct 25, 2013 at 8:29 AM, Rahila Syed rahilasye...@gmail.com wrote:

 Configurations of my machine is:

 Processors:   Xeon E5-2650 Processor Kit
   Intel® Xeon ® Processor E5-2650 (2 GHz, 8C/16T,
 20 MB) * 2 nos


 RAM :  32GB DDR3-1600 REG Memory Kit
 8x 4GB Registered ECC DIMM, DDR3L-1600(PC3L-12800)

 HDD:  450GB 10K Hot Plug 2.5-inch SAS HDD * 8 nos
  1 x 450 GB SAS HDD, 2.5-inch, 6Gb/s

 Disk Speed  : 10,000 RPM

 RAID Controller (512MB, RAID 0/1)
 
 My guess is that you're maxing out your IO subsystem long before
 you're maxing out CPU. What does
 
 iostat -xd 10
 
 have to say about it?

Right, that's my guess too. The problem is most likely the sync at the
end of the transaction.

Rahila, if you want to saturate the CPU and don't care about the
particular benchmark, try using read-only transactions. Either just add
-S to the pgbench command line, or write something SELECT-only on your
own. In any case, use '-j' (multiple pgbench threads), otherwise pgbench
itself may become the bottleneck.
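
Just to illustrate, a minimal SELECT-only script (roughly what '-S' runs
against the standard pgbench tables, using the old \setrandom syntax):

   \set naccounts 100000 * :scale
   \setrandom aid 1 :naccounts
   SELECT abalance FROM pgbench_accounts WHERE aid = :aid;

Save that into a file and run it with 'pgbench -f the_file -c N -j M ...'
and the CPUs should get plenty of work.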

Tomas


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Replication and fsync

2013-10-26 Thread Tomas Vondra
Hi,

On 24.10.2013 23:18, Alban Hertroys wrote:
 On Oct 24, 2013, at 18:10, maillis...@gmail.com wrote:
 
 Thank you for the answers. I'm still confused. If fsync is not
 replicated to the slave, then how is replication affected by a
 corrupt master? If the master dies and there's a commit recorded in
 the wal log that didn't actually happen, wouldn't the slave still
 be expected to be in a sane state, with the wal logs accurately
 reflecting what's on disk?
 
 Maybe I just don't understand streaming replication enough. The
 docs seem to say that synchronous commits mean that the slave also
 has to verify a write before a transaction is considered complete.
 How does fsync affect the way/order in which statements are sent to
 the slave for replication?
 
 What you're missing is that the master will be replicating corrupt
 data. That is, _if_ it gets corrupted of course. But, data corruption
 in a database has a tendency to go unnoticed for a while.

I think it's safe as long as you don't try to reuse the cluster after a
crash (be it due to an OS error, power outage, ...). If the primary crashes
for any reason, you have to start from scratch, otherwise there might
be silent corruption as you've described.

regards
Tomas


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Is there something like a limited superuser to give to a db-assistant?

2013-10-24 Thread Tomas Vondra
On 25 October 2013, 3:53, Andreas wrote:
 Hi,

 how can I give a db-assistant the rights to create and drop schemas,
 tables, views ... BUT keep him out of certain existing schemas and tables?

 This position of a restricted db assistant is new for us and it wasn't
 considered in our access rights until now.

 Our DB got a wee bit complex with about 400 schemas and 5000 tables.
 So it would get rather ugly having to alter the rights for every
 db-object one by one manually.

Hi Andreas,

I think you should read this:
http://www.postgresql.org/docs/9.3/static/sql-grant.html

You don't need superuser privileges to create schemas and so on, and by
definition superuser means unrestricted. Just create a regular user
and grant him CREATE on the database. You may also grant him access only
to selected schemas/tables.
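
Roughly something like this (a minimal sketch, all the names are made up):

   CREATE ROLE db_assistant LOGIN PASSWORD '...';
   GRANT CREATE ON DATABASE mydb TO db_assistant;

   -- give access only to the schemas he's supposed to touch
   GRANT USAGE, CREATE ON SCHEMA sandbox TO db_assistant;
   GRANT SELECT ON ALL TABLES IN SCHEMA sandbox TO db_assistant;

As long as you don't grant him anything on the sensitive schemas (and they
are not open to PUBLIC), he won't be able to get at them.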

regards
Tomas



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Need help how to manage a couple of daily DB copies.

2013-10-24 Thread Tomas Vondra
On 25 October 2013, 3:53, Andreas wrote:
 Hi,

 I'd like to set up a DB-Server that keeps copies of our productive db
 for an external db-assistant.
 He should prepare chores on the test-server and mail the sql scripts to
 me.
 I'll look over those scripts and run them against the productive db
 myself.

 So I'd like to have a daily cron job dump the main db, rename the
 test-db to something with a date in it.
 Like   test_db  --  test_db_20131024
 Create a new test_db and import the dump of the main db.

 So far no problem but how could I limit the number of test_dbs to 5?
 I'd like to keep those test_dbs 5 days and then drop them.

Hi,

I assume that's something that needs to be done by your script - there's
certainly nothing in PostgreSQL itself to do that automatically.

You may for example run a daily cron script that lists all databases on
the test server, parses the database name and drops those older than 5
days.
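
For example, the script can simply ask the server which copies are past
their keep window (a sketch - adjust it to your actual naming scheme):

   SELECT datname
     FROM pg_database
    WHERE datname LIKE 'test_db_%'
    ORDER BY datname DESC
   OFFSET 5;

and then run 'DROP DATABASE ...' for each name returned (that has to happen
outside a transaction block, e.g. one 'psql -c' call per database).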

Tomas



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Need some help on Performance 9.0.4

2013-10-18 Thread Tomas Vondra
Hi,

On 18 October 2013, 17:06, akp geek wrote:
 when I issue the top command on solaris, what ever I have posted is
 exactly
 getting.

 top  -n
 load averages:  11.4,  10.8,  10.2;up 680+21:31:46
 15:05:21
 137 processes: 123 sleeping, 14 on cpu
 CPU states: 82.0% idle, 17.5% user,  0.5% kernel,  0.0% iowait,  0.0% swap
 Memory: 64G phys mem, 20G free mem, 63G total swap, 63G free swap

PID USERNAME LWP PRI NICE  SIZE   RES STATETIMECPU COMMAND
  12185 postgres   1  500 3172M 3165M cpu/09:56  1.80% postgres
  23338 postgres   1  500 3180M 3173M cpu/8   79:08  1.77% postgres
   2446 postgres   1   00 3174M 3167M cpu/24   0:58  1.76% postgres
  18359 postgres   1   00 3172M 3164M cpu/46   4:00  1.60% postgres
  28428 postgres   1   00 3178M 3171M cpu/17  58:15  1.60% postgres
  10698 postgres   1   00 3179M 3173M cpu/62  68:33  1.55% postgres
  28374 postgres   1   00 3178M 3172M cpu/14  47:01  1.55% postgres
   6145 postgres   1   00 3180M 3173M cpu/39 115:05  1.55% postgres
   2380 postgres   1   00 3179M 3172M cpu/34  53:38  1.54% postgres
   2088 postgres   1   00 3179M 3172M cpu/31  37:43  1.51% postgres
  19360 postgres   1  500 3172M 3165M cpu/36:55  0.84% postgres
  28426 postgres   1  190 3179M 3172M cpu/21  70:34  0.61% postgres
  21196 postgres   1  520 3403M 3392M sleep0:05  0.37% postgres
  21237 root   1  490 5632K 4000K cpu/51   0:00  0.11% sshd
415 daemon 6  590 6312K 3208K sleep  107:52  0.02% kcfd

1) I'm not sure what options top has on Solaris, but on Linux you can
hit 'c' to get extended info about the processes (the full command line).

2) I seriously doubt there are just sshd+kcfd+postgres processes running on
your box.

3) Are you sure the top output was collected while the slow queries were
running? Because to me it seems it was collected while the system was
relatively idle - there's 82% free CPU and no I/O activity at all.
Are you sure you have no zones/containers defined? I'm not familiar with
that, so maybe this is how it looks when you hit zone limits.



 this is the plan that was generated again.. Thanks a lot.

 Hash Join  (cost=343.96..41796.15 rows=751 width=33) (actual
 time=635.192..5799.477 rows=709 loops=1)
Hash Cond: ((te.statuscode)::text = (tt.statuscode)::text)
-  Hash Join  (cost=342.58..41778.81 rows=751 width=19) (actual
 time=635.135..5797.003 rows=709 loops=1)
  Hash Cond: (te.scenarioid = ts.scenarioid)
  -  Hash Join  (cost=243.38..41667.95 rows=1947 width=26)
 (actual time=634.564..5794.768 rows=711 loops=1)
Hash Cond: (te.userid = tu.userid)
-  Nested Loop  (cost=0.00..41311.26 rows=25026
 width=33) (actual time=0.060..5782.882 rows=11223 loops=1)
  -  Nested Loop  (cost=0.00..1639.76 rows=2751
 width=15) (actual time=0.023..20.904 rows=3702 loops=1)
Join Filter: (td.occserviceareaid =
 tz.occserviceareaid)
-  Seq Scan on t_occ tz  (cost=0.00..1.04
 rows=1 width=14) (actual time=0.006..0.012 rows=1 loops=1)
  Filter: (occid = 10::numeric)
-  Seq Scan on t_domesticnotamlocation td
 (cost=0.00..1471.10 rows=13410 width=15) (actual time=0.012..13.932
 rows=13410 loops=1)
  -  Index Scan using x_event_ix1 on t_event te
 (cost=0.00..14.26 rows=13 width=34) (actual time=0.339..1.553 rows=3
 loops=3702)
Index Cond: (te.affectedlocationid = td.locid)
Filter: (((te.statuscode)::text = ANY
 ('{CON,AERR,SERR,CERR,ACON}'::text[])) OR (((te.statuscode)::text =
 'ACT'::text) AND (COALESCE((te.enddate)::time
 stamp with time zone, now()) = (now() - '2 days'::interval))) OR
 (((te.statuscode)::text = 'CAN'::text) AND (te.canceldate = (now() -
 '2 days'::interval))) OR (((te.statuscode)
 ::text = ANY ('{APEN,CINP,AINP}'::text[])) AND
 (COALESCE((te.enddate)::timestamp with time zone, now()) = now(
-  Hash  (cost=240.34..240.34 rows=243 width=7)
 (actual time=4.040..4.040 rows=243 loops=1)
  Buckets: 1024  Batches: 1  Memory Usage: 8kB
  -  Seq Scan on t_user tu  (cost=0.00..240.34
 rows=243 width=7) (actual time=0.040..3.868 rows=243 loops=1)
Filter: ((usertype)::text = 'OCC'::text)
  -  Hash  (cost=98.64..98.64 rows=45 width=29) (actual
 time=0.549..0.549 rows=45 loops=1)
Buckets: 1024  Batches: 1  Memory Usage: 3kB
-  Nested Loop  (cost=0.00..98.64 rows=45 width=29)
 (actual time=0.044..0.496 rows=45 loops=1)
  -  Index Scan using x_locscenario_pk on
 t_locscenariomap tc  (cost=0.00..43.04 rows=45 width=22) (actual
 time=0.028..0.098 rows=45 loops=1)
Index Cond: (locid = 10::numeric)
  -  Index Scan using 

Re: [GENERAL] Need some help on Performance 9.0.4

2013-10-17 Thread Tomas Vondra
On 17.10.2013 20:56, akp geek wrote:
 got the output from depesz and this the top on the system.   thanks for
 the help ..

From depesz? That site works differently - you enter the explain plan
into the form, it does some formatting/highlighting and you're supposed
to submit the link to that page. E.g. http://explain.depesz.com/s/JwTB
is for your query.

However it seems you've done some anonymization on the query, e.g.
replacing the table/index/function names and string literals with some
random words.

Anyway, the plan shows the estimates are reasonably accurate, but the
index scan on romeo_four consumes ~99% of the query duration. Not sure
if it can be improved, because we don't know the original query and the
anonymization made it rather annoying to deal with.

 
 Regards
 
 load averages:  30.3,  28.8,  27.8;  up 680+00:51:09  18:24:44
 156 processes: 123 sleeping, 33 on cpu
 CPU states: 49.5% idle, 50.0% user,  0.4% kernel,  0.0% iowait,  0.0% swap
 Memory: 64G phys mem, 19G free mem, 63G total swap, 63G free swap
 
PID USERNAME LWP PRI NICE  SIZE   RES STATETIMECPU COMMAND
   7965 postgres   1  200 3178M 3171M cpu/7   78:21  1.58% postgres
  20638 postgres   1  200 3181M 3175M cpu/28 187:01  1.58% postgres
  22819 postgres   1  300 3181M 3175M cpu/33 264:06  1.57% postgres
   3789 postgres   1  200 3183M 3176M cpu/9  626:11  1.57% postgres
  10375 postgres   1  300 3182M 3175M cpu/50 293:55  1.57% postgres
  27335 postgres   1  300 3175M 3169M cpu/29  27:27  1.57% postgres
   7959 postgres   1  300 3178M 3171M cpu/41 104:07  1.57% postgres
   8156 postgres   1  300 3180M 3173M cpu/43 124:18  1.56% postgres
  20640 postgres   1   00 3182M 3176M cpu/58 207:49  1.56% postgres
  10584 postgres   1  300 3179M 3173M cpu/35  76:32  1.56% postgres
  13984 postgres   1  300 3181M 3174M cpu/30 207:04  1.56% postgres
   3575 postgres   1  200 3283M 3261M cpu/19   1:48  1.56% postgres
   7964 postgres   1  400 3177M 3170M cpu/62  82:56  1.56% postgres
   1299 postgres   1  400 3166M 3161M cpu/52   5:11  1.56% postgres
  27692 postgres   1   00 3181M 3174M cpu/46 260:58  1.56% postgres


Once again, this top output is incomplete - all it lists are postgres
processes (and it's unclear what they're doing), but there are certainly
some other processes running. Moreover, those 15 processes account for
only ~25% of CPU, but there's almost 50% CPU utilized. So where's the rest?

Tomas


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] werid errorindex row size 3040 exceeds btree maximum, 2712 occur randomly

2013-10-15 Thread Tomas Vondra
On 15.10.2013 03:44, Huang, Suya wrote:
 Thanks Tomas!
 
 However, in the example I sent,  I already did a vacuum full right
 after deleted the rows causing problem, before created the index and
 got an error even the table is vacuumed. Note, the table is I
 temporarily created using create table as select *... so no other
 people is accessing that table, except me for the testing purpose.
 
 Any ideas? And today, while I did the same thing, I can create index
 on the table right after I deleted the rows causing problem, without
 vacuum.
 
 Anything I missed here?

Not sure. The only thing I can think of is another transaction
preventing the autovacuum from removing the rows, but I can't reproduce
it. What PostgreSQL version are you using?

BTW take a look at this contrib module:

   http://www.postgresql.org/docs/9.1/static/pageinspect.html

It allows looking directly at the items on a data page, like this:

   select * from heap_page_items(get_raw_page('test', 0));

Maybe that'll help you identify the problem.

Tomas


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] werid errorindex row size 3040 exceeds btree maximum, 2712 occur randomly

2013-10-14 Thread Tomas Vondra
Hi,

On 14.10.2013 05:47, Huang, Suya wrote:
 Hi,
 
 OK, first, I know the reason of this error “index row size 3040
 exceeds btree maximum, 2712” and know that we cannot create index on
 certain columns with size larger than 1/3 buffer page size.
 
 The question is, no matter if I deleted records that caused the
 problem or all records of the table, the error still occurred and
 would disappear after a while randomly, like 1 or 2 minutes or so.

I'd bet what you see is caused by MVCC. The deleted records are not
deleted immediately, but marked as deleted and then eventually freed by
the (auto)vacuum background process, once no other sessions need them.

But those records need to be indexed, as other sessions may still need
to access them (depending on the transaction isolation level used), so
we can't just skip them when creating the index.

See this for more details on this topic:
http://www.postgresql.org/docs/9.3/static/transaction-iso.html

Try running VACUUM on the table before creating the index, and make sure
there are no other connections accessing the table. That should do the
trick.

That being said, I wonder why you need to create a gin index on such
long values. Any particular reason why you decided not to use a MD5 hash
of the value, as suggested by the HINT message?
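
In other words, the HINT suggests indexing a hash of the value instead of
the value itself, roughly like this (a sketch with made-up names, assuming
the column is text):

   CREATE INDEX test_md5_idx ON test (md5(val));

   -- and then query it like this
   SELECT * FROM test WHERE md5(val) = md5('... the value ...');

That keeps the index entries small, and equality lookups still work fine.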

regards
Tomas


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Postgresql 9.0.13 core dump

2013-10-14 Thread Tomas Vondra
On 14.10.2013 22:18, Laurentius Purba wrote:
 Hello all,
 
 I am having core dump on Postgres 9.0.13 with the message ...was
 terminated by signal 10: Bus error
 
 So, I set a PID on the log file to capture the specific PID that is
 causing this crash. After several crashes, I finally got the PID that is
 causing this crash. But I am still not convinced that this is the process
 causing the crash. That is the reason why I sent out this email, hoping
 anybody can help me or has had this experience before.
 
 Based on the PID on the log file, the crash happened while the
 application was trying to update a table's field with binary (PDF)
 content. The datatype of this field is TEXT.

Hi Laurentius,

wouldn't it be better to use BYTEA columns for binary content, not TEXT?
I'm not sure if that's a problem with PDF, but generally TEXT does not
allow some octet values (e.g. '\0').
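
For example (just a sketch, with made-up names):

   CREATE TABLE documents (
       id       serial PRIMARY KEY,
       content  bytea   -- raw bytes, no encoding or null-byte restrictions
   );

and have the application send the PDF as a binary parameter rather than
stuffing it into a text value.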

The backtrace you've posted however lists a bunch of libxml functions at
the top, and in my experience libxml is not the best coded piece of
software. So I'd guess the problem is somewhere within libxml. What
version of libxml are you using?

Signal 10 usually means hardware error (but if the other jails are
running fine, it's unlikely) or about the same as SEGFAULT (i.e.
accessing invalid memory etc.).

What I don't understand is why the call ended in libxml when you're
dealing with PDF?

Is this reproducible? Does that happen with a particular PDF, or with
random PDF documents? Can you prepare a small self-contained test case?

regards
Tomas



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Need some help on Performance 9.0.4

2013-10-13 Thread Tomas Vondra
Hi,

On 13.10.2013 17:43, akp geek wrote:
 Sorry all not posting clear.  I posted our postresql.conf file. What
 we are having is very slow response from the database.
 
 would like to get some feedback about postgresql.conf file
 parameters that I posted are causing the issue for performance of the
 DB.
 
 Our database is about 60GB and performance is very bad.

The postgresql.conf seems fine - at least I haven't noticed anything
obviously wrong there. But we know almost nothing about your HW (not
even what kind of CPU / how many cores, etc.) or  how you're using it.

For example you haven't even posted a complete top output, so we have no
idea what else is running on the box. OTOH the box seems to be rather
idle - there's plenty of free CPU and RAM, and the disks are barely doing
anything.

Why exactly do you think it's the database that's slow?

We can't really help you unless you show us what exactly you consider
slow. The best thing you can probably do is to show us a slow query
along with its EXPLAIN ANALYZE output (use explain.depesz.com to post it).

And it'd be nice to know the history around this - was it fast before,
did it get slow suddenly or over a longer period of time?

regards
Tomas


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] PostgreSQL 9.2.4 temp files never released?

2013-09-29 Thread Tomas Vondra
On 28 September 2013, 23:25, Edson Richter wrote:
 Em 28/09/2013 18:12, Tomas Vondra escreveu:
 Do you realize the counters in pg_stat_database (temp_files, temp_bytes)
 are counters tracking all the temp files created since the last reset of
 the stats (possibly since the cluster was created)?

 So these values only increment (and it's meant to behave like that). BTW
 this is true for most values in the stats catalogs, so there's nothing
 like current stats - to get something like that you need to get two
 snapshots and subtract them (to get the delta).

 Tomas

 Ah, ok! One more lesson learned...
 I would expect stats being current of, or up-to-date after a vacuum
 analyze...
 Thank you all very much.

Vacuum has nothing to do with this. The temp files are removed immediately
(by the particular operation that created them) once not needed anymore.
Unless something goes terribly wrong, of course ...

Tomas



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] multiple databases vs multiple clusters on the same host

2013-09-28 Thread Tomas Vondra
On 28 September 2013, 20:12, Eugene Ostrovsky wrote:
 Hello!

 I would like to find out what is the difference in hardware resources
 consuming between two solutions:
 1. Several databases in the same postgresql cluster
 2. Several clusters (one per each database) on the same host

 Currently I have about 10 databases in the same cluster. For some reasons
 I'm going to switch to using separate clusters on the same machine. I
 suspect that this could affect the performance.

 Any ideas on how much more hardware resources will be consumed?

Well, that's hard to say, because we don't know (a) the version of
PostgreSQL you're using, (b) how you use it and (c) what hardware you use.

There are probably some corner cases where this might improve the
performance, but in most cases it's going to be worse. Why are you
switching to multiple clusters?

For example, consider that you'll probably have to use much smaller shared
buffers per cluster (which might cause issues on the active database while
the other databases leave their portion of memory unused), and you'll have
to either use a much smaller max_connections or decrease work_mem (you
can't just keep the current values, or you may run into OOM much more
frequently), etc.

Tomas



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] multiple databases vs multiple clusters on the same host

2013-09-28 Thread Tomas Vondra
On 28 September 2013, 21:30, Eugene Ostrovsky wrote:
 Thanks for the answer!

 About you questions:
 1. Postgres 9.3
 2. There are about 30-50 user connections. Actually Only 2 of databases
 are used intensively, others only in rare cases.
 3. Hardware is  AMD Phenom II X4 965, 8 Gb RAM, 2 SATA2 HDD in software
 mirror raid
 4. The reason to switch to multiple clusters is that my software uses
 roles (login users and groups) for a single database. There are some
 problems with it in case of several databases because in postgres roles
 are shared between all the databases in the same cluster.

Wouldn't it be easier to just set up unique roles for each database? It's a
much better solution than deploying 10 separate clusters (as you'll learn
soon enough, if you go in that direction).

And what are those problems that you mentioned? Seems to me this
thread started from the wrong end - setting up multiple clusters instead
of tackling the actual problem first.
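
With per-database roles, something along these lines keeps the users neatly
separated without any extra clusters (a sketch, all names made up):

   REVOKE CONNECT ON DATABASE app1, app2 FROM PUBLIC;

   CREATE ROLE app1_user LOGIN PASSWORD '...';
   GRANT CONNECT ON DATABASE app1 TO app1_user;

   CREATE ROLE app2_user LOGIN PASSWORD '...';
   GRANT CONNECT ON DATABASE app2 TO app2_user;

The roles are still global, but each of them can only connect to 'its' own
database.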

Tomas



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] PostgreSQL 9.2.4 temp files never released?

2013-09-28 Thread Tomas Vondra
On 28 September 2013, 22:54, Edson Richter wrote:
 Em 28/09/2013 15:54, Adrian Klaver escreveu:
 On 09/28/2013 11:30 AM, Edson Richter wrote:
 Em 28/09/2013 15:22, Adrian Klaver escreveu:
 On 09/28/2013 11:16 AM, Edson Richter wrote:
 I've a 12Gb database running without problems in Linux Centos 64bit
 for
 years now.
 Looking database statistics (in pgAdmin III), I can see that there
 are
 366 temporary files, and they sum up 11,863,839,867 bytes in size.

 What are the temp files named and where are they located?

 Sorry if this sounds silly, but how can I discover this information?

 Assuming pgAdmin is using  pg_stat_database then:

 http://www.postgresql.org/docs/9.2/static/storage-file-layout.html

 Temporary files (for operations such as sorting more data than can
 fit in memory) are created within PGDATA/base/pgsql_tmp, or within a
 pgsql_tmp subdirectory of a tablespace directory if a tablespace other
 than pg_default is specified for them. The name of a temporary file
 has the form pgsql_tmpPPP.NNN, where PPP is the PID of the owning
 backend and NNN distinguishes different temporary files of that
 backend.

 Ok. Found the place.
 So, there is nothing there. PG_DATA/base/pgsql_tmp is a empty directory.
 I've run the query over pg_stat_database view and there is nothing wrong
 with pgAdmin III - the information is all there.
 I've also run a vacuum freeze analyze, but made no difference.

 I believe that statistics are outdated, since there is no temp file at
 all.

Do you realize the counters in pg_stat_database (temp_files, temp_bytes)
are counters tracking all the temp files created since the last reset of
the stats (possibly since the cluster was created)?

So these values only increment (and it's meant to behave like that). BTW
this is true for most values in the stats catalogs, so there's nothing
like current stats - to get something like that you need to get two
snapshots and subtract them (to get the delta).
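
If you really want to watch the current activity, take two snapshots and
diff them, e.g. something like this (a sketch):

   CREATE TEMP TABLE snap AS
       SELECT now() AS ts, temp_files, temp_bytes
         FROM pg_stat_database
        WHERE datname = current_database();

   -- ... wait a while, then:
   SELECT d.temp_files - s.temp_files AS new_files,
          pg_size_pretty(d.temp_bytes - s.temp_bytes) AS new_bytes
     FROM pg_stat_database d, snap s
    WHERE d.datname = current_database();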

Tomas



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Sum of columns

2013-09-08 Thread Tomas Vondra
On 9 September 2013, 3:12, jane...@web.de wrote:
 Hi, this is my query:

  SELECT user,
  sum(CASE WHEN lev >= 50 AND lev < 70 THEN 1 ELSE 0 END) as a,
  sum(CASE WHEN lev >= 70 AND lev < 80 THEN 1 ELSE 0 END) as b,
  sum(CASE WHEN lev >= 80 AND lev < 90 THEN 1 ELSE 0 END) as d,
  (SELECT a + b + a + d) AS matches
  FROM t_temp_fts
  GROUP BY user;

 I like to add up the 4 columns a, b, c and d of every user, but it
 doesn't work like this. Does anyone know a solution?

 Janek Sendrowski

Hi,

it doesn't work like that - the inner select makes no sense for various
reasons. I'd bet what you want is something like this:

SELECT user, a, b, d, (a + b + d) AS matches
FROM (

  SELECT user,
  sum(CASE WHEN lev >= 50 AND lev < 70 THEN 1 ELSE 0 END) as a,
  sum(CASE WHEN lev >= 70 AND lev < 80 THEN 1 ELSE 0 END) as b,
  sum(CASE WHEN lev >= 80 AND lev < 90 THEN 1 ELSE 0 END) as d
  FROM t_temp_fts
  GROUP BY user
) foo

i.e. it takes the t_temp_fts table, computes the partial results and then
passes the results to the outer query to evaluate the addition.

There's an alternative doing all of that in a single query:

SELECT user, a, b, d, (a + b + d) AS matches
FROM (

  SELECT user,
  sum(CASE WHEN lev >= 50 AND lev < 70 THEN 1 ELSE 0 END) as a,
  sum(CASE WHEN lev >= 70 AND lev < 80 THEN 1 ELSE 0 END) as b,
  sum(CASE WHEN lev >= 80 AND lev < 90 THEN 1 ELSE 0 END) as d,
  sum(CASE WHEN lev >= 50 AND lev < 70 THEN 1 ELSE 0 END) +
  sum(CASE WHEN lev >= 70 AND lev < 80 THEN 1 ELSE 0 END) +
  sum(CASE WHEN lev >= 80 AND lev < 90 THEN 1 ELSE 0 END) as matches
  FROM t_temp_fts
  GROUP BY user
) foo

or you could add up the CASE expressions directly, like this:

SELECT user, a, b, d, (a + b + d) AS matches
FROM (

  SELECT user,
  sum(CASE WHEN lev >= 50 AND lev < 70 THEN 1 ELSE 0 END) as a,
  sum(CASE WHEN lev >= 70 AND lev < 80 THEN 1 ELSE 0 END) as b,
  sum(CASE WHEN lev >= 80 AND lev < 90 THEN 1 ELSE 0 END) as d,
  sum((CASE WHEN lev >= 50 AND lev < 70 THEN 1 ELSE 0 END) +
  (CASE WHEN lev >= 70 AND lev < 80 THEN 1 ELSE 0 END) +
  (CASE WHEN lev >= 80 AND lev < 90 THEN 1 ELSE 0 END)) as matches
  FROM t_temp_fts
  GROUP BY user
) foo

All of these should return the same results.

Tomas



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] Old record migration to another table made the db slower

2013-09-05 Thread Tomas Vondra
On 5 September 2013, 8:52, Arun P.L wrote:
 Hi all,
 I have moved a lot of old records from one table, say original_data_table,
 to another table, say old_data_table, which is newly created for the old
 records. There were around 15 million old records to move. After this
 migration I have done a standard vacuum on the new old_data_table as well
 as on original_data_table.

  Now the problem is some queries are running very slow, especially 'select
 max(date) from one of the tables'. Also the db seems to be a bit slower
 after all this work. The actual intention of the migration of old records
 was to speed up the original_data_table, which is used frequently by
 users. How can I get this problem fixed? What are the steps to be followed
 for the same?
 My postgresql version is 7.4.3. Thanks in advance for your help, Arun

Hi,

First of all, 7.4 reached end-of-life ~3 years ago. And 7.4.3 is not even
the last release in this branch - there are ~30 updates. Go and upgrade to
7.4.30 now.

We'll need a bit more info about the tables / queries. What exactly have
you done? Show us the commands you've executed. Then show us EXPLAIN
ANALYZE for the queries.
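
For example, for the max(date) query that would simply be something like:

   EXPLAIN ANALYZE SELECT max(date) FROM old_data_table;

(and the same for the other slow queries), plus the exact commands you used
to move the rows.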

Tomas




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] inserting huge file into bytea cause out of memory

2013-08-06 Thread Tomas Vondra
Hi,

On 6 August 2013, 9:12, liuyuanyuan wrote:
 Error detail:
 org.postgresql.util.PSQLException: Error: out of memory
   Details:Failed on request of size 268443660.

Seems like an issue with the OS, not with PostgreSQL, to me.

What OS and HW are you using? How much memory do you have, and do you have
some user limits in place? For example, Linux uses ulimit and some kernel
parameters to limit how much memory can be allocated by a process.

Try this as postgres (or the user you run the database as):

   ulimit -v

and as root:

   sysctl vm.overcommit_memory

These are the options that bite me most often. What limits do you have in
place?

Tomas



-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general

