Re: [PERFORM] Postgres not using all RAM (Huge Page activated on a 96GB RAM system)

2017-03-24 Thread Andrew Kerber
 postgres  20   0 24.918g 214300  12688 R  99.3  0.2   2865:59 postgres
>  9063 postgres  20   0 24.722g  14812  12956 S   0.3  0.0   0:07.36 postgres
>  9068 postgres  20   0 24.722g   6380   4232 S   0.3  0.0   0:02.15 postgres
>  9065 postgres  20   0 24.727g  10368   3516 S   0.0  0.0   0:04.24 postgres
>  9066 postgres  20   0 24.722g   4100   2248 S   0.0  0.0   0:06.04 postgres
>  9067 postgres  20   0 24.722g   4100   2248 S   0.0  0.0   0:01.37 postgres
>  9069 postgres  20   0  161740   4596   2312 S   0.0  0.0   0:04.48 postgres
>
> What's wrong with this? Is there something wrong with the RAM usage?
>
> Thank you all
>Pietro
>



-- 
Andrew W. Kerber

'If at first you don't succeed, don't take up skydiving.'


Re: [PERFORM] jsonb_agg performance

2016-01-29 Thread Andrew Dunstan



On 01/29/2016 05:06 PM, jflem...@kispring.com wrote:
The jsonb_agg function seems to have significantly worse performance 
than its json_agg counterpart:


=> explain analyze select pa.product_id, jsonb_agg(attributes) from 
product_attributes2 pa group by pa.product_id;

QUERY PLAN
--
 GroupAggregate  (cost=1127.54..1231.62 rows=3046 width=380) (actual 
time=28.632..241.647 rows=3046 loops=1)

   Group Key: product_id
   ->  Sort  (cost=1127.54..1149.54 rows=8800 width=380) (actual 
time=28.526..32.826 rows=8800 loops=1)

 Sort Key: product_id
 Sort Method: external sort  Disk: 3360kB
 ->  Seq Scan on product_attributes2 pa (cost=0.00..551.00 
rows=8800 width=380) (actual time=0.010..7.231 rows=8800 loops=1)

 Planning time: 0.376 ms
 Execution time: 242.963 ms
(8 rows)

=> explain analyze select pa.product_id, json_agg(attributes) from 
product_attributes3 pa group by pa.product_id;

QUERY PLAN
--
 GroupAggregate  (cost=1136.54..1240.62 rows=3046 width=387) (actual 
time=17.731..30.126 rows=3046 loops=1)

   Group Key: product_id
   ->  Sort  (cost=1136.54..1158.54 rows=8800 width=387) (actual 
time=17.707..20.705 rows=8800 loops=1)

 Sort Key: product_id
 Sort Method: external sort  Disk: 3416kB
 ->  Seq Scan on product_attributes3 pa (cost=0.00..560.00 
rows=8800 width=387) (actual time=0.006..5.568 rows=8800 loops=1)

 Planning time: 0.181 ms
 Execution time: 31.276 ms
(8 rows)

The only difference between the two tables is the type of the 
attributes column (jsonb vs json).  Each table contains the same 8800 
rows.  Even running json_agg on the jsonb column seems to be faster:


=> explain analyze select pa.product_id, json_agg(attributes) from 
product_attributes2 pa group by pa.product_id;

QUERY PLAN
--
 GroupAggregate  (cost=1127.54..1231.62 rows=3046 width=380) (actual 
time=30.626..62.943 rows=3046 loops=1)

   Group Key: product_id
   ->  Sort  (cost=1127.54..1149.54 rows=8800 width=380) (actual 
time=30.590..34.157 rows=8800 loops=1)

 Sort Key: product_id
 Sort Method: external sort  Disk: 3360kB
 ->  Seq Scan on product_attributes2 pa (cost=0.00..551.00 
rows=8800 width=380) (actual time=0.014..7.388 rows=8800 loops=1)

 Planning time: 0.142 ms
 Execution time: 64.504 ms
(8 rows)

Is it expected that jsonb_agg performance would be that much worse 
than json_agg? 



I do expect it to be significantly worse. Constructing jsonb is quite a 
lot more expensive than constructing json, it's the later processing 
that provides the performance benefit of jsonb. For 99 out of 100 uses 
that I have seen there is no need to be using jsonb_agg, since the 
output is almost always fed straight back to the client, not stored or 
processed further in the database. Rendering json to the client is 
extremely cheap, since it's already just text. Rendering jsonb as text 
to the client involves a lot more processing.


cheers

andrew



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] problem with select *

2015-08-24 Thread Andrew Dunstan



On 08/24/2015 03:04 AM, bhuvan Mitra wrote:

Hello,

I have a table with 12 columns and 20 million rows. Writing to the 
table is not a problem, but reading it is. When I perform a 'select * 
from table limit 14000000;' (selecting 14 million rows), it works fine. 
If the limit value is 15000000, it throws an 'out of memory' error.


If the query is 'select * from table', the process gets killed with 
the message 'killed'.


Kindly tell me where it is going wrong. I have a 6MB cache, a 1.6GHz 
CPU, Linux 14.04, and 8GB RAM.






You should be using a cursor.
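
For example, something along these lines (the table name and fetch size 
are only illustrative), so that only one batch at a time has to fit in 
client memory:

    BEGIN;
    DECLARE big_scan CURSOR FOR SELECT * FROM my_big_table;
    FETCH FORWARD 100000 FROM big_scan;  -- repeat until no rows are returned
    CLOSE big_scan;
    COMMIT;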

cheers

andrew



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] pg bouncer issue what does sv_used column means

2015-06-12 Thread Andrew Dunstan


Please do not cross-post on the PostgreSQL lists. Pick the most 
appropriate list to post to and just post there.


cheers

andrew


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Fastest Backup Restore for perf testing

2015-05-27 Thread Andrew Dunstan


On 05/27/2015 04:24 PM, Wes Vaske (wvaske) wrote:


Hi,

I’m running performance tests against a PostgreSQL database (9.4) with 
various hardware configurations and a couple different benchmarks 
(TPC-C & TPC-H).


I’m currently using pg_dump and pg_restore to refresh my dataset 
between runs but this process seems slower than it could be.


Is it possible to do a tar/untar of the entire /var/lib/pgsql tree as 
a backup & restore method?


If not, is there another way to restore a dataset more quickly? The 
database is dedicated to the test dataset so trashing & rebuilding the 
entire application/OS/anything is no issue for me—there’s no data for 
me to lose.


Thanks!




Read all of this chapter. 
http://www.postgresql.org/docs/current/static/backup.html
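
One approach that often works well for refreshing a benchmark dataset, 
assuming the test database can simply be dropped and recreated between 
runs (names below are only illustrative), is to keep a pristine copy and 
clone it at the file level with CREATE DATABASE ... TEMPLATE:

    -- one-time setup: load the dataset once into a master copy
    CREATE DATABASE bench_master;
    -- ... load the data into bench_master, then before each run:
    DROP DATABASE IF EXISTS bench;
    CREATE DATABASE bench TEMPLATE bench_master;
    -- note: nobody may be connected to bench_master while it is copied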


cheers

andrew




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] PostgreSQL disk fragmentation causes performance problems on Windows

2015-04-29 Thread Andrew Dunstan


On 04/29/2015 10:35 AM, k...@rice.edu wrote:

On Wed, Apr 29, 2015 at 07:07:04AM -0700, Joshua D. Drake wrote:

On 04/29/2015 01:08 AM, Andres Freund wrote:


Which OS and filesystem is this done on? Because many halfway modern
systems, like e.g ext4 and xfs, implement this in the background as
'delayed allocation'.

Oh, it's in the subject. Stupid me, sorry for that. I'd consider testing
how much better this behaves under a different operating system, as a
shorter term relief.

This is a known issue on the Windows platform. It is part of the
limitations of that environment. Linux/Solaris/FreeBSD do not suffer
from this issue in nearly the same manner.

jD


You might consider a CLUSTER or VACUUM FULL to re-write the table with
less fragmentation.



Or pg_repack if you can't handle the lockup time that these involve.
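
For reference, the in-core options look like this (table and index names 
are placeholders; both take heavy locks for the duration of the rewrite):

    CLUSTER my_table USING my_table_pkey;  -- rewrite the table in index order
    VACUUM FULL my_table;                  -- rewrite the table without reordering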

cheers

andrew


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Number of Columns and Update

2014-12-22 Thread Andrew Dunstan


On 12/22/2014 03:53 PM, Robert DiFalco wrote:
This may fall into the category of over-optimization but I've become 
curious.


I have a user table with about 14 columns that are all 1:1 data - so 
they can't be normalized.


When I insert a row all columns need to be set. But when I update, I 
sometimes only update 1-2 columns at a time. Does the number of 
columns impact update speed?


For example:
 UPDATE users SET email = ? WHERE id = ?;

I can easily break this up into logical tables like user_profile, 
user_credential, user_contact_info, user_summary, etc with each table 
only having 1-4 columns. But with the multiple tables I would often be 
joining them to bring back a collection of columns.


I know I'm over thinking this but I'm curious of what the performance 
trade offs are for breaking up a table into smaller logically grouped 
tables.





An update rewrites the whole row, not just the updated columns.

I think you are overthinking it.

cheers

andrew



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Re: [SQL] querying with index on jsonb slower than standard column. Why?

2014-12-12 Thread Andrew Dunstan


On 12/12/2014 04:44 PM, Tom Lane wrote:

Josh Berkus j...@agliodbs.com writes:

Yeah, I believe the core problem is that Postgres currently doesn't have
any way to have variadic return types from a function which don't match
variadic input types.  Returning a value as an actual numeric from JSONB
would require returning a numeric from a function whose input type is
text or json.  So a known issue but one which would require a lot of
replumbing to fix.

Well, it'd be easy to fix if we were willing to invent distinct operators
depending on which type you wanted out (perhaps ->> for text output as
today, add -# for numeric output, etc).


That was my immediate reaction. Not sure about the operator name. I'd 
tentatively suggest -# (taking an int or text argument) and ## taking 
a text[] argument, both returning numeric, and erroring out if the value 
is a string, boolean, object or array.




   Doesn't seem terribly nice
from a usability standpoint though.

The usability issue could be fixed by teaching the planner to fold a
construct like (jsonb ->> 'foo')::numeric into (jsonb -# 'foo').
But I'm not sure how we do that except in a really ugly and ad-hoc
fashion.





I would be inclined to add the operator and see how cumbersome people 
find it. I suspect in many cases it might be sufficient.


cheers

andrew



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Re: [SQL] querying with index on jsonb slower than standard column. Why?

2014-12-12 Thread Andrew Dunstan


On 12/12/2014 08:20 PM, Tom Lane wrote:

We can't just add the operator and worry about usability later;
if we're thinking we might want to introduce such an automatic
transformation, we have to be sure the new operator is defined in a
way that allows the transformation to not change any semantics.
What that means in this case is that if (jsonb ->> 'foo')::numeric
would have succeeded, (jsonb -# 'foo') has to succeed; which means
it'd better be willing to attempt conversion of string values to
numeric, not just throw an error on sight.




Well, I'm not 100% convinced about the magic transformation being a good 
thing.


Json numbers are distinct from strings, and part of the justification 
for this is to extract a numeric datum from jsonb exactly as stored, on 
performance grounds. So turning round now and making that turn a string 
into a number if possible seems to me to be going in the wrong direction.


cheers

andrew


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgres does not use indexes with OR-conditions

2014-11-07 Thread Andrew Dunstan


On 11/07/2014 12:06 AM, Vlad Arkhipov wrote:

It was just a minimal example. The real query looks like this.

select *
from commons.financial_documents fd
where fd.creation_time < '2011-11-07 10:39:07.285022+08'
  or (fd.creation_time = '2011-11-07 10:39:07.285022+08' and 
fd.financial_document_id < 100)

order by fd.creation_time desc
limit 200

I need to rewrite it in the way below to make Postgres use the index.

select *
from commons.financial_documents fd
where fd.creation_time <= '2011-11-07 10:39:07.285022+08'
  and (
fd.creation_time < '2011-11-07 10:39:07.285022+08'
  or (fd.creation_time = '2011-11-07 10:39:07.285022+08' and 
fd.financial_document_id < 100)

  )
order by fd.creation_time desc
limit 200



First, please do not top-post on the PostgreSQL lists. See 
http://idallen.com/topposting.html


Second, the last test for fd.creation_time in your query seems 
redundant. Could you not rewrite it as something like this?:


   where fd.creation_time <= '2011-11-07 10:39:07.285022+08'
and (fd.creation_time < '2011-11-07 10:39:07.285022+08'
 or fd.financial_document_id < 100)
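
Another formulation worth trying for this kind of keyset pagination, 
assuming there is an index on (creation_time, financial_document_id) and 
that ordering by both columns is acceptable, is a row-value comparison, 
which is equivalent to the original OR predicate:

    where (fd.creation_time, fd.financial_document_id)
          < ('2011-11-07 10:39:07.285022+08', 100)
    order by fd.creation_time desc, fd.financial_document_id desc
    limit 200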

cheers

andrew


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] assignment vs SELECT INTO

2014-11-03 Thread Andrew Dunstan


I found out today that direct assignment to a composite type is (at 
least in my test) about 70% faster than setting it via SELECT INTO. That 
seems like an enormous difference in speed, which I haven't really been 
able to account for.


Test case:

   andrew=# \d abc
  Table public.abc
 Column |  Type   | Modifiers
   --------+---------+-----------
 x  | text|
 y  | text|
 z  | integer |
   andrew=# do $x$ declare r abc; begin for i in 1 .. 1000 loop
   select 'a','b',i into r.x,r.y,r.z; end loop; end; $x$;
   DO
   Time: 63731.434 ms
   andrew=# do $x$ declare r abc; begin for i in 1 .. 1000 loop r
   := ('a','b',i); end loop; end; $x$;
   DO
   Time: 18744.151 ms


Is it simply because the SELECT is in effect three assignments, so it 
takes nearly 3 times as long?


cheers

andrew


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] assignment vs SELECT INTO

2014-11-03 Thread Andrew Dunstan


On 11/03/2014 03:24 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

 andrew=# do $x$ declare r abc; begin for i in 1 .. 1000 loop
 select 'a','b',i into r.x,r.y,r.z; end loop; end; $x$;
 DO
 Time: 63731.434 ms
 andrew=# do $x$ declare r abc; begin for i in 1 .. 1000 loop r
 := ('a','b',i); end loop; end; $x$;
 DO
 Time: 18744.151 ms
Is it simply because the SELECT is in effect three assignments, so it
takes nearly 3 times as long?

I think it's more likely that the second example is treated as a simple
expression so it has less overhead than a SELECT.





Well, I accidentally left out this case:

   andrew=# do $x$ declare r abc; begin for i in 1 .. 1000 loop
   select row('a','b',i) into r; end loop; end; $x$;
   DO
   Time: 81919.721 ms


which is slower still.

cheers

andrew



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Query with large number of joins

2014-10-21 Thread Andrew Dunstan


On 10/21/2014 12:09 PM, Marco Di Cesare wrote:

I did not mean to imply this works any better on SQL Server. We never tried. I 
just meant to say this is the first time we are using Postgres so we don't have 
much experience with it.

We tried with work_mem set to 1GB (even as high as 3GB) but it didn't change 
the GroupAggregate and Sort or query run time.

Sorry, I had to sanitize the query and a few of the relevant tables so 
hopefully I got it all right.

SELECT
 foxtrot_india.juliet_alpha,
 foxtrot_india.foxtrot_yankee,
 foxtrot_india.hotel_sierra,
 foxtrot_india.juliet_alpha,
foxtrot_india.bravo_romeo,
 oscar_bravo.golf_foxtrot,
 seven_kilo.november_lima,
 foxtrot_india.echo_six,
 uniform_six.seven_six,
oscar_charlie.foxtrot_charlie,
 COUNT(DISTINCT foxtrot_india.bravo_romeo)
FROM
 public.seven_kilo seven_kilo
 INNER JOIN public.papa_sierra papa_sierra ON (seven_kilo.golf_bravo = 
papa_sierra.golf_bravo)
 LEFT JOIN public.golf_two golf_two ON (seven_kilo.lima = 
golf_two.lima)
 LEFT JOIN public.bravo_xray bravo_xray ON (seven_kilo.lima = 
bravo_xray.lima)
 LEFT JOIN public.foo1 foo1 ON ((seven_kilo.bar1 = foo1.bar1) AND 
(seven_kilo.golf_bravo = foo1.golf_bravo))
 INNER JOIN public.oscar_charlie oscar_charlie ON (seven_kilo.lima = 
oscar_charlie.lima)
 INNER JOIN public.oscar_bravo oscar_bravo ON (oscar_charlie.foxtrot_four 
= oscar_bravo.foxtrot_four)
 INNER JOIN public.foxtrot_india foxtrot_india ON (oscar_bravo.sierra = 
foxtrot_india.sierra)
 INNER JOIN public.hotel_romeo hotel_romeo ON 
(oscar_charlie.foxtrot_charlie = hotel_romeo.foxtrot_charlie)
 INNER JOIN public.uniform_six uniform_six ON (hotel_romeo.hotel_lima = 
uniform_six.hotel_lima)
 LEFT JOIN public.lookup foo2 ON (foxtrot_india.bar2 = 
foo2.lookup_id)
 LEFT JOIN public.uniform_two uniform_two ON (foxtrot_india.sierra = 
uniform_two.sierra)
 INNER JOIN public.lookup four_xray ON (uniform_two.quebec = 
four_xray.quebec)
 LEFT JOIN public.papa_four papa_four ON (foxtrot_india.sierra = 
papa_four.sierra)
 INNER JOIN public.lookup romeo_bravo ON (papa_four.quebec = 
romeo_bravo.quebec)
 LEFT JOIN public.juliet_two juliet_two ON (foxtrot_india.sierra = 
juliet_two.sierra)
 INNER JOIN public.lookup four_delta ON (juliet_two.quebec = 
four_delta.quebec)
 LEFT JOIN public.foo3 foo3 ON (foxtrot_india.bar3 = 
foo3.bar3)
 INNER JOIN public.xray xray ON (seven_kilo.lima = 
xray.lima)
 INNER JOIN public.romeo_echo romeo_echo ON (xray.echo_sierra = 
romeo_echo.echo_sierra)
WHERE
 (((xray.echo_sierra = 'november_foxtrot')
 AND (romeo_echo.hotel_oscar = 'zulu')
 AND (oscar_charlie.five = 6)
 AND (oscar_charlie.whiskey = 'four_romeo')
 AND (oscar_charlie.charlie_romeo = 2014)))
GROUP BY 1, 2, 3, 4, 5, 6, 7, 8, 9, 10



Please don't top-post on the PostgreSQL lists. See 
http://idallen.com/topposting.html


Have you tried a) either turning off geqo or setting geqo_threshold 
fairly high, or b) setting join_collapse_limit fairly high? (Assuming all the 
above join targets are tables and not views, setting it to something 
like 25 should do the trick.)
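
Something along these lines at the session level, for experimentation 
(the values are only illustrative):

    SET geqo = off;               -- or raise geqo_threshold above the number of joined relations
    SET join_collapse_limit = 25;

and then re-run EXPLAIN ANALYZE on the query.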


You also haven't told us what settings you have for things like 
effective_cache_size, which can dramatically affect query plans.


cheers

andrew




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] char(N), varchar(N), varchar, text

2014-10-08 Thread Andrew Dunstan


On 10/08/2014 10:22 AM, Emi Lu wrote:

Good morning,

For performance point of view, are there big differences between: 
char(N), varchar(N), varchar, text?


Some comments from Google show:
No difference, under the hood it's all varlena. Check this article 
from Depesz: 
http://www.depesz.com/index.php/2010/03/02/charx-vs-varcharx-vs-varchar-vs-text/

A couple of highlights:

To sum it all up:

  * char(n) – takes too much space when dealing with values
shorter than n, and can lead to subtle errors because of
adding trailing spaces, plus it is problematic to change the limit
  * varchar(n) – it's problematic to change the limit in live
environment
  * varchar – just like text
  * text – for me a winner – over (n) data types because it lacks
their problems, and over varchar – because it has distinct name

So, can I assume no big performance differences?
Thanks a lot!
Emi




Why do you need to ask if you already have the answer? Depesz is right.

cheers

andrew


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] help: function failing

2014-10-07 Thread Andrew Dunstan


On 10/07/2014 04:41 PM, Sergey Konoplev wrote:

On Thu, Oct 2, 2014 at 4:00 PM, George Neuner gneun...@comcast.net wrote:

--- code 
CREATE OR REPLACE FUNCTION gen_random()
   RETURNS double precision AS
$BODY$
DECLARE
num   float8 := 0;
den   float8 := 281474976710655; -- 0xFFFFFFFFFFFF
bytes bytea[6];
BEGIN
-- get random bytes from crypto module
bytes := ext.gen_random_bytes(6);

-- assemble a double precision value
num := num + get_byte( bytes, 0 );
FOR i IN 1..5 LOOP
   num := num * 256;
   num := num + get_byte( bytes, i );
   END LOOP;

   -- normalize value to range 0.0 .. 1.0
   RETURN num / den;
END;
$BODY$
   LANGUAGE plpgsql VOLATILE;
--- code 

The error is:
ERROR: array value must start with { or dimension information
SQL state: 22P02
Context: PL/pgSQL function gen_random() line 8 at assignment

which, if I'm counting correctly, is
bytes := ext.gen_random_bytes(6);

Guessing from the name of ext.gen_random_bytes(6), it returns a value
that is incompatible with the bytea[] array representation from time to
time, so take a closer look at ext.gen_random_bytes() first. You can test
the case using a DO block.


If I comment out that line, it then tells me  get_byte()   is undefined,
which should be impossible because it's built in.

Feels like somewhere inside ext.gen_random_bytes() you set a
search_path that allows to see get_byte() and the search_path that was
set before the gen_random() call doesn't allow it.



Why does this code want an array of byteas?

It looks like the code thinks bytea[6] is a declaration of a bytea of 
length 6, which of course it is not. Shouldn't it just be declared as:


bytes bytea;

?
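
A corrected sketch, assuming ext.gen_random_bytes() is pgcrypto's 
gen_random_bytes() installed in schema ext:

    CREATE OR REPLACE FUNCTION gen_random()
      RETURNS double precision AS
    $BODY$
    DECLARE
        num   float8 := 0;
        den   float8 := 281474976710655; -- 0xFFFFFFFFFFFF, i.e. 2^48 - 1
        bytes bytea;                     -- a single bytea value, not an array
    BEGIN
        bytes := ext.gen_random_bytes(6);
        num := get_byte(bytes, 0);
        FOR i IN 1..5 LOOP
            num := num * 256 + get_byte(bytes, i);
        END LOOP;
        RETURN num / den;   -- normalize to the range 0.0 .. 1.0
    END;
    $BODY$ LANGUAGE plpgsql VOLATILE;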


Oh, and pgsql-performance is completely the wrong forum for this query. 
Usage questions should be on pgsql-general.


cheers

andrew


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] postgres 9.3 vs. 9.4

2014-09-18 Thread Andrew Dunstan


On 09/18/2014 03:09 PM, Mkrtchyan, Tigran wrote:


- Original Message -

From: Josh Berkus j...@agliodbs.com
To: pgsql-performance@postgresql.org
Sent: Thursday, September 18, 2014 7:54:24 PM
Subject: Re: [PERFORM] postgres 9.3 vs. 9.4

On 09/18/2014 08:09 AM, Mkrtchyan, Tigran wrote:

9.4beta2:
...


 0.957854  END;


Looks like IO.

Postgres internal IO? Maybe. We get 600MB/s on these SSDs.

While it's possible that this is a Postgres issue, my first thought is
that the two SSDs are not actually identical.  The 9.4 one may either
have a fault, or may be mostly full and heavily fragmented.  Or the Dell
PCIe card may have an issue.


We have tested both SSDs and they have identical IO characteristics and
as I already mentioned, both databases are fresh, including filesystem.


You are using scale 1 which is a  1MB database, and one client and 1
thread, which is an interesting test I wouldn't necessarily have done
myself.  I'll throw the same test on one of my machines and see how it does.

This scenario corresponds to our use case. We need a high transaction rate
for a single client. Currently I can get only ~1500 tps. Unfortunately,
Postgres does not tell me where the bottleneck is. It is definitely
not the disk IO.






This is when you dig out tools like perf, maybe.

cheers

andrew


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Tuning Postgres for Single connection use

2014-04-14 Thread Andrew Dunstan


On 04/14/2014 05:46 PM, Nick Eubank wrote:


Any rules of thumb for work_mem, maintenance_work_mem, 
shared_buffers, etc. for a database that DOESN'T anticipate 
concurrent connections and that is doing lots of aggregate functions 
on large tables? All the advice I can find online on tuning (e.g. 
http://wiki.postgresql.org/wiki/Performance_Optimization, 
http://media.revsys.com/talks/djangocon/2011/secrets-of-postgresql-performance.pdf, 
http://www.revsys.com/writings/postgresql-performance.html) is 
written for people anticipating lots of concurrent connections.


I'm a social scientist looking to use Postgres not as a database to be 
shared by multiple users, but rather as my own tool for manipulating a 
massive data set (I have 5 billion transaction records (600gb in csv) 
and want to pull out unique user pairs, estimate aggregates for 
individual users, etc.). This also means almost no writing, except to 
creation of new tables based on selections from the main table.


I'm on a Windows 8 VM with 16gb ram, SCSI VMware HD, and 3 cores if 
that's important.






First up would probably be don't run on Windows. shared_buffers above 
512Mb causes performance to degrade on Windows, while that threshold is 
much higher on *nix.
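
As a starting point for a single-connection, aggregate-heavy workload, 
something like this at the session level (the values are only 
illustrative, not a recommendation):

    SET work_mem = '1GB';              -- per sort/hash node; affordable with only one connection
    SET maintenance_work_mem = '2GB';  -- helps index builds on the derived tables
    -- shared_buffers has to be set in postgresql.conf (restart required);
    -- per the note above, keep it at or below ~512MB on Windows.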


cheers

andrew



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] query against large table not using sensible index to find very small amount of data

2014-04-08 Thread Andrew W. Gibbs
I have a fairly large table (~100M rows), let's call it events, and
among other things it has a couple of columns on it, columns that
we'll call entity_type_id (an integer) and and published_at (a
timestamp).  It has, among others, indices on (published_at) and
(entity_type_id, published_at).

A very common query against this table is of the form...

SELECT * FROM events WHERE entity_type_id = XXX ORDER BY published_at DESC 
LIMIT 25;

... to get the most recent 25 events from the table for a given type
of entity, and generally the query planner does the expected thing of
using the two-part index on (entity_type_id, published_at).  Every now
and again, though, I have found the query planner deciding that it
ought use the single column (published_at) index.  This can,
unsurprisingly, result in horrendous performance if events for a given
entity type are rare, as we end up with a very long walk of an index.

I had this happen again yesterday and I noticed something of
particular interest pertaining to the event.  Specifically, the query
was for an entity type that the system had only seen for the first
time one day prior, and furthermore the events table had not been
analyzed by the statistics collector for a couple of weeks.

My intuition is that the query planner, when working with an enormous
table, and furthermore encountering an entity type that the statistics
collector had never previously seen, would assume that the number of
rows in the events table of that entity type would be very small, and
therefore the two-part index on (entity_type_id, published_at) would
be the right choice.  Nonetheless, an EXPLAIN was showing usage of the
(published_at) index, and since there were only ~20 rows in the entire
events table for that entity type the queries were getting the worst
possible execution imaginable, i.e. reading in the whole table to find
the rows that hit, but doing it with the random I/O of an index walk.

As an experiment, I ran a VACUUM ANALYZE on the events table, and then
re-ran the EXPLAIN of the query, and...  Same query plan again...
Maybe for whatever issue I am having the random sampling nature of the
statistics collector made it unhelpful, i.e. in its sampling of the
~100M rows it never hit a single row that had the new entity type
specified?

Other possibly relevant pieces of information...  The entity type
column has a cardinality in the neighborhood of a couple dozen.
Meanwhile, for some of the entity types there is a large and ongoing
number of events, and for other entity types there is a smaller and
more sporadic number of events.  Every now and again a new entity type
shows up.

I can't understand why the query planner would make this choice.
Maybe it has gotten ideas into its head about the distribution of
data?  Or maybe there is a subtle bug that my data set is triggering?
Or maybe I need to turn some knobs on statistics collection?  Or maybe
it's all of these things together?  I worry that even if there is a
knob turning exercise that helps that we're still going to get burned
whenever a new entity type shows up until we re-run ANALYZE, assuming
that I can find a fix that involves tweaking statistics collection.  I
just can't fathom how it would ever be the case that Postgres's choice
of index usage in this case would make sense.  It doesn't even slot
cleanly into the problem space of "why did Postgres do a sequential
scan instead of an index scan?".  If you're doing a query of the
described form and the entity type is specified, wouldn't the two-part
index theoretically _always_ yield better performance than the
one-part index?  Maybe I have a flawed understanding of the cost of
using various indexes?  Maybe there is something analogous between
sequential-versus-index-scan and one-part-versus-two-part-index scan
choices?

FWIW, we're running on 8.4.X and using the out-of-the-box
default_statistics_target setting and haven't dabbled with setting
table level statistics configurations.
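
The table-level statistics knob mentioned above looks like this (the 
target value is only illustrative):

    ALTER TABLE events ALTER COLUMN entity_type_id SET STATISTICS 1000;
    ANALYZE events;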

Thoughts?  Recommended reading?

  -- AWG


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] query against large table not using sensible index to find very small amount of data

2014-04-08 Thread 'Andrew W. Gibbs'
Your understanding of the utility of multi-part indices does not jive
with my own.

While I agree that a partial index might be in order here, that ought
just be a performance optimization that lowers the footprint of the
index from an index size and index maintenance standpoint, not
something that governs when the index is used for an item whose entity
type rarely comes up in the table.  If a couple of the entity types
were to constitute 80% of the events, then using a partial index would
reduce the performance strain of maintaining the index by 80%, but
this ought not govern the query planner's behavior when doing queries
on entity types that were not among those.

My general understanding of the utility of multi-part indices is that
they will come into play when some number of the leading columns
appear in the query as fixed values and furthermore if a subsequent
column appears as part of a ranging operation.

I know that a b-tree structure isn't exactly the same as a
binary-tree, but it is roughly equivalent for the purposes of our
conversation...  I believe you can think of multi-part indices as
(roughly) equivalent either to nested binary trees, or as equivalent
to a binary tree whose keys are the concatenation of the various
columns.  In the former case, doing a range scan would be a matter of
hopping through the nested trees until you got to the terminal range
scan operation, and in the latter case doing a range scan would be a
matter of finding the first node in the tree that fell within the
values for your concatenation and then walking through the tree.  Yes,
that's not exactly what happens with a b-tree, but it's pretty
similar, the main differences being performance operations, I believe.

Given that, I don't understand how having a multi-part index with the
column over which I intend to range comes _earlier_ than the column(s)
that I intend to have be fixed would be helpful.  This is especially
true given that the timestamp columns are are the granularity of
_milliseconds_ and my data set sees a constant stream of inputs with
bursts up to ~100 events per second.  I think what you are describing
could only make sense if the date column were at a large granularity,
e.g hours or days.

Or maybe I have missed something...

  -- AWG

On Tue, Apr 08, 2014 at 01:39:41PM +, Shaun Thomas wrote:
 
  Other possibly relevant pieces of information...  The entity type
  column has a cardinality in the neighborhood of a couple dozen.
  Meanwhile, for some of the entity types there is a large and ongoing
  number of events, and for other entity types there is a smaller and
  more sporadic number of events.  Every now and again a new entity
  type shows up.
 
 With that as the case, I have two questions for you:
 
 1. Why do you have a low cardinality column as the first column in an index?
 2. Do you have any queries at all that only use the entity type as the only 
 where clause?
 
 I agree that the planner is probably wrong here, but these choices aren't 
 helping. The low cardinality of the first column causes very large buckets 
 that don't limit results very well at all. Combined with the order-by clause, 
 the planner really wants to walk the date index backwards to find results 
 instead. I would do a couple of things.
 
 First, remove the type/date index. Next, do a count of each type in the table 
 with something like this:
 
 SELECT type_id, count(1)
   FROM my_table
 GROUP BY 1
 
 Any type that is more than 20% of the table will probably never be useful in 
 an index. At this point, you have a choice. You can create a new index with 
 date and type *in that order* or create a new partial index on date and type 
 that also ignores the top matches. For instance, if you had a type that was 
 90% of the values, this would be my suggestion:
 
 CREATE INDEX idx_foo_table_date_event_type_part ON foo_table (event_date, 
 event_type)
  WHERE event_type != 14;
 
 Or whatever. If the IDs are basically evenly distributed, it won't really 
 matter.
 
 In any case, index order matters. The planner wants to restrict data as 
 quickly as possible. If you provide an order clause, it wants to read the 
 index in that order. Your specified type as the first column disrupts that, 
 so it has to fetch the values first, which is usually more expensive. Even if 
 that's wrong in your particular case, planner stats are not precise enough to 
 know that.
 
 Either way, try moving the indexes around. I can't think of many indexes in 
 our database where I have the low cardinality value as the first column. 
 Databases have an easier time managing many shallow buckets of values, than a 
 few deep ones.
 
 --
 Shaun Thomas
 OptionsHouse | 141 W. Jackson Blvd | Suite 400 | Chicago IL, 60604
 312-676-8870
 stho...@optionshouse.com
 
 __
 
 See http://www.peak6.com/email_disclaimer/ for terms and conditions related 
 to this email
 
 
 -- 
 Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)

Re: [PERFORM] Are there some additional postgres tuning to improve performance in multi tenant system

2013-12-28 Thread Andrew Dunstan


On 12/28/2013 12:19 AM, ankush upadhyay wrote:

Hello All,

I am using a multi-tenant system and doing performance testing of a 
multi-tenant application. With a single tenant it works fine, but once 
I enable multiple tenants the database servers sometimes stop responding. 
Any clue?







I usually use the term multi-tenancy to refer to different postgres 
instances running on the same machine, rather than different databases 
within a single instance of postgres. So please describe your setup in 
more detail.


cheers

andrew


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Are there some additional postgres tuning to improve performance in multi tenant system

2013-12-28 Thread Andrew Dunstan


On 12/28/2013 08:46 AM, ankush upadhyay wrote:
On Sat, Dec 28, 2013 at 6:50 PM, Andrew Dunstan and...@dunslane.net 
mailto:and...@dunslane.net wrote:



On 12/28/2013 12:19 AM, ankush upadhyay wrote:

Hello All,

I am using a multi-tenant system and doing performance testing
of a multi-tenant application. With a single tenant it is
working fine, but once I enable multiple tenants the
database servers sometimes stop responding. Any clue?





I usually use the term multi-tenancy to refer to different
postgres instances running on the same machine, rather than
different databases within a single instance of postgres. So please
describe your setup in more detail.

cheers

andrew



First of all, thanks Andrew for letting me know the email etiquette, and 
I'm extremely sorry for the confusion.


Here I meant to say that there are different postgres instances running 
on the same machine.


Actually I have one application machine and one database server 
machine with multiple postgres instances running on it, accessed 
by the application server.


I hope this time I could explain it in more details.





Why are you doing that, as opposed to running multiple databases in a 
single instance? Running more than a handful of instances in a single 
machine is almost always a recipe for poor performance. The vast 
majority of users in my experience run a single postgres instance per 
machine, possibly with a large number of databases.


cheers

andrew


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Current query of the PL/pgsql procedure.

2013-12-16 Thread Andrew Dunstan


On 12/16/2013 05:26 AM, hubert depesz lubaczewski wrote:

On Sun, Dec 15, 2013 at 04:18:18PM +, Yuri Levinsky wrote:

Dear ALL,
I am running a PL/pgSQL procedure with SQL statements that take a long
time. I am able to see them in the log just after their completion. How
can I see the currently running SQL statement?  In pg_stat_activity I can
only see my call to the function. Many thanks in advance.

pg_stat_activity and pg logs, can't see what your function does
internally.

What you can do, though, is to add some RAISE LOG to the function, so
that it will log its progress.

Check this for example:
http://www.depesz.com/2010/03/18/profiling-stored-proceduresfunctions/



Also, the auto-explain module can peer inside functions. See 
http://www.postgresql.org/docs/current/static/auto-explain.html
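
A minimal session-level sketch (the threshold is only illustrative); the 
same parameters can also go in postgresql.conf:

    LOAD 'auto_explain';
    SET auto_explain.log_min_duration = '5s';
    SET auto_explain.log_nested_statements = on;  -- log statements run inside functions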


cheers

andrew



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgresql in a Virtual Machine

2013-11-26 Thread Andrew Dunstan


On 11/26/2013 09:26 AM, Craig James wrote:


On 25.11.2013 22:01, Lee Nguyen wrote:


Why shouldn't we run Postgres in a VM?  What are the
downsides? Does anyone
have any metrics or benchmarks with the latest Postgres?


For those of us with small setups (a few to a dozen servers), we'd like to 
get out of server maintenance completely. Can anyone with experience 
on a cloud VM solution comment?  Do the VM solutions provided by the 
major hosting companies have the same good performance as the VM's 
that that several have described here?


Obviously there's Amazon's new Postgres solution available.  What else 
is out there in the way of instant-on solutions with 
Linux/Postgres/Apache preconfigured systems?  Has anyone used them in 
production?





If you want a full stack including Postgres, Heroku might be your best 
bet. Depends a bit on your application and your workload. And yes, I've 
used it. Full disclosure: I have done work paid for by Heroku.


cheers

andrew



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgresql in a Virtual Machine

2013-11-26 Thread Andrew Dunstan


On 11/26/2013 08:51 AM, Boszormenyi Zoltan wrote:

On 2013-11-25 21:19, Heikki Linnakangas wrote:

On 25.11.2013 22:01, Lee Nguyen wrote:

Hi,

Having attended a few PGCons, I've always heard the remark from a few
presenters and attendees that Postgres shouldn't be run inside a VM. 
That

bare metal is the only way to go.

Here at work we were entertaining the idea of running our Postgres 
database
on our VM farm alongside our application vm's.  We are planning to 
run a

few Postgres synchronous replication nodes.

Why shouldn't we run Postgres in a VM?  What are the downsides? Does 
anyone

have any metrics or benchmarks with the latest Postgres?


I've also heard people say that they've seen PostgreSQL perform 
worse in a VM. In the performance testing that we've done in VMware, 
though, we haven't seen any big impact. So I guess the answer is that 
it depends on the specific configuration of CPU, memory, disks and 
the software.


We at Cybertec tested some configurations about 2 months ago.
The performance drop is coming from the disk given to the VM guest.

When there is a dedicated disk (pass through) given to the VM guest,
PostgreSQL runs at a speed of around 98% of the bare metal.

When the virtual disk is a disk file on the host machine, we've measured
20% or lower. The host used Fedora 19/x86_64 with IIRC a 3.10.x Linux 
kernel
with EXT4 filesystem (this latter is sure, not IIRC). The effect was 
observed

both under Qemu/KVM and Xen.

The virtual disk was not pre-allocated, since it was the default setting,
i.e. space savings preferred over speed. The figure might be better with
a pre-allocated disk but the filesystem journalling done twice (both 
in the

host and the guest) will have an effect.



Not-pre-allocated disk-file backed is just about the worst case in my 
experience.


Try pre-allocated VirtIO disks on an LVM volume group - you should get 
much better performance.


cheers

andrew



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Postgresql in a Virtual Machine

2013-11-25 Thread Andrew Dunstan


On 11/25/2013 03:19 PM, Heikki Linnakangas wrote:

On 25.11.2013 22:01, Lee Nguyen wrote:

Hi,

Having attended a few PGCons, I've always heard the remark from a few
presenters and attendees that Postgres shouldn't be run inside a VM. 
That

bare metal is the only way to go.

Here at work we were entertaining the idea of running our Postgres 
database

on our VM farm alongside our application vm's.  We are planning to run a
few Postgres synchronous replication nodes.

Why shouldn't we run Postgres in a VM?  What are the downsides? Does 
anyone

have any metrics or benchmarks with the latest Postgres?


I've also heard people say that they've seen PostgreSQL perform 
worse in a VM. In the performance testing that we've done in VMware, 
though, we haven't seen any big impact. So I guess the answer is that 
it depends on the specific configuration of CPU, memory, disks and the 
software. Synchronous replication is likely going to be the biggest 
bottleneck by far, unless it's mostly read-only. I don't know if 
virtualization will have a measurable impact on network latency, which 
is what matters for synchronous replication.


So, I'd suggest that you try it yourself, and see how it performs. And 
please report back to the list, I'd also love to see some numbers!






Yeah, and there are large numbers of public and/or private cloud-based 
offerings out there (from Amazon RDS, Heroku, EnterpriseDB and VMware 
among others.) Pretty much all of these are VM based, and can be 
suitable for many workloads.


Maybe the advice is a bit out of date.

cheers

andrew



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] slow sort

2013-09-11 Thread Andrew Dunstan
 by a query generator. But a good generator should not 
just say group by everything that's not aggregated and think it's 
doing a good job. In your case it should be relatively straightforward.


cheers

andrew


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance bug in prepared statement binding in 9.2?

2013-09-11 Thread Andrew Dunstan


On 09/11/2013 02:35 PM, Josh Berkus wrote:

All,

We've confirmed that this issue is caused by having long-running idle
transactions on the server.  When we disabled their queueing system
(which produced hour-long idle txns), the progressive slowness went away.

Why that should affect 9.X far more strongly than 8.4, I'm not sure
about.  Does that mean that 8.4 was unsafe, or that this is something
which *could* be fixed in later versions?

I'm also confused as to why this would affect BIND time rather than
EXECUTE time.




One thing that this made me wonder is why we don't have 
transaction_timeout, or maybe transaction_idle_timeout.
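
(Later releases did grow something along these lines; from 9.6 onwards 
there is, for example:

    SET idle_in_transaction_session_timeout = '10min';  -- value is only illustrative

which aborts sessions that sit idle inside a transaction for too long.)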


cheers

andrew


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance bug in prepared statement binding in 9.2?

2013-09-10 Thread Andrew Dunstan


On 09/10/2013 08:20 AM, Andres Freund wrote:


A backtrace for this would be useful. Alternatively you could recompile
postgres using -fno-omit-frame-pointer in CFLAGS and use perf record -g.


It's using a custom build, so this should be doable.



Any chance you have older prepared xacts, older sessions or something
like that around? I'd expect heap_prune* to be present in workloads that
spend significant time in heap_hot_search_buffer...



Not sure about prepared transactions. There are certainly probably old 
prepared statements around, and long running transactions alongside this 
one.


cheers

andrew


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance bug in prepared statement binding in 9.2?

2013-09-10 Thread Andrew Dunstan


On 09/10/2013 09:23 AM, Andres Freund wrote:

On 2013-09-10 15:21:33 +0200, Andres Freund wrote:

If I interpret things correctly you're using serializable? I guess there
is no chance to use repeatable read instead?

Err, that wouldn't help much. Read committed. That lets PGXACT-xmin advance
these days and thus might help to reduce the impact of the longrunning
transactions.
Otherwise you will have to shorten those...




Yeah, we're looking at eliminating them.

cheers

andrew



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Intermittent hangs with 9.2

2013-09-10 Thread Andrew Dunstan


On 09/10/2013 11:04 AM, David Whittaker wrote:


Hi All,

I've been seeing a strange issue with our Postgres install for about a 
year now, and I was hoping someone might be able to help point me at 
the cause. At what seem like fairly random intervals Postgres will 
become unresponsive to the 3 application nodes it services. These 
periods tend to last for 10 - 15 minutes before everything rights 
itself and the system goes back to normal.


During these periods the server will report a spike in the outbound 
bandwidth (from about 1mbs to about 5mbs most recently), a huge spike 
in context switches / interrupts (normal peaks are around 2k/8k 
respectively, and during these periods they've gone to 15k/22k), and a 
load average of 100+. CPU usage stays relatively low, but it's all 
system time reported, user time goes to zero. It doesn't seem to be 
disk related since we're running with a shared_buffers setting of 24G, 
which will fit just about our entire database into memory, and the IO 
transactions reported by the server, as well as the disk reads 
reported by Postgres stay consistently low.


We've recently started tracking how long statements take to execute, 
and we're seeing some really odd numbers. A simple delete by primary 
key, for example, from a table that contains about 280,000 rows, 
reportedly took 18h59m46.900s. An update by primary key in that same 
table was reported as 7d 17h 58m 30.415s. That table is frequently 
accessed, but obviously those numbers don't seem reasonable at all.


Some other changes we've made to postgresql.conf:

synchronous_commit = off

maintenance_work_mem = 1GB
wal_level = hot_standby
wal_buffers = 16MB

max_wal_senders = 10

wal_keep_segments = 5000

checkpoint_segments = 128

checkpoint_timeout = 30min

checkpoint_completion_target = 0.9

max_connections = 500

The server is a Dell Poweredge R900 with 4 Xeon E7430 processors, 48GB 
of RAM, running Cent OS 6.3.


So far we've tried disabling Transparent Huge Pages after I found a 
number of resources online that indicated similar interrupt/context 
switch issues, but it hasn't resolved the problem. I managed to catch 
it happening once and ran a perf which showed:


|
+  41.40%   48154  postmaster  0x347ba9 f 0x347ba9
+   9.55%   10956  postmaster  0x2dc820 f set_config_option
+   8.64%9946  postmaster  0x5a3d4  f writeListPage
+   5.75%6609  postmaster  0x5a2b0  f ginHeapTupleFastCollect
+   2.68%3084  postmaster  0x192483 f 
build_implied_join_equality
+   2.61%2990  postmaster  0x187a55 f build_paths_for_OR
+   1.86%2131  postmaster  0x794aa  f get_collation_oid
+   1.56%1822  postmaster  0x5a67e  f ginHeapTupleFastInsert
+   1.53%1766  postmaster  0x1929bc f distribute_qual_to_rels
+   1.33%1558  postmaster  0x249671 f cmp_numerics|

I'm not sure what 0x347ba9 represents, or why it's an address rather 
than a method name.


That's about the sum of it. Any help would be greatly appreciated and 
if you want any more information about our setup, please feel free to ask.





I have seen cases like this with very high shared_buffers settings.

24Gb for shared_buffers is quite high, especially on a 48Gb box. What 
happens if you dial that back to, say, 12Gb?


cheers

andrew



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Performance bug in prepared statement binding in 9.2?

2013-09-09 Thread Andrew Dunstan
   [.] 0x5898
   +   0.30%  postgres   [.] LockAcquireExtended
   +   0.30%  postgres   [.] _bt_first
   +   0.29%  postgres   [.] add_paths_to_joinrel
   +   0.28%  postgres   [.] MemoryContextCreate
   +   0.28%  postgres   [.] appendBinaryStringInfo
   +   0.28%  postgres   [.] MemoryContextStrdup
   +   0.27%  postgres   [.] heap_hot_search_buffer
   +   0.27%  postgres   [.] GetSnapshotData
   +   0.26%  postgres   [.] hash_search
   +   0.26%  postgres   [.] heap_getsysattr
   +   0.26%  [vdso] [.] 0x7fff681ff70c
   +   0.25%  postgres   [.] compare_scalars
   +   0.25%  postgres   [.] pg_verify_mbstr_len



cheers

andrew


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Varchar vs foreign key vs enumerator - table and index size

2013-09-02 Thread Andrew Dunstan


On 09/02/2013 05:53 AM, Łukasz Walkowski wrote:

On 1 wrz 2013, at 05:10, Vitalii Tymchyshyn tiv...@gmail.com wrote:


Well, there are some more options:
a) Store int keys and do mapping in the application (e.g. with java enums). 
This can save you a join, which is especially useful if you are going to do 
paged output with a limit/offset scenario. The optimizer sometimes produces 
suboptimal plans for joins in offset/limit queries.
b) Store small varchar values as keys (up to the "char" type if you really want to 
save space) and do user display mapping in application. It's different from (a) since 
it's harder to mess with the mapping and values are still more or less readable with 
simple select. But it can be less efficient than (a).
c) Do mixed approach with mapping table, loaded on start into application 
memory. This would be an optimization in case you get into optimizer troubles.

Best regards, Vitalii Tymchyshyn

I'd like to leave the database in readable form because, before I add some new 
queries and REST endpoints to the application, I test them as ad-hoc queries 
on the command line. So variant a) isn't good for me. Variant b) is worth trying 
and c) is easy to code, but I still prefer having all this data in the database, 
independent of application logic.




I think the possible use of Postgres enums has been too easily written 
off in this thread. Looking at the original problem description they 
look like quite a good fit, despite the OP's skepticism. What exactly is 
wanted that can't be done with database enums? You can add new values to 
the type very simply.  You can change the values of existing labels in 
the type slightly less simply, but still without any great difficulty. 
Things that are hard to do include removing labels in the set and 
changing the sort order, because those things would require processing 
tables where the type is used, unlike the simple things. But neither of 
these is required for typical use cases. For most uses of this kind they 
are very efficient both in storage and processing.
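
For example, adding a label is a one-liner (the type and label names here 
are made up):

    CREATE TYPE order_status AS ENUM ('new', 'paid', 'shipped');
    ALTER TYPE order_status ADD VALUE 'cancelled';                -- append at the end
    ALTER TYPE order_status ADD VALUE 'packed' BEFORE 'shipped';  -- or insert at a given position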


cheers

andrew


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Hstore VS. JSON

2013-07-16 Thread Andrew Dunstan


On 07/16/2013 11:05 AM, Niels Kristian Schjødt wrote:

Hi,

I'm in the process of implementing a table for storing some raw data in the 
format of a hash containing one level of keys and values. The hash can be quite 
big (up to 50 keys pointing at values varying from one to several hundred 
characters)

Now, I'm in doubt whether to use JSON or Hstore for this task. Here are the 
facts:

- I'm not going to search a lot (if any) in the data stored in the column, i'm 
only going to load it out.
- The data is going to be heavily updated (not only inserted). Keys and values 
are going to be added/overwritten quite some times.



In both cases, each hstore/json is a single datum, and updating it means 
writing out the whole datum - in fact the whole row containing the datum.



- My database's biggest current issue is updates, so i don't want that to be a 
bottle neck.
- I'm on postgresql 9.2

So, question is: Which will be better performance wise, especially for updates? 
Does the same issues with updates on the MVCC structure apply to updates in 
Hstore? What is taking up most space on the HDD?



MVCC applies to all updates on all kinds of data. Hstore and JSON are 
not different in  this respect.


You should test the storage effects with your data. On 9.2 for your data 
hstore might be a better bet, since in 9.2 hstore has more operators 
available natively.
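
For example, with hstore the per-key update syntax is straightforward, 
but MVCC still writes out a complete new row version each time (table 
and key names below are made up):

    CREATE EXTENSION IF NOT EXISTS hstore;
    CREATE TABLE raw_data (id bigserial PRIMARY KEY, attrs hstore);
    INSERT INTO raw_data (attrs) VALUES ('colour => "red", size => "XL"');
    -- add or overwrite keys; the whole row is rewritten regardless
    UPDATE raw_data SET attrs = attrs || 'size => "XXL", price => "10"'::hstore WHERE id = 1;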


cheers

andrew


--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Process in state BIND, authentication, PARSE

2013-07-08 Thread Andrew Dunstan


On 07/08/2013 12:22 PM, Jeison Bedoya wrote:
Hi, I want to know why in my database the processes stay in BIND, PARSE, 
authentication, startup for a couple of minutes, slowing everything 
down. Perhaps tuning parameters, or the configuration of the operating 
system (Linux RHEL 6)?






You haven't given us nearly enough information about your setup. We'd 
need to see your configuration settings and have some details of the 
machine and where connections are coming from to diagnose it further.


cheers

andrew



--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Dynamic queries in stored procedure

2013-07-05 Thread Andrew Dunstan


On 07/05/2013 09:57 AM, Greg Jaskiewicz wrote:

Hey,

We have a search method that, depending on search params, will join 3-5 tables, 
crafting the joins and the where section. The only problem is that this is done 
in rather horrible Java code, so using pgTAP for tests is not feasible.
I want to move the database complexity back into the database, essentially 
writing the query construction in plpgsql or python as stored procedures. Any 
suggestions?

Unfortunately PostgreSQL won't eliminate unnecessary joins from a view, so I 
can't just create one view and simply add where clauses, order by, etc. in code.

No, I don't want to use orm.



It's a matter of taste. Pretty much every PL has facilities for 
constructing and running dynamic SQL: PLpgSQL, PLPerl, PLV8 ...
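
A minimal plpgsql sketch of the idea (the table and parameters are 
invented purely for illustration):

    CREATE TABLE items (id serial PRIMARY KEY, name text, price numeric);

    CREATE OR REPLACE FUNCTION search_items(p_name text DEFAULT NULL,
                                            p_min_price numeric DEFAULT NULL)
    RETURNS SETOF items AS $$
    DECLARE
        q text := 'SELECT * FROM items WHERE true';
    BEGIN
        IF p_name IS NOT NULL THEN
            q := q || format(' AND name ILIKE %L', '%' || p_name || '%');
        END IF;
        IF p_min_price IS NOT NULL THEN
            q := q || format(' AND price >= %L', p_min_price);
        END IF;
        RETURN QUERY EXECUTE q;   -- run the dynamically built statement
    END;
    $$ LANGUAGE plpgsql;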


cheers

andrew




--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Partitions not Working as Expected

2013-06-27 Thread Andrew Dunstan


On 06/27/2013 03:14 PM, Shaun Thomas wrote:

On 06/27/2013 01:42 PM, Tom Lane wrote:


That will break things: CURRENT_DATE will then be equivalent to just
writing today's date as a literal.


Interesting. I tested it by creating a view and a table with a 
default, and it always seems to get translated to:


('now'::text)::date

But I'll take your explanation at face value, since that doesn't imply 
what the output would be. What's interesting is that EnterpriseDB has 
their own pg_catalog.current_date function that gets called by the 
CURRENT_DATE keyword. So unlike in vanilla PG, I could mark just the 
current_date function as immutable without affecting a lot of other 
internals.


On EDB, this actually works:

UPDATE pg_proc
   SET provolatile = 'i'
 WHERE proname = 'current_date';



But that's a lie, surely. If it breaks you have nobody to blame but 
yourself. There's a reason EDB haven't marked their function immutable - 
it's not.


cheers

andrew





Re: [PERFORM] PostgreSQL settings for running on an SSD drive

2013-06-20 Thread Andrew Dunstan


On 06/20/2013 05:23 PM, Shaun Thomas wrote:

On 06/20/2013 03:32 PM, Josh Berkus wrote:


Did you compare setting RPC to 1.0 vs. setting it to 1.1, or something
else just slightly higher than SPC?


Yes, actually. My favored setting when we were on 8.3 was 1.5. But 
something with the planner changed pretty drastically when we went to 
9.1, and we were getting some really bad query plans unless we 
*strongly* suggested RPC was cheap. I was afraid I'd have to go lower, 
but 1 seemed to do the trick.




That would be perverse, surely, but on Fusion-IO RPC = SPC seems to make 
sense unless you assume that cache misses will be higher for random 
reads than for sequential reads.


cheers

andrew





Re: [PERFORM] Thinking About Correlated Columns (again)

2013-05-15 Thread Andrew Dunstan


On 05/15/2013 12:23 PM, Craig James wrote:
On Wed, May 15, 2013 at 8:31 AM, Shaun Thomas 
stho...@optionshouse.com mailto:stho...@optionshouse.com wrote:


[Inefficient plans for correlated columns] has been a pain point
for quite a while. While we've had several discussions in the
area, it always seems to just kinda trail off and eventually
vanish every time it comes up.

...
Since there really is no fix for this aside from completely
rewriting the query or horribly misusing CTEs (to force sequence
scans instead of bloated nested loops, for example)...
I'm worried that without an easy answer for cases like this, more
DBAs will abuse optimization fences to get what they want and
we'll end up in a scenario that's actually worse than query hints.
Theoretically, query hints can be deprecated or have GUCs to
remove their influence, but CTEs are forever, or until the next
code refactor.

I've seen conversations on this since at least 2005. There were
even proposed patches every once in a while, but never any
consensus. Anyone care to comment?


It's a very hard problem.  There's no way you can keep statistics 
about all possible correlations since the number of possibilities is 
O(N^2) with the number of columns.




I don't see why we couldn't allow the DBA to specify some small subset 
of the combinations of columns for which correlation stats would be 
needed. I suspect in most cases no more than a handful for any given 
table would be required.
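
Concretely, the sort of thing a DBA might declare (hypothetical table and columns; 
this is essentially what much later releases eventually exposed as extended 
statistics) would be:

CREATE STATISTICS zip_city_stats (dependencies) ON zip, city FROM addresses;
ANALYZE addresses;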


cheers

andrew




Re: [PERFORM] template1 vacuuming consuming much time compared to another production DBs

2013-04-22 Thread Andrew Dunstan


On 04/22/2013 07:31 AM, pradeep singh wrote:

Hi,


I am using postgresql 8.1 DB.


Why are you using a release of Postgres that is way out of date and 
unsupported?


cheers

andrew







Re: [PERFORM] Speed of EXCECUTE in PL/PGSQL

2013-03-14 Thread Andrew Dunstan


On 03/14/2013 03:22 PM, Artur Zając wrote:


Why is the speed of executing (or planning) some very simple query from a 
string in pl/pgsql dependent on the whole query, or why is “FOR r IN 
EXECUTE q” significantly slower than “FOR r IN query”?





The whole point of EXECUTE is that it's reparsed and planned each time. 
You should expect it to be quite a bit slower, and avoid using EXECUTE 
wherever possible.
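
A small self-contained way to see the two forms side by side (DO blocks need 9.0+):

DO $$
DECLARE
    r record;
    q text := 'SELECT generate_series(1, 3) AS id';
BEGIN
    -- Static form: parsed and planned once, plan cached for the session.
    FOR r IN SELECT generate_series(1, 3) AS id LOOP
        RAISE NOTICE 'static: %', r.id;
    END LOOP;
    -- EXECUTE form: the string is re-parsed and re-planned on every execution.
    FOR r IN EXECUTE q LOOP
        RAISE NOTICE 'dynamic: %', r.id;
    END LOOP;
END;
$$;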


cheers

andrew





Re: [PERFORM] High CPU usage after partitioning

2013-01-22 Thread Andrew Dunstan


On 01/22/2013 09:21 AM, rudi wrote:
On Tue, Jan 22, 2013 at 3:04 PM, Merlin Moncure mmonc...@gmail.com 
mailto:mmonc...@gmail.com wrote:


let's see the query -- it's probably written in such a way so as to
not be able to be optimized through CE.


The query is pretty simple and standard, the behaviour (and the plan) 
is totally different when it comes to a partitioned table.


Partioned table query = explain analyze SELECT  sb_logs.* FROM 
sb_logs  WHERE (device_id = 901 AND date_taken = (SELECT 
MAX(date_taken) FROM sb_logs WHERE device_id = 901));




And there you have it. Constraint exclusion does not work in cases like 
this. It only works with static expressions (such as a literal date in 
this case).
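
To illustrate (the date literal is invented): a query written like

SELECT sb_logs.* FROM sb_logs
WHERE device_id = 901 AND date_taken = DATE '2013-01-20';

can be pruned to a single partition at plan time, while the MAX() subselect above is 
only known at run time, so every partition gets scanned. A common workaround is to 
fetch the MAX() value first (in the application, or via dynamic SQL inside a 
function) and then issue the second query with it as a literal or parameter.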


cheers

andrew




Re: [PERFORM] High CPU usage after partitioning

2013-01-21 Thread Andrew Dunstan


On 01/21/2013 10:05 AM, rudi wrote:

Hello,

I'm running postgresl 9.0. After partitioning a big table, CPU usage 
raised from average 5-10% to average 70-80%.


- the table contains about 20.000.000 rows
- partitions are selected using a trigger, based on an indexed field, 
a date (IF date_taken >= x AND date_taken < y)
- I created 5 partitions, the 2012 one now contains most of the rows. 
The 2013 partition is the live partition, mostly insert, a few 
select based on the above indexed field. The 2013, 2014, 2015 
partitions are empty

- constraint execution is on.
I have 2 weeks CPU usage reports and the pattern definately changed 
after I made the partitions. Any idea?





Well, the first question that comes to my mind is whether it's the 
inserts that are causing the load or the reads. If it's the inserts then 
you should show us the whole trigger. Does it by any chance use 'execute'?
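
For comparison, a routing trigger that avoids EXECUTE entirely usually costs much 
less per row; a sketch (partition and column names are guesses based on the 
description above):

CREATE OR REPLACE FUNCTION sb_logs_insert_trigger() RETURNS trigger AS $$
BEGIN
    IF NEW.date_taken >= DATE '2013-01-01' AND NEW.date_taken < DATE '2014-01-01' THEN
        INSERT INTO sb_logs_2013 VALUES (NEW.*);
    ELSIF NEW.date_taken >= DATE '2012-01-01' AND NEW.date_taken < DATE '2013-01-01' THEN
        INSERT INTO sb_logs_2012 VALUES (NEW.*);
    ELSE
        RAISE EXCEPTION 'date_taken out of range: %', NEW.date_taken;
    END IF;
    RETURN NULL;  -- the row has been routed; don't also insert it into the parent
END;
$$ LANGUAGE plpgsql;
-- attached with: CREATE TRIGGER ... BEFORE INSERT ON sb_logs
--                FOR EACH ROW EXECUTE PROCEDURE sb_logs_insert_trigger();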


cheers

andrew





Re: [PERFORM] backend suddenly becomes slow, then remains slow

2012-12-27 Thread Andrew Dunstan

On 12/26/2012 11:03 PM, Jeff Janes wrote:
On Fri, Dec 14, 2012 at 10:40 AM, Andrew Dunstan 
andrew.duns...@pgexperts.com wrote:
 One of my clients has an odd problem. Every so often a backend will 
suddenly
 become very slow. The odd thing is that once this has happened it 
remains
 slowed down, for all subsequent queries. Zone reclaim is off. There 
is no IO
 or CPU spike, no checkpoint issues or stats timeouts, no other 
symptom that

 we can see.

By no spike, do you mean that the system as a whole is not using an 
unusual amount of IO or CPU, or that this specific slow back-end is 
not using an unusual amount?



both, really.



Could you strace is and see what it is doing?



Not very easily, because it's a pool connection and we've lowered the 
pool session lifetime as part of the amelioration :-) So it's not 
happening very much any more.




 The problem was a lot worse that it is now, but two steps have
 alleviated it mostly, but not completely: much less aggressive 
autovacuuming
 and reducing the maximum lifetime of backends in the connection 
pooler to 30

 minutes.

Do you have a huge number of tables?  Maybe over the course of a 
long-lived connection, it touches enough tables to bloat the relcache 
/ syscache.  I don't know how the autovac would be involved in that, 
though.





Yes, we do indeed have a huge number of tables. This seems a plausible 
thesis.


cheers

andrew






[PERFORM] backend suddenly becomes slow, then remains slow

2012-12-14 Thread Andrew Dunstan
One of my clients has an odd problem. Every so often a backend will 
suddenly become very slow. The odd thing is that once this has happened 
it remains slowed down, for all subsequent queries. Zone reclaim is off. 
There is no IO or CPU spike, no checkpoint issues or stats timeouts, no 
other symptom that we can see. The problem was a lot worse that it is 
now, but two steps have alleviated it mostly, but not completely: much 
less aggressive autovacuuming and reducing the maximum lifetime of 
backends in the connection pooler to 30 minutes.


It's got us rather puzzled. Has anyone seen anything like this?

cheers

andrew




Re: [PERFORM] backend suddenly becomes slow, then remains slow

2012-12-14 Thread Andrew Dunstan

On 12/14/2012 02:56 PM, Tom Lane wrote:

Andrew Dunstan andrew.duns...@pgexperts.com writes:

One of my clients has an odd problem. Every so often a backend will
suddenly become very slow. The odd thing is that once this has happened
it remains slowed down, for all subsequent queries. Zone reclaim is off.
There is no IO or CPU spike, no checkpoint issues or stats timeouts, no
other symptom that we can see. The problem was a lot worse that it is
now, but two steps have alleviated it mostly, but not completely: much
less aggressive autovacuuming and reducing the maximum lifetime of
backends in the connection pooler to 30 minutes.
It's got us rather puzzled. Has anyone seen anything like this?

Maybe the kernel is auto-nice'ing the process once it's accumulated X
amount of CPU time?





That was my initial thought, but the client said not. We'll check again.

cheers

andrew





Re: [PERFORM] Why does the number of rows are different in actual and estimated.

2012-12-13 Thread Andrew Dunstan


On 12/13/2012 05:12 PM, AI Rumman wrote:

Why is the number of rows different between the actual and estimated values?




Isn't that in the nature of estimates? An estimate is a heuristic guess 
at the number of rows it will find for the given query or part of a 
query. It's not uncommon for estimates to be out by several orders of 
magnitude. Guaranteeing estimates within bounded accuracy and in a given 
short amount of time (you don't want your planning time to overwhelm 
your execution time) isn't possible.


cheers

andrew





[PERFORM] track_activity_query_size

2012-12-12 Thread Andrew Dunstan
Is there a performance downside to setting track_activity_query_size to 
a significantly larger value than the default 1024 (say 10240), given 
that there's plenty of memory to spare?


cheers

andrew




Re: [PERFORM] Read rows deleted

2012-12-12 Thread Andrew Dunstan


On 12/12/2012 03:24 PM, Alejandro Carrillo wrote:

Hi,

Anybody knows how to create a table using a table file? It isn't a 
fdw, is a file that compose the table in postgresql and get with the 
pg_relation_filepath function. Ex:


select pg_relation_filepath('pg_proc');

Anybody knows a JDBC or a multiplatform code that let read the delete 
rows of a table without writing of a table file?






This isn't a performance related question. Please ask on the correct 
list (probably pgsql-general).


cheers

andrew




[PERFORM] encouraging index-only scans

2012-12-12 Thread Andrew Dunstan


A client is testing a migration from 9.1 to 9.2, and has found that a 
large number of queries run much faster if they use index-only scans. 
However, the only way he has found to get such a plan is by increasing 
the seq_page_cost to insanely high levels (3.5). Is there any approved 
way to encourage such scans that's a but less violent than this?


cheers

andrew




Re: [PERFORM] encouraging index-only scans

2012-12-12 Thread Andrew Dunstan


On 12/12/2012 04:32 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

A client is testing a migration from 9.1 to 9.2, and has found that a
large number of queries run much faster if they use index-only scans.
However, the only way he has found to get such a plan is by increasing
the seq_page_cost to insanely high levels (3.5). Is there any approved
way to encourage such scans that's a but less violent than this?

Is the pg_class.relallvisible estimate for the table realistic?  They
might need a few more VACUUM and ANALYZE cycles to get it into the
neighborhood of reality, if not.


That was the problem - I didn't know this hadn't been done.
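
For anyone else chasing the same thing, the check is roughly (table name is a 
placeholder):

SELECT relname, relpages, relallvisible
FROM pg_class
WHERE relname = 'your_table';
-- relallvisible close to relpages is what makes index-only scans attractive;
-- VACUUM (followed by ANALYZE) is what brings it up to date.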



Keep in mind also that small values of random_page_cost necessarily
decrease the apparent advantage of index-only scans.  If you think 3.5
is an insanely high setting, I wonder whether you haven't driven those
numbers too far in the other direction to compensate for something else.


Right.

Thanks for the help.

cheers

andrew






Re: [PERFORM] encouraging index-only scans

2012-12-12 Thread Andrew Dunstan


On 12/12/2012 05:12 PM, Andrew Dunstan wrote:


On 12/12/2012 04:32 PM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

A client is testing a migration from 9.1 to 9.2, and has found that a
large number of queries run much faster if they use index-only scans.
However, the only way he has found to get such a plan is by increasing
the seq_page_cost to insanely high levels (3.5). Is there any approved
way to encourage such scans that's a but less violent than this?

Is the pg_class.relallvisible estimate for the table realistic? They
might need a few more VACUUM and ANALYZE cycles to get it into the
neighborhood of reality, if not.


That was the problem - I didn't know this hadn't been done.



Actually, the table had been analysed but not vacuumed, so this kinda 
begs the question what will happen to this value on pg_upgrade? Will 
people's queries suddenly get slower until autovacuum kicks in on the table?


cheers

andrew




Re: xfs perform a lot better than ext4 [WAS: Re: [PERFORM] Two identical systems, radically different performance]

2012-12-05 Thread Andrew Dunstan


On 12/05/2012 11:51 AM, Jean-David Beyer wrote:



I thought that postgreSQL did its own journalling, if that is the 
proper term, so why not use an ext2 file system to lower overhead?



Postgres journalling will not save you from a corrupt file system.

cheers

andrew





Re: [PERFORM] Postgres configuration for 8 CPUs, 6 GB RAM

2012-11-27 Thread Andrew Dunstan


On 11/27/2012 02:47 AM, Syed Asif Tanveer wrote:


Hi,

I am using PostgreSQL 9.1.5 for data warehousing and OLAP purposes. 
Data size is around 100 GB and I have tuned my PostgreSQL accordingly, 
but I am still facing performance issues. The query performance is too low 
despite tables being properly indexed and vacuumed and analyzed on a 
regular basis. CPU usage never exceeded 15% even at peak usage times. 
Kindly guide me through if there are any mistakes in setting 
configuration parameters. Below are my system specs and please find 
attached my postgresql configuration parameters for current system.






There is at least anecdotal evidence that Windows servers degrade when 
shared_buffers is set above 512Mb. Personally, I would not recommend 
using Windows for a high performance server.


Also, it makes no sense to have a lower setting for maintenance_work_mem 
than for work_mem. You would normally expect maintenance_work_mem to be 
higher - sometimes much higher.
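
As a purely illustrative starting point (the right values depend entirely on the 
workload and on how Windows behaves with large shared memory):

shared_buffers = 512MB          # anecdotally, going much higher can hurt on Windows
work_mem = 32MB
maintenance_work_mem = 512MB    # normally >= work_mem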


Apart from that, it's going to be impossible to tell what your problem 
is without seeing actual slow running queries and their corresponding 
explain analyse output.


cheers

andrew




Re: [PERFORM] Poor performance using CTE

2012-11-21 Thread Andrew Dunstan


On 11/21/2012 08:04 AM, Heikki Linnakangas wrote:

On 21.11.2012 01:53, Tom Lane wrote:

I think the more interesting question is what cases wouldn't be covered
by such a rule.  Typically you need to use OFFSET 0 in situations where
the planner has guessed wrong about costs or rowcounts, and I think
people are likely using WITH for that as well.  Should we be telling
people that they ought to insert OFFSET 0 in WITH queries if they want
to be sure there's an optimization fence?


Yes, I strongly feel that we should. Writing a query using WITH often 
makes it more readable. It would be a shame if people have to refrain 
from using it, because the planner treats it as an optimization fence.







If we're going to do it can we please come up with something more 
intuitive and much, much more documented than OFFSET 0? And if/when we 
do this we'll need to have big red warnings all over then release notes, 
since a lot of people I know will need to do some extensive remediation 
before moving to such a release.
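
For readers who haven't met the idiom being discussed: putting OFFSET 0 in a 
subquery is the traditional way to stop the planner from flattening it into the 
outer query, e.g. (tables invented for illustration):

SELECT *
FROM (SELECT * FROM big_table WHERE expensive_condition OFFSET 0) AS fenced
JOIN other_table USING (id);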


cheers

andrew





Re: [PERFORM] Poor performance using CTE

2012-11-21 Thread Andrew Dunstan


On 11/21/2012 09:59 AM, Tom Lane wrote:

Andrew Dunstan and...@dunslane.net writes:

If we're going to do it can we please come up with something more
intuitive and much, much more documented than OFFSET 0? And if/when we
do this we'll need to have big red warnings all over then release notes,
since a lot of people I know will need to do some extensive remediation
before moving to such a release.

The probability that we would actually *remove* that behavior of OFFSET
0 is not distinguishable from zero.  I'm not terribly excited about
having an alternate syntax to specify an optimization fence, but even
if we do create such a thing, there's no need to break the historical
usage.



I wasn't talking about removing it. My point was that if the 
optimization fence around CTEs is removed a lot of people will need to 
rework apps where they have used them for that purpose. And I continue 
to think that spelling it OFFSET 0 is horribly obscure.


cheers

andrew




Re: [PERFORM] Poor performance using CTE

2012-11-21 Thread Andrew Dunstan


On 11/21/2012 11:32 AM, Claudio Freire wrote:

On Wed, Nov 21, 2012 at 1:24 PM, Andres Freund and...@2ndquadrant.com wrote:

On 2012-11-21 13:16:25 -0300, Claudio Freire wrote:

On Wed, Nov 21, 2012 at 12:32 PM, Andres Freund and...@anarazel.de wrote:

+1

WITH foo AS (SELECT ...) (barrier=on|off)?

9.3 introduces the syntax, defaulting to on
9.4 switches the default to off.

Why syntax? What about a guc?

collapse_cte_limit?

Because there are very good reasons to want the current behaviour. A guc
is a global either/or so I don't see it helping much.

set collapse_cte_limit=8;
with blah as (blah) select blah;

Not global at all.



Then you have to unset it again, which is ugly. You might even want it 
applying to *part* of a query, not the whole thing, so this strikes me 
as a dead end.


cheers

andrew




Re: [PERFORM] Poor performance using CTE

2012-11-21 Thread Andrew Dunstan


On 11/21/2012 02:30 PM, Gavin Flower wrote:

WITH FENCE foo AS (SELECT ...)
default?


WITHOUT FENCE foo AS (SELECT ...) :-)
Nah!
I prefer this, but it is too specific to 'WITH',
and very unSQL standardish!

Alternatively one of the following

 1. WITH UNFENCED foo AS (SELECT ...)
 2. WITH NO FENCE foo AS (SELECT ...)
 3. WITH NOT FENCE foo AS (SELECT ...)

I like the first variant, but the 3rd is
the most SQL standardish!



As Tom (I think) pointed out, we should not have a syntax tied to CTEs.

cheers

andrew





[PERFORM] partitioning versus clustering

2012-11-19 Thread Andrew W. Gibbs
Postgres Performance Wizards,

I am a few years into developing and operating a system underpinned by
Postgres that sees the arrival a significant number of events around
the clock, not an epic amount of data, but enough to be challenging to
me, in particular when system downtime is not very palatable and the
data is retained quasi-indefinitely.

I have various tables that house different kinds of events, and in
addition to wanting to look at a small number of rows of data, users
often want to generate summary reports on large swaths of data that
span days or weeks.  At present, these reports can trigger index scans
that take minutes to service, and the parameters of the reports are
user specified, making their pre-generation infeasible.  Generally the
rows in these tables are write-once, but they contain a pointer to the
related BLOB from which they were constructed, and every now and again
some new field in the originating BLOB becomes of interest, causing me
to alter the table and then do a sweep of the full table with
corresponding updates, violating the otherwise INSERT-only nature.

These event tables generally have an event time column that is
indexed and which is an obvious candidate for either partitioning or
clustering of the table.  I'm trying to make sense of which is the
better option for me.

As best I can tell, the decision points are as follows...

PARTITIONING

Pros:

  * no outage; data just starts flowing into new partitions seamlessly
  * allows more control over where the data goes, creating retrieval 
parallelization opportunities
  * clustering cannot be inadvertently undone in a way that requires 
scheduled downtime to repair
  * probably more resilient in the case of the event time being different 
from the time that I processed the event

Cons:

  * does not deal with legacy data without extra migration (over time this 
becomes less relevant)
  * requires some kind of background process to manage partition creation
  * partition size will affect performance and choosing its size is not a 
science

CLUSTERING

Pros:

  * no particularly custom development work on my part
  * once done, it puts all existing data in a good state for efficient querying 
without extra work

Cons:

  * will lock up the system for the duration of the CLUSTER command
  * somehow need to make sure that ANALYZE commands run often enough
  * does not give me much control of the underlying storage layout
  * may have problems when the occasional mass-UPDATE is done
  * unclear whether a VACUUM FULL is required to prevent subsequent 
un-clustered-ness despite having a fill factor of 100, stemming from the 
mass-UPDATE operations
  * could generate a huge number of WAL segments to archive
  * could possibly be sabotaged by the event time property not being well 
correlated with the time that the event is processed in the face of upstream 
systems have momentary issues

As far as questions to the group go:

  * Is my understanding of the pros and cons of the options reasonably correct 
and comprehensive?
  * What has governed your decisions in making such a choice on past projects 
of your own?
  * If I go the clustering route, will the occasional mass update really mess 
with things, requiring a re-cluster and possibly even a full vacuum (to prevent 
re-un-clustering)?
  * Might it make more sense to cluster when the event time property is the 
time that I processed the event but partition when it is the time that the 
event occurred in some other system?
  * Is running a CLUSTER command actually necessary to get the performance 
benefits if the table ought already be in a good order, or is just running a 
CLUSTER command on a well ordered table enough to get query execution to yield 
nice sequential access to the disk?

Many thanks in advance for your insights...

  -- AWG




Re: [PERFORM] Poor performance using CTE

2012-11-14 Thread Andrew Dunstan


On 11/14/2012 10:23 AM, David Greco wrote:


Have a query using a CTE that is performing very poorly. The 
equivalent query against the same data in an Oracle database runs in 
under 1 second, in Postgres  it takes 2000 seconds.


The smp_pkg.get_invoice_charges queries fedexinvoices for some data 
and normalizes it into a SETOF some record type. It is declared to be 
STABLE. Fedexinvoices consists of about 1.3M rows of medium width. 
Fedexinvoices.id is the primary key on that table, and 
trim(fedexinvoices.trackno) is indexed via the function trim.


The plan for the equivalent query in Oracle is much smaller and 
simpler. No sequential (or full table) scans on fedexinvoices.


WITH charges as (

SELECT fi2.id, smp_pkg.get_invoice_charges(fi2.id) 
charge_info from fedexinvoices fi2


)

select fedexinvoices.* from

fedexinvoices

inner join charges on charges.id = fedexinvoices.id AND 
(charges.charge_info).charge_name IN ('ADDRESS CORRECTION 
CHARGE','ADDRESS CORRECTION')


where

trim(fedexinvoices.trackno)='799159791643'

;




Can you explain what you're actually trying to do here? The query looks 
rather odd. Why are you joining this table (or an extract from it) to 
itself?



In any case, you could almost certainly recast it and have it run fast 
by first filtering on the tracking number.



cheers

andrew




Re: [PERFORM] Poor performance using CTE

2012-11-14 Thread Andrew Dunstan


On 11/14/2012 10:56 AM, David Greco wrote:

You're right. I was translating an Oracle query, but it looks like PG will allow 
some syntax that is different. I am trying to find entries in fedexinvoices where 
smp_pkg.get_invoice_charges(id) returns a record containing charge_name in 
('ADDRESS CORRECTION CHARGE','ADDRESS CORRECTION'). It should return the 
fedexinvoices row and the row from smp_pkg.get_invoice_charges that contains 
the address correction.


Something like this, though this is syntactically incorrect as 
smp_pkg.get_invoice_charges returns a set:


select fedexinvoices.*, (smp_pkg.get_invoice_charges(id)).*
from fedexinvoices
WHERE
trim(fedexinvoices.trackno)='799159791643'
and
(smp_pkg.get_invoice_charges(id)).charge_name IN ('ADDRESS CORRECTION 
CHARGE','ADDRESS CORRECTION')



First, please don't top-post when someone has replied underneath your 
post. It makes the thread totally unreadable. See 
http://idallen.com/topposting.html


You could do something like this:

WITH invoices as
(
   select *
   from fedexinvoices
   where trim(fedexinvoices.trackno)='799159791643'
),

charges as
(
   SELECT fi2.id, smp_pkg.get_invoice_charges(fi2.id) charge_info
   from fedexinvoices fi2 join invoices i on i.id = fi2.id
)

select invoices.*
from invoices
inner join charges on charges.id = invoices.id
AND (charges.charge_info).charge_name IN ('ADDRESS CORRECTION 
CHARGE','ADDRESS CORRECTION')


;


Or probably something way simpler but I just did this fairly quickly and 
mechanically



cheers

andrew




Re: SOLVED - RE: [PERFORM] Poor performance using CTE

2012-11-14 Thread Andrew Dunstan


On 11/14/2012 08:17 PM, Craig Ringer wrote:

On 11/15/2012 12:29 AM, Tom Lane wrote:

David Greco david_gr...@harte-hanks.com writes:

Thanks, that did the trick. Though I'm still not clear as to why.

PG treats WITH as an optimization fence --- the WITH query will be
executed pretty much as-is.  It may be that Oracle flattens the query
somehow; though if you're using black-box functions in both cases,
it's not obvious where the optimizer could get any purchase that way.


I was looking through the latest spec drafts I have access to and
couldn't find any reference to Pg's optimisation-fence-for-CTEs
behaviour being required by the standard, though I've repeatedly seen it
said that there is such a requirement.

Do you know where it's specified?

All I can see is that the optimised result must have the same effect as
the original. That'd mean that wCTEs and CTE terms that use VOLATILE
functions or functions with side-effects couldn't be optimised into
other queries. Simple CTEs could be, though, and there are times I've
really wished I could use a CTE but I've had to use a set-returning
subquery to get reasonable plans.



It cuts both ways. I have used CTEs a LOT precisely because this 
behaviour lets me get better plans. Without that I'll be back to using 
the offset 0 hack.


cheers

andrew




Re: [HACKERS] [PERFORM] pg_dump and thousands of schemas

2012-11-13 Thread Andrew Dunstan


On 11/13/2012 10:12 PM, Denis wrote:

Please don't think that I'm trying to nitpick here, but pg_dump has options
for dumping separate tables and that's not really consistent with the idea
that pg_dump is primarily  designed for dumping entire databases.





Sure it is. The word primarily is not just a noise word here.

The fact that we have options to do other things doesn't mean that its 
primary design goal has changed.



cheers

andrew




Re: [PERFORM] Remote access to Postgresql slow

2012-09-15 Thread Andrew Barnham
Is your network link between server and client across the public internet?

You need to check bandwidth and latency characteristics of your network.

A simple test run following on server host and run it again on the client
host.

time psql [connect details] -c 'select now()'

I access postgresql database across the public internet (by tunnelling port
5432 across compressed ssh sessions). In my case latency is a significant
penalty.  Locally the response time for the above is 10ms, but remotely it is about 30
times slower (350ms).

You may need to install wireshark or similar and monitor client traffic in
order to figure out the network penalty.  Maybe your app goes back and
forth to postgres multiple times, i.e. it does lots of chatter. If so then the
latency cost becomes very significant. You want to try and minimise the
number of postgresql calls; retrieve more data with fewer SQL operations.



On Fri, Sep 14, 2012 at 7:02 PM, Manoj Agarwal m...@ockham.be wrote:

 **

 Hi,

 I have a Centos 6.2 Virtual machine that contains Postgresql version 8.4
 database. The application installed in this Virtual machine uses this
 database that is local to this Virtual machine.  I have tried to offload
 the database, by installing it on a remote Virtual machine, on another
 server, and tried to connect to it from my local Virtual machine. The
 application configuration remains the same, only database is offloaded to a
 remote Virtual machine on another server and the connection parameters have
 changed.  The connection is all fine and the application can access the
 remote database.

  I have observed that PostgreSQL is responding extremely slowly.  What
  should I do to improve its performance?

 Please suggest.


 Kind Regards,
 Manoj Agarwal




Re: [PERFORM] add column with default value is very slow

2012-09-11 Thread Andrew Dunstan


On 09/11/2012 09:55 AM, AI Rumman wrote:

I added the excel file for locks data.
I was surprised to see that while I was updating a single column value 
for all records in a table, all indexes were locked by the server.



Any ALTER TABLE command locks the whole table in ACCESS EXCLUSIVE mode, 
indexes included. See the description of ACCESS EXCLUSIVE lock at 
http://www.postgresql.org/docs/current/static/explicit-locking.html


cheers

andrew






Re: [PERFORM] best practice to avoid table bloat?

2012-08-16 Thread Andrew Dunstan


On 08/16/2012 04:33 PM, Anibal David Acosta wrote:


Hi,

if I have a table from which about 8 million rows are deleted every 
night (the table has maybe 9 million), is it recommended to do a vacuum 
analyze after the delete completes, or can I leave this job to autovacuum?


This table is very active during the day but less active during night

I think that the only only thing where Postgres is weak, is in this 
area (table and index bloat).


For some reason, for the same amount of data, every day postgres consumes 
a little more.






Check out pg_reorg.

cheers

andrew




Re: [PERFORM] Odd blocking (or massively latent) issue - even with EXPLAIN

2012-07-23 Thread Andrew Dunstan


On 07/23/2012 04:41 AM, Jim Vanns wrote:

We're seeing SELECT statements and even EXPLAIN (no ANAYLZE)
statements hang indefinitely until *something* (we don't know what)
releases some kind of resource or no longer becomes a massive bottle
neck. These are the symptoms.



I have seen this sort of behaviour on systems with massive catalogs 
(millions of tables and indexes). Could that be your problem?



cheers

andrew



Re: [PERFORM] queries are fast after dump-restore but slow again after some days dispite vacuum

2012-07-19 Thread Andrew Dunstan


On 07/19/2012 07:33 AM, Felix Scheicher wrote:

Hi, I am running a 9.1 server on Ubuntu. When I upgraded to the current version
I did a pg_dump followed by pg_restore and found that the db was much faster.
But it slowed down again after two days. I did the dump-restore again and could now
compare the two (actually identical) databases. This is a query of the old one
directly after a VACUUM ANALYSE:

...


No matter how much I vacuum or analyse the slow db, I don't get it faster.
I also checked for dead tuples - there are none.



Try running CLUSTER on the relevant tables and see if it makes a 
difference. If it does you might want to look into using pg_reorg 
periodically.


cheers

andrew



Re: [PERFORM] queries are fast after dump-restore but slow again after some days dispite vacuum

2012-07-19 Thread Andrew Dunstan


On 07/19/2012 11:13 AM, Felix Scheicher wrote:

Andrew Dunstan andrew at dunslane.net writes:


Try running CLUSTER on the relevant tables and see if it makes a
difference. If it does you might want to look into using pg_reorg
periodically.


That worked like a charm! Many thanks. But how come the queries are also fast
after a restore without the cluster?





There is probably a lot of unused space in your table. CLUSTER rewrites 
a fresh copy, as do restore and pg_reorg.


You might also want to try changing the settings on the table so it gets 
much more aggressively auto-vacuumed, so that dead space is made 
available much more quickly, and the table has less chance to get bloated.
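
Something along these lines, for example (table name and thresholds are 
illustrative only; tune them against your own churn rate):

ALTER TABLE my_table SET (
    autovacuum_vacuum_scale_factor = 0.02,    -- vacuum after ~2% dead rows instead of the default 20%
    autovacuum_analyze_scale_factor = 0.01
);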


cheers

andrew



Re: [PERFORM] DELETE vs TRUNCATE explanation

2012-07-11 Thread Andrew Dunstan


On 07/11/2012 04:47 PM, Shaun Thomas wrote:

On 07/11/2012 03:18 PM, Craig James wrote:


It strikes me as a contrived case rather than a use case.  What sort of
app repeatedly fills and truncates a small table thousands of times ...
other than a test app to see whether you can do it or not?


Test systems. Any company with even a medium-size QA environment will 
have continuous integration systems that run unit tests on a trash 
database hundreds or thousands of times through the day. Aside from 
dropping/creating the database via template, which would be *really* 
slow, truncate is the easiest/fastest way to reset between tests.



Why is recreating the test db from a (populated) template going to be 
slower than truncating all the tables and repopulating from an external 
source? I had a client who achieved a major improvement in speed and 
reduction in load by moving to this method of test db setup.
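
A sketch of the template approach (database names invented), run from psql or the 
test harness:

-- one-time setup: create the reference database and load the test data into it
CREATE DATABASE test_template;

-- before each test run: throw away the old copy and clone the template
DROP DATABASE IF EXISTS test_run;
CREATE DATABASE test_run TEMPLATE test_template;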


cheers

andrew





Re: [PERFORM] Paged Query

2012-07-09 Thread Andrew Dunstan


On 07/09/2012 01:41 PM, Misa Simic wrote:



In my experience users very rarely go to the last pages... it would be easier 
for them to sort the data by a field so that the rows they want appear on the 
very first pages...






Yeah, the problem really is that most client code wants to know how many 
pages there are, even if it only wants one page right now.


cheers

andrew



[PERFORM] Sequencial scan in a JOIN

2012-06-05 Thread Andrew Jaimes

Hi everyone, 

I am trying to run the following query:

SELECT  count(1) --DISTINCT l_userqueue.queueid
  FROM e_usersessions
  JOIN l_userqueue
    ON l_userqueue.userid = e_usersessions.entityid
  JOIN a_activity
    ON a_activity.activequeueid = l_userqueue.queueid
   AND a_activity.vstatus       = 1
   AND a_activity.ventrydate    > 0
   AND a_activity.sbuid         = e_usersessions.sbuid
   AND a_activity.assignedtoid  = 0
   AND a_activity.status       <> '0'
  WHERE e_usersessions.sessionkeepalivedatetime > 20120605082131943

Explain analyze:
'Aggregate  (cost=100402.10..100402.11 rows=1 width=0) (actual 
time=2249.051..2249.051 rows=1 loops=1)'
'  -  Hash Join  (cost=10.93..99795.09 rows=242803 width=0) (actual 
time=0.541..2249.027 rows=33 loops=1)'
'Hash Cond: ((a_activity.activequeueid = l_userqueue.queueid) AND 
(a_activity.sbuid = e_usersessions.sbuid))'
'-  Seq Scan on a_activity  (cost=0.00..88462.52 rows=1208167 
width=22) (actual time=0.010..1662.142 rows=1207855 loops=1)'
'  Filter: ((ventrydate > 0) AND ((status)::text <> '0'::text) AND 
(vstatus = 1) AND (assignedtoid = 0::numeric))'
'-  Hash  (cost=10.86..10.86 rows=5 width=22) (actual 
time=0.053..0.053 rows=4 loops=1)'
'  -  Hash Join  (cost=9.38..10.86 rows=5 width=22) (actual 
time=0.033..0.048 rows=4 loops=1)'
'Hash Cond: (l_userqueue.userid = e_usersessions.entityid)'
'-  Seq Scan on l_userqueue  (cost=0.00..1.23 rows=23 
width=27) (actual time=0.003..0.009 rows=23 loops=1)'
'-  Hash  (cost=9.31..9.31 rows=5 width=21) (actual 
time=0.018..0.018 rows=2 loops=1)'
'  -  Index Scan using i06_e_usersessions on 
e_usersessions  (cost=0.00..9.31 rows=5 width=21) (actual time=0.009..0.012 
rows=2 loops=1)'
'Index Cond: (sessionkeepalivedatetime > 
20120605082131943::bigint)'
'Total runtime: 2249.146 ms'

I am trying to understand the reason why the a sequencial scan is used on 
a_activity instead of using the index by activequeueid (i08_a_activity). If I 
run the this other query, I get a complete different results:

SELECT * 
 FROM a_activity
 WHERE a_activity.activequeueid  = 123456
   AND a_activity.vstatus= 1
   AND a_activity.ventrydate > 0

Explain analyze:
'Index Scan using i08_a_activity on a_activity  (cost=0.00..303.57 rows=162 
width=7287) (actual time=0.019..0.019 rows=0 loops=1)'
'  Index Cond: ((activequeueid = 123456::numeric) AND (vstatus = 1) AND 
(ventrydate > 0))'
'Total runtime: 0.076 ms'


This is the definition of the index :

CREATE INDEX i08_a_activity
  ON a_activity
  USING btree
  (activequeueid , vstatus , ventrydate );


 a_activity table has 1,216,134 rows



Thanks in advance,
Andrew


  

Re: [PERFORM] Sequencial scan in a JOIN

2012-06-05 Thread Andrew Jaimes

the default_statistics_target is set to 200, and I have run the analyze and 
reindex on these tables before writing the email. 





Andrew

 Date: Tue, 5 Jun 2012 08:15:45 -0500
 From: stho...@optionshouse.com
 To: andrewjai...@hotmail.com
 CC: pgsql-performance@postgresql.org
 Subject: Re: [PERFORM] Sequencial scan in a JOIN
 
 On 06/05/2012 07:48 AM, Andrew Jaimes wrote:
 
  '  -  Hash Join  (cost=10.93..99795.09 rows=242803 width=0) (actual
  time=0.541..2249.027 rows=33 loops=1)'
  'Hash Cond: ((a_activity.activequeueid = l_userqueue.queueid)
  AND (a_activity.sbuid = e_usersessions.sbuid))'
  '-  Seq Scan on a_activity  (cost=0.00..88462.52 rows=1208167
  width=22) (actual time=0.010..1662.142
 
 I'd be willing to bet your stats are way, way off. It expected 242,803 
 rows in the hash, but only got 33. In that kind of scenario, I could 
 easily see the planner choosing a sequence scan over an index scan, as 
 doing that many index seeks would be much more expensive than scanning 
 the table.
 
 What's your default_statistics_target, and when is the last time you 
 analyzed the tables in this query?
 
 -- 
 Shaun Thomas
 OptionsHouse | 141 W. Jackson Blvd. | Suite 500 | Chicago IL, 60604
 312-444-8534
 stho...@optionshouse.com
 
 
  

Re: [PERFORM] query optimization

2012-04-26 Thread Andrew Dunstan



On 04/26/2012 04:08 PM, Tom Lane wrote:

Thomas Kellererspam_ea...@gmx.net  writes:

Tom Lane wrote on 26.04.2012 21:17:

Um ... did you analyze all the tables, or just some of them?  I get
sub-millisecond runtimes if all four tables have been analyzed, but it
does seem to pick lousy plans if, say, only a and b have been analyzed.

Here it's similar to Richard's experience:
Before analyzing the four tables, the first statement yields this plan:
[ merge joins ]
This continues to stay the plan for about 10-15 repetitions, then it turns to 
this plan
[ hash joins ]

Hmm.  I see it liking the merge-join plan (with minor variations) with
or without analyze data, but if just some of the tables have been
analyzed, it goes for the hash plan which is a good deal slower.  The
cost estimates aren't that far apart though.  In any case, the only
reason the merge join is so fast is that the data is perfectly ordered
in each table; on a less contrived example, it could well be a lot
slower.



It's not so terribly contrived, is it? It's common enough to have tables 
which are append-only and to join them by something that corresponds to 
the append order (serial field, timestamp etc.)


cheers

andrew



Re: [PERFORM] scale up (postgresql vs mssql)

2012-04-13 Thread Andrew Dunstan



On 04/11/2012 06:11 PM, Eyal Wilde wrote:

hi,

I had a stored procedure in MS SQL Server. This stored procedure gets 
a parameter (account-id), does about 20 queries, fills some temporary 
tables, and finally returns a few result-sets. This stored procedure was 
converted to a stored function in PostgreSQL (9.1). The result-sets are 
being returned using refcursors. This stored function is logically 
almost identical to the MS SQL stored procedure.  A LOT of work has 
been done to get PostgreSQL close to MS SQL speed (preparing 
temp tables in advance, using ANALYZE in special places inside the 
stored function in order to hint the optimizer that the temp tables 
have very few records, thus eliminating unnecessary and expensive 
hash joins, and a lot more..). After all that, the stored function 
runs at a reasonable speed (normally ~60 milliseconds).


Now, I run a test that simulates 20 simultaneous clients, asking for 
account-ids randomly. Once a client gets a result, it immediately asks 
for another one. The test lasts 5 seconds.  I use a connection pool 
(with the Tomcat web server). The pool is automatically increased to ~20 
connections (as expected). The result is that PostgreSQL does ~60 
account-ids, whereas MS SQL does ~330 account-ids. PostgreSQL 
shows that each account-id took about 400-1000 msec, which is so 
much slower than the ~60 msec of a single execution.


In a single execution PostgreSQL may be less than twice as slow as 
MS SQL, but with 20 simultaneous clients it's about 6 times worse. Why 
is that?


The hardware is one 4-core Xeon with 8GB of RAM. The database size is just 
a few GBs. CentOS 6.2.


Do you think the fact that PostgreSQL uses a process per connection 
(instead of multi-threading) is inherently a weakness of PostgreSQL 
regarding scale-up?
Would it be better to limit the number of connections to something 
like 4, so that executions don't interrupt each other?


thanks in advance for any help!



I doubt that the process-per-connection has much effect, especially on 
Linux where process creation is extremely cheap, and you're using a 
connection pooler anyway. The server is pretty modest, though. If you 
can add enough RAM that you can fit the whole db into Postgres shared 
buffers you might find things run a whole lot better. You should show us 
your memory settings, among other things - especially shared_buffers, 
temp_buffers and work_mem.


cheers

andrew





Re: [PERFORM] Performance of SQL Function versus View

2012-04-03 Thread Andrew Dunstan



On 04/03/2012 10:21 AM, Robert Haas wrote:


You should probably test this in your environment, but I'd expect the
view to be better.  Wrapping logic inside PL/pgsql functions
needlessly rarely turn outs to be a win.




Right. But also note that auto_explain is very useful in getting plans 
and times of queries nested in functions which can't easily be got 
otherwise.
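
For instance, something like this in a test session (the function name is a 
placeholder) writes the plans of the statements executed inside the function to 
the server log:

LOAD 'auto_explain';
SET auto_explain.log_min_duration = 0;       -- log every statement's plan
SET auto_explain.log_nested_statements = on; -- include statements inside functions
SELECT my_function(42);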


cheers

andrew



Re: [PERFORM] Update join performance issues

2012-04-03 Thread Andrew Dunstan



On 04/03/2012 01:29 PM, Kevin Kempter wrote:

Hi All;

I have a query that wants to update a table based on a join like this:

update test_one
set f_key = t.f_key
from
upd_temp1 t,
test_one t2
where
t.id_number = t2.id_number



Why is test_one in the from clause? update joins whatever is in the from 
clause to the table being updated. You almost never need it repeated in 
the from clause.
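
In other words, it can probably be written as just (an untested sketch based on 
the column names shown):

UPDATE test_one
   SET f_key = t.f_key
  FROM upd_temp1 t
 WHERE t.id_number = test_one.id_number;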



cheers

andrew






Re: [PERFORM] TCP Overhead on Local Loopback

2012-04-02 Thread Andrew Dunstan



On 04/01/2012 09:11 PM, Andrew Dunstan wrote:



On 04/01/2012 08:29 PM, Claudio Freire wrote:
On Sun, Apr 1, 2012 at 8:54 PM, Andrew Dunstanand...@dunslane.net  
wrote:

You could try using Unix domain socket and see if the performance
improves. A relevant link:


He said Windows. There are no Unix domain sockets on Windows. (And 
please

don't top-post)

Windows supports named pipes, which are functionally similar, but I
don't think pg supports them.



Correct, so telling the OP to have a look at them isn't at all 
helpful. And they are not supported on all Windows platforms we 
support either (specifically not on XP, AIUI).





Apparently I was mistaken about the availability. However, my initial 
point remains. Since all our  client/server comms on Windows are over 
TCP, telling the OP to look at Unix domain sockets is unhelpful.


cheers

andrew



Re: [PERFORM] TCP Overhead on Local Loopback

2012-04-01 Thread Andrew Dunstan



On 04/01/2012 06:01 PM, Andy wrote:
You could try using Unix domain socket and see if the performance 
improves. A relevant link:


He said Windows. There are no Unix domain sockets on Windows. (And 
please don't top-post)


cheers

andrew





Re: [PERFORM] TCP Overhead on Local Loopback

2012-04-01 Thread Andrew Dunstan



On 04/01/2012 08:29 PM, Claudio Freire wrote:

On Sun, Apr 1, 2012 at 8:54 PM, Andrew Dunstanand...@dunslane.net  wrote:

You could try using Unix domain socket and see if the performance
improves. A relevant link:


He said Windows. There are no Unix domain sockets on Windows. (And please
don't top-post)

Windows supports named pipes, which are functionally similar, but I
don't think pg supports them.



Correct, so telling the OP to have a look at them isn't at all helpful. 
And they are not supported on all Windows platforms we support either 
(specifically not on XP, AIUI).


cheers

andrew



Re: [PERFORM] Tablespaces on a raid configuration

2012-03-30 Thread Andrew Dunstan



On 03/30/2012 10:45 AM, Campbell, Lance wrote:


PostgreSQL 9.0.x

When PostgreSQL  storage is using a relatively large raid  5 or 6 
array is there any value in having your tables distributed across 
multiple tablespaces if those tablespaces will exists on the same raid 
array?  I understand the value if you were to have the tablespaces on 
different raid arrays.  But what about on the same one?





Not answering your question, but standard advice is not to use RAID 5 or 
6, but RAID 10 for databases. Not sure if that still hold if you're 
using SSDs.


cheers

andrew



Re: [PERFORM] database slowdown while a lot of inserts occur

2012-03-29 Thread Andrew Dunstan



On 03/29/2012 03:27 PM, Bob Lunney wrote:

Lance,

Many small inserts cause frequent fsyncs.  Is there any way those small 
inserts can be batched into some larger sets of inserts that use copy 
to perform the load?




Or possibly a prepared statement called many times in a single 
transaction, if you're not using that already. It's not as efficient as 
COPY, but it's often a much less extensive change to the code.
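
A minimal illustration of the shape of that (table and columns invented):

BEGIN;
PREPARE ins (int, text) AS INSERT INTO events (id, payload) VALUES ($1, $2);
EXECUTE ins (1, 'first');
EXECUTE ins (2, 'second');
-- ... many more EXECUTEs ...
COMMIT;

Most client drivers do the PREPARE/EXECUTE part for you when you reuse a 
parameterised statement; the important part is batching the inserts into one 
transaction so they share a single fsync at commit.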


cheers

andrew





Re: [PERFORM] slow self-join query

2012-03-18 Thread Andrew Dunstan



On 03/18/2012 10:37 AM, Robert Poor wrote:


On Sat, Mar 17, 2012 at 23:09, Scott Marlowe scott.marl...@gmail.com 
mailto:scott.marl...@gmail.com wrote:


Also it looks like you're still not using the index on this:

Subquery Scan u1 (cost=0.00..313.55 rows=50 width=4) (actual
time=0.030..147.136 rows=1 loops=1)

   Filter: ((u1.type)::text = 'User::Twitter'::text)

Are you sure you're using an indexable condition?


I know that users.type is indexed -- what would keep that from being 
honored?  FWIW, I believe that all user.type fields are set to 
User::Twitter, but that will change in the future.





If all the rows have that value, then using the index would be silly. 
Postgres knows from the stats that ANALYZE calculates whether or not 
using an index is likely to be more efficient, and avoids doing so in 
cases where it isn't.


cheers

andrew



Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-03-01 Thread Andrew Dunstan



On 03/01/2012 05:52 PM, Stefan Keller wrote:

These are the current modified settings in postgresql.conf:
shared_buffers = 128MB
work_mem = 3MB


These are extremely low settings on virtually any modern computer. I 
usually look to set shared buffers in numbers of Gb and work_mem at 
least in tens if not hundreds of Mb for any significantly sized database.


cheers

andrew



Re: [PERFORM] PG as in-memory db? How to warm up and re-populate buffers? How to read in all tuples into memory?

2012-03-01 Thread Andrew Dunstan



On 03/01/2012 07:58 PM, Claudio Freire wrote:

On Thu, Mar 1, 2012 at 9:28 PM, Peter van Hardenbergp...@pvh.ca  wrote:

Setting work_mem to hundreds of MB in a 4G system is suicide. Tens
even is dangerous.


Why do you say that? We've had work_mem happily at 100MB for years. Is
there a particular degenerate case you're concerned about?

Me too.

But I've analyzed the queries I'll be sending to the database and I've
carefully bound the effective amount of memory used given the load
I'll be experiencing.

Saying that it should be set to 100M without consideration for those
matters is the suicide part. work_mem applies to each sort operation.
Suppose, just for the sake of argument, that each connection is
performing 5 such sorts (ie, 5 joins of big tables - not unthinkable),
then suppose you have your max_connections to the default of 100, then
the system could request as much as 50G of ram.

I set work_mem higher in my database system since I *know* most of the
connections will not perform any merge or hash joins, nor will they
sort the output, so they won't use work_mem even once. The ones that
will, I have limited on the application side to a handful, hence I
*know* that 50G theoretical maximum will not be reached.

Can the OP say that? I have no reason to think so. Hence I don't
suggest 100M is OK on a 4G system.


Well, obviously you need to know your workload. Nobody said otherwise.

cheers

andrew



Re: [PERFORM] Very long deletion time on a 200 GB database

2012-02-27 Thread Andrew Dunstan



On 02/27/2012 09:45 AM, Shaun Thomas wrote:

On 02/27/2012 02:08 AM, Reuven M. Lerner wrote:


In the end, it was agreed that we could execute the deletes over
time, deleting items in the background, or in parallel with the
application's work. After all, if the disk is filling up at the rate
of 2 GB/day, then so long as we delete 4 GB/day (which is pretty easy
to do), we should be fine.


Please tell me you understand deleting rows from a PostgreSQL database 
doesn't work like this. :) The MVCC storage system means you'll 
basically just be marking all those deleted rows as reusable, so your 
database will stop growing, but you'll eventually want to purge all 
the accumulated dead rows.


One way to see how many there are is to use the pgstattuple contrib 
module. You can just call it on the table name in question:


SELECT * FROM pgstattuple('my_table');

You may find that after your deletes are done, you'll have a free_pct 
of 80+%. In order to get rid of all that, you'll need to either run 
CLUSTER on your table(s) or use the select-truncate-insert method 
anyway.
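
A hedged sketch of that check-and-rebuild sequence (my_table and its index
are placeholders; note that CLUSTER rewrites the table under an exclusive
lock, so it needs a maintenance window):

  SELECT free_percent, dead_tuple_percent
  FROM pgstattuple('my_table');

  -- if most of the table is free space after the deletes, rewrite it
  CLUSTER my_table USING my_table_pkey;
  ANALYZE my_table;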




If he has autovacuum on he could well be just fine with his proposed 
strategy. Or he could have tables partitioned by time and do the delete 
by just dropping partitions. There are numerous ways he could get this to 
work.
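
A hedged sketch of the partition-dropping variant, assuming inheritance-style
partitioning by month (all table names hypothetical):

  -- each month lives in its own child table inheriting from the parent;
  -- removing an old month is near-instant and leaves no dead rows behind
  DROP TABLE b_2011_12;
  -- or detach it instead of dropping, if the data must be kept around:
  ALTER TABLE b_2011_11 NO INHERIT b;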


cheers

andrew



Re: [PERFORM] Very long deletion time on a 200 GB database

2012-02-23 Thread Andrew Dunstan



On 02/23/2012 05:07 AM, Marcin Mańk wrote:

DELETE FROM  B
WHERE r_id IN (SELECT R.id
 FROM R, B
WHERE r.end_date < (NOW() - (interval '1 day' * 30))
  AND r.id = b.r_id


How about:

  DELETE FROM  B
  WHERE r_id IN (SELECT distinct R.id
  FROM R  WHERE r.end_date < (NOW() - (interval '1 day' * 30))

?



Or possibly without the DISTINCT.  But I agree that the original query 
shouldn't have B in the subquery - that alone could well make it crawl.


What is the distribution of end_dates? It might be worth running this in 
several steps, deleting records older than, say, 90 days, 60 days, 30 days.
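
A hedged sketch of that staged approach, reusing the table and column names
from the thread:

  DELETE FROM b WHERE r_id IN
    (SELECT r.id FROM r WHERE r.end_date < NOW() - interval '90 days');
  DELETE FROM b WHERE r_id IN
    (SELECT r.id FROM r WHERE r.end_date < NOW() - interval '60 days');
  DELETE FROM b WHERE r_id IN
    (SELECT r.id FROM r WHERE r.end_date < NOW() - interval '30 days');
  -- with autovacuum on, the dead rows from each step can be reclaimed
  -- before the next step starts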


cheers

andrew



Re: [PERFORM] Query slow as Function

2012-02-18 Thread Andrew Dunstan



On 02/18/2012 11:37 AM, Tom Lane wrote:

Andreas Kretschmer akretsch...@spamfence.net writes:

You can check the plan with the auto_explain extension, and you can
force the planner to create a plan based on the actual input value by
using dynamic SQL (EXECUTE 'your query string' inside the function)

Steve *is* using EXECUTE, so that doesn't seem to be the answer.  I'm
wondering about datatype mismatches myself --- the function form is
forcing the parameter to be char(9), which is not a constraint imposed
in the written-out query.  There are lots of other possibilities
though.  It would be hard to say much without a self-contained example
to try.




He's using EXECUTE ... USING. Does that get planned with the actual parameter value?
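
For reference, a hedged sketch of the construct under discussion (function,
table and column names are hypothetical):

  CREATE OR REPLACE FUNCTION count_orders(p_code char(9)) RETURNS bigint AS $$
  DECLARE
    result bigint;
  BEGIN
    -- dynamic SQL is re-planned at run time; the value is passed via USING
    -- rather than being interpolated into the query string
    EXECUTE 'SELECT count(*) FROM orders WHERE code = $1'
      INTO result
      USING p_code;
    RETURN result;
  END;
  $$ LANGUAGE plpgsql;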

cheers

andrew



Re: [PERFORM] Exploring memory usage

2011-12-27 Thread Andrew Dunstan



On 12/27/2011 11:00 AM, Scott Marlowe wrote:

On Tue, Dec 27, 2011 at 8:33 AM, Claudio Freire klaussfre...@gmail.com wrote:

On Sat, Dec 24, 2011 at 4:22 PM, Michael Smolsky sitr...@email.com wrote:

work_mem = 128MB (tried 257MB, didn't change anything)

This is probably your problem.

Without an EXPLAIN output, I cannot be sure, but 'work_mem' is not the
total amount of memory a query can use, it's the amount of memory it
can use for *one* sort/hash/whatever operation. A complex query can
have many of those, so your machine is probably swapping due to
excessive memory requirements.

Try *lowering* it. You can do so only for that query, by executing:

set work_mem = '8MB'; your query

He can lower it for just that query but honestly, even on a machine
with much more memory I'd never set it as high as he has it.  On a
busy machine with 128G RAM the max I ever had it set to was 16M, and
that was high enough I kept a close eye on it (well, nagios did
anyway.)




It depends on the workload. Your 16M setting would make many of my 
clients' systems slow to an absolute crawl for some queries, and they 
don't run into swap issues, because we've made educated guesses about 
usage patterns.


cheers

andrew



Re: [PERFORM] Performance costs of various PL languages

2011-12-27 Thread Andrew Dunstan



On 12/27/2011 05:54 PM, Merlin Moncure wrote:

On Tue, Dec 27, 2011 at 4:20 PM, Pavel Stehule pavel.steh...@gmail.com wrote:

Hello

2011/12/27 Carlo Stonebanks stonec.regis...@sympatico.ca:

We are currently using pltclu as our PL of choice AFTER plpgSql.

I'd like to know if anyone can comment on the performance costs of the
various PL languages BESIDES C. For example, does pltclu instantiate faster
than pltcl (presumably because it uses a shared interpreter?) Is Perl more
lightweight?

I know that everything depends on context - what you are doing with it, e.g.
choose Tcl for string handling vs. Perl for number crunching - but for those
who know about this, is there a clear performance advantage for any of the
various PL languages - and if so, is it a difference so big to be worth
switching?

I ask this because I had expected to see pl/pgsql as a clear winner in terms
of performance over pltclu, but my initial test showed the opposite. I know
this may be an apples vs oranges problem and I will test further, but if
anyone has any advice or insight, I would appreciate it so I can tailor my
tests accordingly.


Performance depends strongly on the use case.

PL/pgSQL has a fast start-up, but every expression is evaluated as a simple
SQL expression, so some repeated operations can be very expensive -
array updates, string updates. PL/pgSQL is best as SQL glue. A plus for
performance is the type compatibility between plpgsql and Postgres.
The plpgsql interpreter is very simple - there are more or less zero
optimizations - so plpgsql code should be minimalistic, but as long as you
don't do anything really wrong, its speed is comparable with PHP.

http://www.pgsql.cz/index.php/PL/pgSQL_%28en%29#Inappropriate_use_of_the_PL.2FpgSQL_language

PL/Perl has a slower start-up, but string and array operations are very
fast. Perl has its own expression evaluator, which is faster than expression
evaluation in plpgsql. On the other hand, every input must be transformed
from Postgres format to Perl format and every result must be transformed
back. Perl and the other languages don't use data types compatible with
Postgres.

One big advantage pl/pgsql has over scripting languages is that it
understands postgresql types natively.  It knows what a postgres array
is, and can manipulate one directly.  pl/perl would typically have to
have the database convert it to a string, parse it into a perl
structure, do the manipulation, then send it to the database to be
parsed again.  If your procedure code is mainly moving data between
tables and doing minimal intermediate heavy processing, this adds up
to a big advantage.  Which pl to go with really depends on what you
need to do.  pl/pgsql is always my first choice though.

perl and tcl are not particularly fast languages in the general case
-- you are largely at the mercy of how well the language's syntax or
library features map to the particular problem you're solving.  if you
need a fast general purpose language in the backend and are (very
understandably) skeptical about C, I'd look at pl/java.




PLV8, which is not yet ready for prime time, maps many common Postgres 
types into native JS types without the use of input/output functions, 
which means the conversion is very fast. That work could well be worth 
repeating for the other PLs.
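
As a small hedged illustration of the type-compatibility point (a hypothetical
function, using 9.1's FOREACH): in PL/pgSQL the array argument is operated on
directly, whereas PL/Perl or PL/Tcl would normally see it only after a pass
through the type's input/output conversion:

  CREATE OR REPLACE FUNCTION sum_elements(vals integer[]) RETURNS bigint AS $$
  DECLARE
    total bigint := 0;
    v integer;
  BEGIN
    FOREACH v IN ARRAY vals LOOP  -- no text round trip; the array is used natively
      total := total + v;
    END LOOP;
    RETURN total;
  END;
  $$ LANGUAGE plpgsql;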


cheers

andrew



[PERFORM] Query planner suggestion, for indexes with similar but not exact ordering.

2011-11-14 Thread Andrew Barnham
Hi all.  Been using postgres for years, and lurking on this newsgroup for a
short while now to help me gain the benefit of your expertise and
experience and learn how to get most out of postgresql possible.

I do a fair bit of work on tables using composite keys.  I have discovered
a couple of things lately that may or may not be worth looking at in terms
of query planner.

Consider my archetypal table. Based on real data.

table master
franchise  smallint,
partnumber varchar

It is a spare parts list, combining part numbers from multiple suppliers
(franchise). Part numbers are typically unique but sometimes there are
duplicates.  I have use cases that involve both finding parts for a
specific franchise and finding parts system-wide. The table has the
following stats:
* Number of records : 2,343,569
* Number of unique partnumber values : 2,130,379 (i.e. a given
partnumber has on average 1.1 records, so a partnumber is used by about
1.1 suppliers; the partnumber with the most records has 8.)
* Number of unique suppliers : 35

Now consider the following query: its purpose is to render the next 20 rows
at an arbitrary position, the position being just after the record matching
franchise=10, partnumber='1' in partnumber-then-franchise order.

select * from master where partnum>='1' and (partnum>'1' or franchise>10)
order by partnum,franchise limit 20;

Now, if I have a composite index on partnum + franchise, this query performs
the way you would expect, and very quickly.

But if I have an index on partnum only, the system seqscans master and
yields poor performance, i.e.:

 Limit  (cost=143060.23..143060.28 rows=20 width=93) (actual
time=2307.986..2307.998 rows=20 loops=1)
   ->  Sort  (cost=143060.23..148570.14 rows=2203967 width=93) (actual
time=2307.982..2307.986 rows=20 loops=1)
 Sort Key: partnum, franchise
 Sort Method:  top-N heapsort  Memory: 19kB
 ->  Seq Scan on master  (cost=0.00..84413.46 rows=2203967
width=93) (actual time=0.019..1457.001 rows=2226792 loops=1)
   Filter: (((partnum)::text >= '1'::text) AND
(((partnum)::text > '1'::text) OR (franchise > 10)))
 Total runtime: 2308.118 ms


I wonder if it is possible and worthwhile to set up the query planner to
recognize that, because of the stats I indicate above, a sort by
partnum is almost exactly the same as a sort by partnum+franchise.
Doing an index scan on the partnum index and sorting the results in memory
would be dramatically faster.  The sort buffer only needs to be very small -
it would grow to at most 8 records in my example above.  The buffer would
scan the partnum index and, as long as partnum stays the same, sort that
small segment; as soon as partnum increments while walking the index, the
buffer is emptied again for the next sort group.

Artificially simulating this in SQL (it only works with foreknowledge of the
maximum record count for a given part, i.e. +8) shows the dramatic theoretical
performance gain over the above.
explain analyze select * from (select * from master where partnum>='1'
order by partnum limit 20+8) x where partnum>'1' or franchise>10 order by
partnum,franchise limit 20;
=
 Limit  (cost=77.71..77.75 rows=16 width=230) (actual time=0.511..0.555
rows=20 loops=1)
   ->  Sort  (cost=77.71..77.75 rows=16 width=230) (actual
time=0.507..0.524 rows=20 loops=1)
 Sort Key: x.partnum, x.franchise
 Sort Method:  quicksort  Memory: 21kB
 ->  Subquery Scan x  (cost=0.00..77.39 rows=16 width=230) (actual
time=0.195..0.367 rows=28 loops=1)
   Filter: (((x.partnum)::text > '1'::text) OR (x.franchise >
10))
   ->  Limit  (cost=0.00..76.97 rows=28 width=93) (actual
time=0.180..0.282 rows=28 loops=1)
 ->  Index Scan using master_searchpartkey on master
(cost=0.00..6134000.35 rows=2231481 width=93) (actual time=0.178..0.240
rows=28 loops=1)
   Index Cond: ((partnum)::text >= '1'::text)
 Total runtime: 0.695 ms

Of course I could just make sure I create indexes that match my ORDER BY
fields perfectly, which is exactly what I am doing right now (see the
sketch below).  But I thought it might be worthwhile to consider allowing
some sort of in-memory sort to be overlaid on an index when the statistics
indicate that the index order is very nearly the requested sort order.
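
For reference, the matching composite index mentioned above would look
something like this (names follow the queries above):

  CREATE INDEX master_partnum_franchise ON master (partnum, franchise);
  -- matches ORDER BY partnum, franchise exactly, so the LIMIT can be
  -- satisfied straight off an index scan with no separate sort step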

Andrew


Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-06 Thread Andrew Dunstan



On 02/04/2011 02:32 AM, da...@lang.hm wrote:


when a copy command is issued, I assume that there is some indication 
of how much data is going to follow.





No of course there isn't. How would we do that with a stream like STDIN? 
Read the code.


cheers

andrew



Re: [HACKERS] [PERFORM] Slow count(*) again...

2011-02-03 Thread Andrew Dunstan



On 02/02/2011 07:17 PM, Greg Smith wrote:
I direct anyone who thought Mladen was making a serious comment to 
http://www.nydailynews.com/news/politics/2009/01/08/2009-01-08_misunderestimate_tops_list_of_notable_bu-3.html 
if you want to get his little joke there.  I plan to start using 
misunderestimate more in the future when talking about planner 
errors.  Might even try to slip it into the docs at some point in the 
future and see if anybody catches it.


My wings take dream ...


cheers

andrew



Re: [PERFORM] Best replication solution?

2009-04-07 Thread Andrew Sullivan
On Tue, Apr 07, 2009 at 10:31:02PM +1200, Mark Kirkwood wrote:

 From my experience - gained from unwittingly being in the wrong place at 
 the wrong time and so being volunteered into helping people with Slony 
 failures - it seems to be quite possible to have nodes out of sync and 
 not be entirely aware of it 

I should have stated that differently.  First, you're right that if
you don't know where to look or what to look for, you can easily be
unaware of nodes being out of sync.  What's not a problem with Slony
is that the nodes can get out of internally consistent sync state: if
you have a node that is badly lagged, at least it represents, for
sure, an actual point in time of the origin set's history.  Some of
the replication systems aren't as careful about this, and it's
possible to get the replica into a state that never happened on the
origin.  That's much worse, in my view.

In addition, it is not possible that Slony's system tables report the
replica as being up to date without them actually being so, because
the system tables are updated in the same transaction as the data is
sent.  It's hard to read those tables, however, because you have to
check every node and understand all the states.
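
A hedged sketch of what that check can look like, assuming a Slony cluster
schema named _mycluster and the sl_status view it normally provides (column
names vary somewhat between Slony versions):

  SELECT st_received, st_lag_num_events, st_lag_time
  FROM _mycluster.sl_status
  ORDER BY st_lag_time DESC;
  -- large or growing lag values mean a replica is behind, even though
  -- what it holds is internally consistent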

  Complexity seems to be the major evil here.

Yes.  Slony is massively complex.

 simpler to administer. Currently it lacks a couple of features Slony has  
 (chained slaves and partial DDL support), but I'll be following its  
 development closely - because if these can be added - whilst keeping the  
 operator overhead (and the foot-gun) small, then this looks like a 
 winner.

Well, those particular features -- which are indeed the source of much
of the complexity in Slony -- were planned in from the beginning.
Londiste aimed to be simpler, so it would be interesting to see
whether those features could be incorporated without the same
complication.

A

-- 
Andrew Sullivan
a...@crankycanuck.ca



Re: [PERFORM] Best replication solution?

2009-04-06 Thread Andrew Sullivan
On Sun, Apr 05, 2009 at 11:36:33AM -0700, Lists wrote:

 *Slony-I* - I've used this in the past, but it's a huge pain to work  
 with, caused serious performance issues under heavy load due to long  
 running transactions (may not be the case anymore, it's been a while  
 since I used it on a large database with many writes), and doesn't seem  
 very reliable (I've had replication break on me multiple times).

It is indeed a pain to work with, but I find it hard to believe that
it is the actual source of performance issues.  What's more likely
true is that it wasn't tuned to your write load -- that _will_ cause
performance issues.  Of course, tuning it is a major pain, as
mentioned.  I'm also somewhat puzzled by the claim of unreliability:
most of the actual replication failures I've ever seen under Slony are
due to operator error (these are trivial to induce, alas --
aforementioned pain to work with again).  Slony is baroque and
confusing, but it's specifically designed to fail in safe ways (which
is not true of some of the other systems: several of them have modes
in which it's possible to have systems out of sync with each other,
but with no way to detect as much.  IMO, that's much worse, so we
designed Slony to fail noisily if it was going to fail at all).  

 *Mammoth Replicator* - This is open source now, is it any good? It  
 sounds like it's trigger based like Slony. Is it based on Slony, or  
 simply use a similar solution?

It's completely unrelated, and it doesn't use triggers.  I think the
people programming it are first-rate.  Last I looked at it, I felt a
little uncomfortable with certain design choices, which seemed to me
to be a little hacky.  They were all on the TODO list, though.

 *SkyTools/Londiste* - Don't know anything special about it.

I've been quite impressed by the usability.  It's not quite as
flexible as Slony, but it has the same theory of operation.  The
documentation is not as voluminous, although it's also much handier as
reference material than Slony's (which is, in my experience, a little
hard to navigate if you don't already know the system pretty well).

A

-- 
Andrew Sullivan
a...@crankycanuck.ca



Re: [PERFORM] inheritance, and plans

2009-02-08 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

  Andrew Gierth and...@tao11.riddles.org.uk writes:
  Type-dependent selection of operators has already been done as
  part of parse analysis, no? And the domain -> base conversion is
  purely a relabelling, no? So what semantic change is possible as a
  result?

 Tom Domain conversions are *not* simply relabellings.  It's possible
 Tom now to have domain-specific functions/operators,

Right, but that's irrelevant to the planner in this case because the
resolution of precisely which operator is being called has _already
happened_ (in parse analysis), no?

 Tom It's possible that there are specific cases where the UNION
 Tom optimization checks could allow domains to be treated as their
 Tom base types,

The domain -> base conversion is an important one (to anyone who uses
domains) because it happens implicitly in a wide range of contexts,
and so it's unsatisfactory for it to have major performance impacts
such as interfering with important optimizations.

 Tom but blindly smashing both sides of the check to base is going to
 Tom break more cases than it fixes.

The posted code was only looking up the base type for one side, not
both (though I don't know that code well enough to know whether it was
the correct side); the case of interest is when the subquery has the
domain type but the outer query is seeing the base type, _not_ the
reverse.

-- 
Andrew.



Re: [PERFORM] inheritance, and plans

2009-02-07 Thread Andrew Gierth
 Tom == Tom Lane t...@sss.pgh.pa.us writes:

 [domain -> base type conversion interfering with optimization]

 Tom You seem to be laboring under the delusion that this is
 Tom considered a bug.

Of course it's a bug, or at least a missing feature - there is no
justification for putting performance deathtraps in the way of using
domains.

 Tom It's a necessary semantic restriction, because the pushed-down
 Tom expression could mean different things when applied to different
 Tom data types.

How?

Type-dependent selection of operators has already been done as part of
parse analysis, no? And the domain -> base conversion is purely a
relabelling, no? So what semantic change is possible as a result?
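
For readers following along, this is the kind of schema where the question
arises (a minimal hypothetical example; whether the UNION optimization fires
here depends on the version and on the checks discussed above):

  CREATE DOMAIN order_id AS integer;

  CREATE TABLE current_orders  (id order_id, placed date);
  CREATE TABLE archived_orders (id order_id, placed date);

  -- pushing "id = 42" down into each branch of the UNION ALL is the
  -- optimization at stake when the subquery exposes the domain type
  CREATE VIEW all_orders AS
    SELECT id, placed FROM current_orders
    UNION ALL
    SELECT id, placed FROM archived_orders;

  SELECT * FROM all_orders WHERE id = 42;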

-- 
Andrew (irc:RhodiumToad)




Re: [PERFORM] Deleting millions of rows

2009-02-03 Thread Andrew Lazarus
Hello All,


TL If you're deleting very many but not all rows, people tend
TL to drop the FK constraints and re-establish them afterwards.

I find

BEGIN;
CREATE TEMP TABLE remnant AS
  SELECT * FROM bigtable WHERE (very_restrictive_condition);
TRUNCATE TABLE bigtable;
INSERT INTO bigtable SELECT * FROM remnant;
COMMIT;
ANALYSE bigtable;

works well because there is no possibility of my forgetting FKs.


-- 
Sincerely,
 Andrew Lazarus  mailto:and...@pillette.com



Re: [PERFORM] Occasional Slow Commit

2008-10-28 Thread Andrew Sullivan
On Mon, Oct 27, 2008 at 05:23:37PM -0700, David Rees wrote:

 However, occasionally, processing time will jump up significantly -
 the average processing time is around 20ms with the maximum processing
 time taking 2-4 seconds for a small percentage of transactions. Ouch!
 
 Turning on statement logging and analyzing the logs of the application
 itself shows that step #4 is the culprit of the vast majority of the
 slow transactions.

My bet is that you're waiting on checkpoints.  Given that you're on
8.3, start fiddling with the checkpoint_completion_target parameter.
0.7 might help.
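
A hedged sketch of the 8.3-era knobs involved (values illustrative only):

  checkpoint_completion_target = 0.7  # spread checkpoint writes over more of the interval
  checkpoint_segments = 16            # checkpoint less often, at the cost of longer recovery
  log_checkpoints = on                # confirm whether the slow commits line up with checkpoints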

A

-- 
Andrew Sullivan
[EMAIL PROTECTED]
+1 503 667 4564 x104
http://www.commandprompt.com/



Re: [PERFORM] Slow updates, poor IO

2008-09-26 Thread Andrew Sullivan
Hi,

On Fri, Sep 26, 2008 at 07:24:55AM +1200, John Huttley wrote:
 I've just had an interesting encounter with the slow full table update 
 problem that is inherent with MVCC

Quite apart from the other excellent observations in this thread, what
makes you think this is an MVCC issue exactly?

A

-- 
Andrew Sullivan
[EMAIL PROTECTED]
+1 503 667 4564 x104
http://www.commandprompt.com/



Re: [PERFORM] Choosing a filesystem

2008-09-11 Thread Andrew Sullivan
On Thu, Sep 11, 2008 at 06:29:36PM +0200, Laszlo Nagy wrote:

 The expert told me to use RAID 5 but I'm hesitating. I think that RAID 1+0 
 would be much faster, and I/O performance is what I really need.

I think you're right.  I think it's a big mistake to use RAID 5 in a
database server where you're hoping for reasonable write performance.
In theory RAID 5 ought to be fast for reads, but I've never seen it
work that way.

 I would like to put the WAL file on the SAS disks to improve performance, 
 and create one big RAID 1+0 disk for the data directory. But maybe I'm 
 completely wrong. Can you please advise how to create logical partitions? 

I would listen to yourself before you listen to the expert.  You sound
right to me :)

A


-- 
Andrew Sullivan
[EMAIL PROTECTED]
+1 503 667 4564 x104
http://www.commandprompt.com/


