[PERFORM] Optimize SELECT * from table WHERE foreign_key_id IN (key1,key2,key3,key4...)

2013-03-05 Thread Niels Kristian Schjødt
Hi,

I'm running a Rails app where I have a model called Car that has_many Images.
When I tell Rails to include those images when querying, say, 50 cars, it
often decides to use a SELECT * from images WHERE car_id IN
(id1,id2,id3,id4…) instead of doing a join.

Now either way it uses the index I have on car_id:

Index Scan using car_id_ix on adverts  (cost=0.47..5665.34 rows=1224 width=234)
Index Cond: (car_id = ANY 
('{7097561,7253541,5159633,6674471,...}'::integer[]))

But it's slow, it's very slow. In this case it took 3,323ms

Can I do anything to optimize that query or maybe the index or something?

The table has 16.000.000 rows



Re: [PERFORM] Optimize SELECT * from table WHERE foreign_key_id IN (key1,key2,key3,key4...)

2013-03-05 Thread Julien Cigar

On 03/05/2013 15:00, Niels Kristian Schjødt wrote:

Hi,

I'm running a Rails app where I have a model called Car that has_many Images.
When I tell Rails to include those images when querying, say, 50 cars, it
often decides to use a SELECT * from images WHERE car_id IN
(id1,id2,id3,id4…) instead of doing a join.


Why do you want a join here? If you don't need any of the cars data there is
no need to JOIN that table.
Now a select ... from ... where id in (id1, id2, ..., idn) isn't very 
scalable.


Instead of passing id1, id2, ..., idn you'd be better off passing the
condition and doing a where id in (select ...), or where exists (select 1 ...
where ...), or a join, or ...
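
For example (a sketch using the tables from this thread; the make = 'Audi'
filter is a hypothetical stand-in for whatever condition selected the 50
cars):

SELECT *
FROM images
WHERE car_id IN (SELECT id FROM cars WHERE make = 'Audi');

-- or with EXISTS:
SELECT i.*
FROM images i
WHERE EXISTS (SELECT 1 FROM cars c
              WHERE c.id = i.car_id AND c.make = 'Audi');

-- or as a join:
SELECT i.*
FROM images i
JOIN cars c ON c.id = i.car_id
WHERE c.make = 'Audi';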



Now either way it uses the index I have on car_id:

Index Scan using car_id_ix on adverts  (cost=0.47..5665.34 rows=1224 width=234)
Index Cond: (car_id = ANY 
('{7097561,7253541,5159633,6674471,...}'::integer[]))

But it's slow, it's very slow. In this case it took 3,323ms


3ms isn't slow


Can I do anything to optimize that query or maybe the index or something?


your index is already used


The table has 16.000.000 rows




--
No trees were killed in the creation of this message.
However, many electrons were terribly inconvenienced.





Re: [PERFORM] New server setup

2013-03-05 Thread Kevin Grittner
Niels Kristian Schjødt nielskrist...@autouncle.com wrote:

 So my question is, should I also get something like pgpool2 setup
 at the same time? Is it, from your experience, likely to increase
 my throughput a lot more, if I had a connection pool of eg. 20
 connections, instead of 300 concurrent ones directly?

In my experience, it can make a big difference.  If you are just
using the pooler for this reason, and don't need any of the other
features of pgpool, I suggest pgbouncer.  It is a simpler, more
lightweight tool.

-- 
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company




Re: [PERFORM] New server setup

2013-03-05 Thread Scott Marlowe
On Tue, Mar 5, 2013 at 9:34 AM, Kevin Grittner kgri...@ymail.com wrote:
 Niels Kristian Schjødt nielskrist...@autouncle.com wrote:

 So my question is, should I also get something like pgpool2 setup
 at the same time? Is it, from your experience, likely to increase
 my throughput a lot more, if I had a connection pool of eg. 20
 connections, instead of 300 concurrent ones directly?

 In my experience, it can make a big difference.  If you are just
 using the pooler for this reason, and don't need any of the other
 features of pgpool, I suggest pgbouncer.  It is a simpler, more
 lightweight tool.

I second the pgbouncer rec.




Re: [PERFORM] New server setup

2013-03-05 Thread Niels Kristian Schjødt
Thanks, that was actually what I just ended up doing yesterday. Any
suggestions on how to tune pgbouncer?

BTW, I have just bumped into an issue that caused me to disable pgbouncer
again. My web application queries the database with a per-request
SEARCH_PATH, because I use schemas to provide country-based separation of my
data (e.g. English, German and Danish data in different schemas). I have
pgbouncer set up for transactional behavior (pool_mode = transaction) -
however, some of my colleagues complained that it sometimes didn't return
data from the schema set in the SEARCH_PATH - would you by any chance have an
idea what is going wrong?

 pgbouncer.ini
[databases]
production =

[pgbouncer]

logfile = /var/log/pgbouncer/pgbouncer.log
pidfile = /var/run/pgbouncer/pgbouncer.pid
listen_addr = localhost
listen_port = 6432
unix_socket_dir = /var/run/postgresql
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
admin_users = postgres
pool_mode = transaction
server_reset_query = DISCARD ALL
max_client_conn = 500
default_pool_size = 20
reserve_pool_size = 5
reserve_pool_timeout = 10
#


On 05/03/2013 at 17.34, Kevin Grittner kgri...@ymail.com wrote:

 Niels Kristian Schjødt nielskrist...@autouncle.com wrote:
 
 So my question is, should I also get something like pgpool2 setup
 at the same time? Is it, from your experience, likely to increase
 my throughput a lot more, if I had a connection pool of eg. 20
 connections, instead of 300 concurrent ones directly?
 
 In my experience, it can make a big difference.  If you are just
 using the pooler for this reason, and don't need any of the other
 features of pgpool, I suggest pgbouncer.  It is a simpler, more
 lightweight tool.
 
 -- 
 Kevin Grittner
 EnterpriseDB: http://www.enterprisedb.com
 The Enterprise PostgreSQL Company





Re: [PERFORM] Are bitmap index scans slow to start?

2013-03-05 Thread Jeff Janes
On Thu, Feb 28, 2013 at 12:13 PM, Carlo Stonebanks 
stonec.regis...@sympatico.ca wrote:

 Could you use CLUSTER on the table after it had been closed off?  If
 appropriate, that should make the queries run much faster, as related
 entries will be in the same or nearby blocks on disk.


 Technically, yes. That would really help, but the issue is scheduling.
 Although the logs are closed off for writes, they aren’t closed off for
 reads - per the PG documentation: “When a table is being clustered, an ACCESS
 EXCLUSIVE lock is acquired on it. This prevents any other database
 operations (both reads and writes) from operating on the table until the
 CLUSTER is finished.”


 Not ideal, but a lot better than doing nothing at all!


Since it is read only, you could make a copy of the table, cluster the copy
(or just do the sorting while you make the copy), and then atomically swap
the two tables by renaming them inside a single transaction.

The swap process will take an exclusive lock, but it will only last for a
fraction of a second rather than the duration of the clustering operation.
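
A minimal sketch of that swap, assuming a log table ordered by a created_at
column (names hypothetical; any indexes, constraints and grants would have to
be recreated on the copy before the swap):

BEGIN;
CREATE TABLE log_sorted AS
    SELECT * FROM log ORDER BY created_at;   -- sort while copying
CREATE INDEX log_sorted_created_at_ix ON log_sorted (created_at);
ALTER TABLE log RENAME TO log_old;
ALTER TABLE log_sorted RENAME TO log;
COMMIT;   -- the ACCESS EXCLUSIVE locks from the renames last only this long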

Cheers,

Jeff


Re: [PERFORM] New server setup

2013-03-05 Thread Benjamin Krajmalnik
Set it to use session pooling.  I had a similar issue after moving one of the
components of our app to transaction pooling, which introduced an undesired
behavior.
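
The underlying problem with pool_mode = transaction is that consecutive
transactions from the same client can be handed to different server
connections, so a session-level SET search_path issued once per request does
not follow the client. If session pooling is not an option, one workaround (a
sketch - the schema name is hypothetical) is to set the path inside each
transaction, where it cannot leak to another connection:

BEGIN;
SET LOCAL search_path TO danish, public;  -- scoped to this transaction only
SELECT * FROM cars LIMIT 50;
COMMIT;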


-Original Message-
From: pgsql-performance-ow...@postgresql.org 
[mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Niels Kristian 
Schjødt
Sent: Tuesday, March 05, 2013 10:12 AM
To: Kevin Grittner
Cc: Craig James; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] New server setup

Thanks, that was actually what I just ended up doing yesterday. Any
suggestions on how to tune pgbouncer?

BTW, I have just bumped into an issue that caused me to disable pgbouncer
again. My web application queries the database with a per-request
SEARCH_PATH, because I use schemas to provide country-based separation of my
data (e.g. English, German and Danish data in different schemas). I have
pgbouncer set up for transactional behavior (pool_mode = transaction) -
however, some of my colleagues complained that it sometimes didn't return
data from the schema set in the SEARCH_PATH - would you by any chance have an
idea what is going wrong?

 pgbouncer.ini
[databases]
production =

[pgbouncer]

logfile = /var/log/pgbouncer/pgbouncer.log
pidfile = /var/run/pgbouncer/pgbouncer.pid
listen_addr = localhost
listen_port = 6432
unix_socket_dir = /var/run/postgresql
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
admin_users = postgres
pool_mode = transaction
server_reset_query = DISCARD ALL
max_client_conn = 500
default_pool_size = 20
reserve_pool_size = 5
reserve_pool_timeout = 10
#


On 05/03/2013 at 17.34, Kevin Grittner kgri...@ymail.com wrote:

 Niels Kristian Schjødt nielskrist...@autouncle.com wrote:
 
 So my question is, should I also get something like pgpool2 setup at 
 the same time? Is it, from your experience, likely to increase my 
 throughput a lot more, if I had a connection pool of eg. 20 
 connections, instead of 300 concurrent ones directly?
 
 In my experience, it can make a big difference.  If you are just using 
 the pooler for this reason, and don't need any of the other features 
 of pgpool, I suggest pgbouncer.  It is a simpler, more lightweight 
 tool.
 
 --
 Kevin Grittner
 EnterpriseDB: http://www.enterprisedb.com The Enterprise PostgreSQL 
 Company







Re: [PERFORM] New server setup

2013-03-05 Thread Niels Kristian Schjødt
Okay, thanks - but hey - if I set it to session pooling, then the
documentation says: default_pool_size: In session pooling it needs to be the
number of max clients you want to handle at any moment. So as I understand
it, I would then have to set default_pool_size to 300 if I have up to 300
client connections? And then how would the pooler help my performance -
wouldn't that be exactly like having the 300 clients connect directly to the
database?

-NK


On 05/03/2013 at 19.03, Benjamin Krajmalnik k...@servoyant.com wrote:

  



[PERFORM] sniff test on some PG 8.4 numbers

2013-03-05 Thread Jon Nelson
I was hoping to just get a gut reaction on some pgbench numbers I have, to
see if I'm in the ballpark.

OS:  ScientificLinux 6.3, x86_64
Hardware: 4x real disks (not SSD) behind an LSI 9260 in raid10, Xeon E5-2680
with hyperthreading OFF, 128GB of RAM.
Setup: postgresql 8.4.13, ext4, barriers ON, disk write cache *off*, write-
back enabled on the LSI.
I initialized pgbench with scale factors of 100, 200, and 400.

I've done some tuning of the postgresql config, but mostly I'm just trying to
find out if I'm in the right ballpark.

I ran pgbench from another (similar) host:

pgbench -h BLAH -c 32 -M prepared -t 10 -S
I get 95,000 to 100,000 tps.

pgbench -h BLAH -c 32 -M prepared -t 10
seems to hover around 6,200 tps (size 100) to 13,700 (size 400)

Do these basically sniff right?
(NOTE: with barriers off, I get a slight increase - 10% - in the
read-write test, and a larger *decrease* - 15% - with the read-only
test @ 400. No change @ 100)

-- 
Jon




Re: [PERFORM] Are bitmap index scans slow to start?

2013-03-05 Thread Jeff Janes
On Wed, Feb 27, 2013 at 1:38 PM, Carlo Stonebanks 
stonec.regis...@sympatico.ca wrote:



  Is the original query you posted part of the transform process, rather
 than being the production query you run after the ETL is over?

 

 Neither, it is part of our auditing and maintenance processes. It is not
 called with any great frequency. The audit report generates rows defining
 how a particular item (an “item” being a particular table/row) was
 created: it returns the names of the import tables, the row ids, the write
 operations and any transformation messages that may have been generated –
 all in the order they occurred.



...



 So, this query is not called often, but the fact is that if it takes over
 30 seconds to load an item (because the audit report takes so long to
 prepare the bitmap index scan when passed new query parameters) then it
 severely restricts how much data we can resurrect at any one time.


Is that a restriction you have observed, or are you extrapolating based on
a single query?  If you run a bunch of similar queries in close succession,
it is likely that the first few queries will warm up the cache, and
following queries will then run much faster.  Also, if you restructure the
series of queries into a large one that reconstructs many rows
simultaneously, it might choose a more efficient path than if it is fed the
queries one at a time.
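
For instance (a sketch - the audit table and columns are hypothetical
stand-ins, since the real schema wasn't posted):

-- instead of one round trip per resurrected item:
SELECT * FROM audit_log WHERE item_id = 12345 ORDER BY occurred_at;
SELECT * FROM audit_log WHERE item_id = 12346 ORDER BY occurred_at;

-- reconstruct a batch of items in one pass:
SELECT *
FROM audit_log
WHERE item_id = ANY (ARRAY[12345, 12346, 12347])
ORDER BY item_id, occurred_at;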

Cheers,

Jeff


Re: [PERFORM] New server setup

2013-03-05 Thread Jeff Janes
On Tue, Mar 5, 2013 at 10:27 AM, Niels Kristian Schjødt 
nielskrist...@autouncle.com wrote:

 Okay, thanks - but hey - if I set it to session pooling, then the
 documentation says: default_pool_size: In session pooling it needs to be
 the number of max clients you want to handle at any moment. So as I
 understand it, I would then have to set default_pool_size to 300 if I have
 up to 300 client connections?


If those 300 client connections are all long-lived, then yes, you need that
many in the pool.  If they are short-lived connections, then you can have a
lot fewer, as any over the default_pool_size will simply block until an
existing connection is closed and can be re-assigned--which won't take long
if they are short-lived connections.


And then how would the pooler help my performance - wouldn't that be
 exactly like having the 300 clients connect directly to the
 database?


It would probably be even worse than having 300 clients connected
directly.  There would be no point in using a pooler under those conditions.


Cheers,

Jeff


Re: [PERFORM] Optimize SELECT * from table WHERE foreign_key_id IN (key1,key2,key3,key4...)

2013-03-05 Thread Niels Kristian Schjødt
Hi, thanks for answering. See comments inline.

On 05/03/2013 at 15.26, Julien Cigar jci...@ulb.ac.be wrote:

 On 03/05/2013 15:00, Niels Kristian Schjødt wrote:
 Hi,
 
 I'm running a Rails app where I have a model called Car that has_many
 Images. When I tell Rails to include those images when querying, say, 50
 cars, it often decides to use a SELECT * from images WHERE car_id IN
 (id1,id2,id3,id4…) instead of doing a join.
 
 Why do you want a join here? If you don't need any of the cars data there
 is no need to JOIN that table.
I need both
 Now a select ... from ... where id in (id1, id2, ..., idn) isn't very 
 scalable.
 
 Instead of passing id1, id2, ..., idn you'd be better off passing the
 condition and doing a where id in (select ...), or where exists (select 1
 ... where ...), or a join, or …
 
I tried this now, and it doesn't seem to make a very big difference, unfortunately…

 Now either way it uses the index I
 have on car_id:
 
 Index Scan using car_id_ix on adverts  (cost=0.47..5665.34 rows=1224 
 width=234)
  Index Cond: (car_id = ANY 
 ('{7097561,7253541,5159633,6674471,...}'::integer[]))
 
 But it's slow, it's very slow. In this case it took 3,323ms
 
 3ms isn't slow
 
Sorry, it's 3323ms!

 Can I do anything to optimize that query or maybe the index or something?
 
 your index is already used

Okay, so this leaves me with getting better hardware, or what?

 
 The table has 16.000.000 rows
 
 
 
 -- 
 No trees were killed in the creation of this message.
 However, many electrons were terribly inconvenienced.
 
 
 





Re: [PERFORM] Optimize SELECT * from table WHERE foreign_key_id IN (key1,key2,key3,key4...)

2013-03-05 Thread Joshua D. Drake


On 03/05/2013 03:51 PM, Niels Kristian Schjødt wrote:


3ms isn't slow


Sorry, it's 3323ms!


Can I do anything to optimize that query or maybe the index or something?


your index is already used


Okay this leaves me with - get better hardware or?


What does EXPLAIN ANALYZE say, versus just EXPLAIN?
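
For example (a sketch against the query from this thread; ids abbreviated):

EXPLAIN ANALYZE
SELECT * FROM images
WHERE car_id IN (7097561, 7253541, 5159633, 6674471);

Plain EXPLAIN only shows the planner's estimated plan and costs; EXPLAIN
ANALYZE actually runs the query and reports the real row counts and timings
for each plan node, which shows where the estimates and reality diverge.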

JD
--
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC
@cmdpromptinc - 509-416-6579




Re: [PERFORM] sniff test on some PG 8.4 numbers

2013-03-05 Thread Josh Berkus

 Do these basically sniff right?

Well, the read test seems reasonable.  I'm impressed by the speed of the
write test ... how large is the raid card cache?

And why 8.4?  Can you try 9.2?

 (NOTE: with barriers off, I get a slight increase - 10% - in the
 read-write test, and a larger *decrease* - 15% - with the read-only
 test @ 400. No change @ 100)

Oh, interesting.  Can you reproduce that? I wonder what would cause
read-only to drop without barriers ...

-- 
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com




Re: [PERFORM] sniff test on some PG 8.4 numbers

2013-03-05 Thread Jon Nelson
On Tue, Mar 5, 2013 at 7:02 PM, Josh Berkus j...@agliodbs.com wrote:

 Do these basically sniff right?

 Well, the read test seems reasonable.  I'm impressed by the speed of the
 write test ... how large is the raid card cache?

 And why 8.4?  Can you try 9.2?

8.4 because it's what I've got, basically. I might be able to try 9.2
later, but I'm targeting 8.4 right now.
512MB of memory on the card.

 (NOTE: with barriers off, I get a slight increase - 10% - in the
 read-write test, and a larger *decrease* - 15% - with the read-only
 test @ 400. No change @ 100)

 Oh, interesting.  Can you reproduce that? I wonder what would cause
 read-only to drop without barriers ...

I'll try to test again soon.
I know that if I use writethrough instead of writeback mode the
performance nosedives.
Does anybody have suggestions for stripe size? (remember: *4* disks)

-- 
Jon




Re: [PERFORM] sniff test on some PG 8.4 numbers

2013-03-05 Thread Jon Nelson
On Tue, Mar 5, 2013 at 1:35 PM, Jon Nelson jnelson+pg...@jamponi.net wrote:

 pgbench -h BLAH -c 32 -M prepared -t 10 -S
 I get 95,000 to 100,000 tps.

 pgbench -h BLAH -c 32 -M prepared -t 10
 seems to hover around 6,200 tps (size 100) to 13,700 (size 400)

Some followup:
The read test goes (up to) 133K tps, and the read-write test to 22k
tps when performed over localhost.

-- 
Jon

