[PERFORM] Optimize SELECT * from table WHERE foreign_key_id IN (key1,key2,key3,key4...)
Hi, I'm running a Rails app where I have a model called Car that has_many Images. Now when I tell Rails to include those images when querying, say, 50 cars, it often decides to use a

    SELECT * FROM images WHERE car_id IN (id1, id2, id3, id4, ...)

instead of doing a join.

Either way it uses the index I have on car_id:

    Index Scan using car_id_ix on adverts (cost=0.47..5665.34 rows=1224 width=234)
      Index Cond: (car_id = ANY ('{7097561,7253541,5159633,6674471,...}'::integer[]))

But it's slow, it's very slow. In this case it took 3,323 ms.

Can I do anything to optimize that query, or maybe the index, or something? The table has 16,000,000 rows.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Optimize SELECT * from table WHERE foreign_key_id IN (key1,key2,key3,key4...)
On 03/05/2013 15:00, Niels Kristian Schjødt wrote:
> Hi, I'm running a rails app, where I have a model called Car that
> has_many Images. Now when I tell rails to include those images, when
> querying say 50 cars, then it often decides to use a SELECT * from
> images WHERE car_id IN (id1,id2,id3,id4...) instead of doing a join.

Why do you want a join here? If you don't need any of the cars data, there is no need to JOIN that table.

Also, a SELECT ... FROM ... WHERE id IN (id1, id2, ..., idn) isn't very scalable. Instead of passing id1, id2, ..., idn, you would be better off passing the condition and doing a WHERE id IN (SELECT ...), or WHERE EXISTS (SELECT 1 ... WHERE ...), or a join, or ...

> Now either way it uses the index I have on car_id:
> Index Scan using car_id_ix on adverts (cost=0.47..5665.34 rows=1224 width=234)
>   Index Cond: (car_id = ANY ('{7097561,7253541,5159633,6674471,...}'::integer[]))
> But it's slow, it's very slow. In this case it took 3,323ms

3 ms isn't slow.

> Can I do anything to optimize that query or maybe the index or something?

Your index is already used.

> The table has 16.000.000 rows

--
No trees were killed in the creation of this message.
However, many electrons were terribly inconvenienced.
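The alternatives suggested above can be sketched in SQL, assuming the cars/images schema from the original post (column names are guesses, and `<condition>` stands for whatever filter produced the 50 car ids in the first place):

```sql
-- Original pattern: the id list is materialized client-side first.
SELECT * FROM images WHERE car_id IN (7097561, 7253541, 5159633, ...);

-- Push the condition into a subquery instead:
SELECT * FROM images
WHERE car_id IN (SELECT id FROM cars WHERE <condition>);

-- Or use EXISTS:
SELECT * FROM images i
WHERE EXISTS (SELECT 1 FROM cars c
              WHERE c.id = i.car_id AND <condition>);

-- Or a plain join:
SELECT i.* FROM images i
JOIN cars c ON c.id = i.car_id
WHERE <condition>;
```

All three variants let the planner see the real predicate instead of an opaque list of literals, so it can choose between an index scan, a bitmap scan, or a hash join based on estimated row counts.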
Re: [PERFORM] New server setup
Niels Kristian Schjødt nielskrist...@autouncle.com wrote:
> So my question is, should I also get something like pgpool2 setup at
> the same time? Is it, from your experience, likely to increase my
> throughput a lot more, if I had a connection pool of e.g. 20
> connections, instead of 300 concurrent ones directly?

In my experience, it can make a big difference. If you are just using the pooler for this reason, and don't need any of the other features of pgpool, I suggest pgbouncer. It is a simpler, more lightweight tool.

--
Kevin Grittner
EnterpriseDB: http://www.enterprisedb.com
The Enterprise PostgreSQL Company
Re: [PERFORM] New server setup
On Tue, Mar 5, 2013 at 9:34 AM, Kevin Grittner kgri...@ymail.com wrote:
>> So my question is, should I also get something like pgpool2 setup at
>> the same time? Is it, from your experience, likely to increase my
>> throughput a lot more, if I had a connection pool of e.g. 20
>> connections, instead of 300 concurrent ones directly?
>
> In my experience, it can make a big difference. If you are just using
> the pooler for this reason, and don't need any of the other features
> of pgpool, I suggest pgbouncer. It is a simpler, more lightweight tool.

I second the pgbouncer rec.
Re: [PERFORM] New server setup
Thanks, that was actually what I just ended up doing yesterday. Any suggestions on how to tune pgbouncer?

BTW, I have just bumped into an issue that caused me to disable pgbouncer again. My web application queries the database with a per-request SEARCH_PATH. This is because I use schemas to provide country-based separation of my data (e.g. English, German, Danish data in different schemas). I have pgbouncer set up with transactional behavior (pool_mode = transaction) - however, some of my colleagues complained that it sometimes didn't return data from the right schema set in the SEARCH_PATH. You wouldn't by any chance have an idea of what is going wrong, would you?

pgbouncer.ini:

    [databases]
    production =

    [pgbouncer]
    logfile = /var/log/pgbouncer/pgbouncer.log
    pidfile = /var/run/pgbouncer/pgbouncer.pid
    listen_addr = localhost
    listen_port = 6432
    unix_socket_dir = /var/run/postgresql
    auth_type = md5
    auth_file = /etc/pgbouncer/userlist.txt
    admin_users = postgres
    pool_mode = transaction
    server_reset_query = DISCARD ALL
    max_client_conn = 500
    default_pool_size = 20
    reserve_pool_size = 5
    reserve_pool_timeout = 10

On 05/03/2013 at 17.34, Kevin Grittner kgri...@ymail.com wrote:
> In my experience, it can make a big difference. If you are just using
> the pooler for this reason, and don't need any of the other features
> of pgpool, I suggest pgbouncer. It is a simpler, more lightweight tool.
Re: [PERFORM] Are bitmap index scans slow to start?
On Thu, Feb 28, 2013 at 12:13 PM, Carlo Stonebanks stonec.regis...@sympatico.ca wrote:
>> Could you use CLUSTER on the table after it had been closed off? If
>> appropriate, that should make the queries run much faster, as related
>> entries will be in the same or nearby blocks on disk.
>
> Technically, yes. That would really help, but the issue is scheduling.
> Although the logs are closed off for writes, they aren't closed off for
> reads, ref PG documentation: "When a table is being clustered, an
> ACCESS EXCLUSIVE lock is acquired on it. This prevents any other
> database operations (both reads and writes) from operating on the table
> until the CLUSTER is finished."
>
> Not ideal, but a lot better than doing nothing at all!

Since it is read only, you could make a copy of the table, cluster the copy (or just do the sorting while you make the copy), and then atomically swap the two tables by renaming them inside a single transaction. The swap process will take an exclusive lock, but it will only last for a fraction of a second rather than the duration of the clustering operation.

Cheers,

Jeff
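The copy-and-swap approach described above might look like the following; the table name `log` and sort key `item_id` are illustrative, not from the thread:

```sql
-- Build a sorted copy of the (now read-only) table; this does not
-- block readers of the original.
CREATE TABLE log_new AS SELECT * FROM log ORDER BY item_id;
-- (Recreate any indexes on log_new here, while readers still use log.)

-- Swap the tables atomically; the ACCESS EXCLUSIVE locks are held
-- only for the duration of the two renames, not for the whole sort.
BEGIN;
ALTER TABLE log RENAME TO log_old;
ALTER TABLE log_new RENAME TO log;
COMMIT;

-- Once nothing references it, the old copy can be dropped:
DROP TABLE log_old;
```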
Re: [PERFORM] New server setup
Set it to use session. I had a similar issue having moved one of the components of our app to use transactions, which introduced an undesired behavior.

-----Original Message-----
From: pgsql-performance-ow...@postgresql.org [mailto:pgsql-performance-ow...@postgresql.org] On Behalf Of Niels Kristian Schjødt
Sent: Tuesday, March 05, 2013 10:12 AM
To: Kevin Grittner
Cc: Craig James; pgsql-performance@postgresql.org
Subject: Re: [PERFORM] New server setup

> Thanks, that was actually what I just ended up doing yesterday. Any
> suggestion how to tune pgbouncer?
> [rest of quoted message trimmed; quoted in full in the post above]
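The fix suggested here - switching pgbouncer to session pooling so that a per-request SET search_path stays with the same server connection for the whole client session - would be a one-line change to the pgbouncer.ini posted earlier in the thread:

```ini
[pgbouncer]
; In session pooling a server connection is assigned to one client
; until that client disconnects, so session-level state such as
; search_path set by the application is preserved. In transaction
; pooling, by contrast, each transaction may land on a different
; server connection, which is why the wrong schema was seen.
pool_mode = session
```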
Re: [PERFORM] New server setup
Okay, thanks - but hey, if I put it at session pooling, then the documentation says:

    default_pool_size: In session pooling it needs to be the number of
    max clients you want to handle at any moment.

So as I understand it, I then have to set default_pool_size to 300 if I have up to 300 client connections? And then how would the pooler help my performance - wouldn't that be exactly like having the 300 clients connect directly to the database?

-NK

On 05/03/2013 at 19.03, Benjamin Krajmalnik k...@servoyant.com wrote:
[PERFORM] sniff test on some PG 8.4 numbers
I was hoping to just get a gut reaction on some pgbench numbers I have, to see if I'm in the ballpark.

OS: Scientific Linux 6.3, x86_64
Hardware: 4x real disks (not SSD) behind an LSI 9260 in RAID 10, Xeon E5-2680 with hyperthreading OFF, 128 GB of RAM.
Setup: PostgreSQL 8.4.13, ext4, barriers ON, disk write cache *off*, write-back enabled on the LSI.

I initialized with sizes of 100, 200, and 400. I've done some tuning of the postgresql config, but mostly I'm just trying to find out if I'm in the right ballpark. I ran pgbench from another (similar) host:

    pgbench -h BLAH -c 32 -M prepared -t 10 -S

I get 95,000 to 100,000 tps.

    pgbench -h BLAH -c 32 -M prepared -t 10

seems to hover around 6,200 tps (size 100) to 13,700 tps (size 400).

Do these basically sniff right?

(NOTE: with barriers off, I get a slight increase - 10% - in the read-write test, and a larger *decrease* - 15% - with the read-only test @ 400. No change @ 100.)

--
Jon
Re: [PERFORM] Are bitmap index scans slow to start?
On Wed, Feb 27, 2013 at 1:38 PM, Carlo Stonebanks stonec.regis...@sympatico.ca wrote:
>> Is the original query you posted part of the transform process, rather
>> than being the production query you run after the ETL is over?
>
> Neither, it is part of our auditing and maintenance processes. It is
> not called with any great frequency. The audit report generates rows
> defining how a particular item (an "item" being a particular table/row)
> was created: it returns the names of the import tables, the row ids,
> the write operations and any transformation messages that may have been
> generated - all in the order they occurred.
>
> ...
>
> So, this query is not called often, but the fact is that if it takes
> over 30 seconds to load an item (because the audit report takes so long
> to prepare the bitmap index scan when passed new query parameters) then
> it severely restricts how much data we can resurrect at any one time.

Is that a restriction you have observed, or are you extrapolating based on a single query? If you run a bunch of similar queries in close succession, it is likely that the first few queries will warm up the cache, and following queries will then run much faster.

Also, if you restructure the series of queries into one large query that reconstructs many rows simultaneously, it might choose a more efficient path than if it is fed the queries one at a time.

Cheers,

Jeff
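Jeff's suggestion to reconstruct many items in one query rather than one at a time could be sketched like this (table and column names are illustrative only, not from the thread):

```sql
-- One round trip, one plan, and one cold-cache penalty per item:
SELECT * FROM audit_log
WHERE item_id = 42
ORDER BY occurred_at;

-- Batched: a single query for many items, letting the planner choose
-- one (possibly more efficient) access path and touching each index
-- and heap block only once:
SELECT * FROM audit_log
WHERE item_id = ANY (ARRAY[42, 43, 44])
ORDER BY item_id, occurred_at;
```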
Re: [PERFORM] New server setup
On Tue, Mar 5, 2013 at 10:27 AM, Niels Kristian Schjødt nielskrist...@autouncle.com wrote:
> Okay, thanks - but hey - if I put it at session pooling, then it says
> in the documentation: default_pool_size: In session pooling it needs to
> be the number of max clients you want to handle at any moment. So as I
> understand it, is it true that I then have to set default_pool_size to
> 300 if I have up to 300 client connections?

If those 300 client connections are all long-lived, then yes, you need that many in the pool. If they are short-lived connections, then you can have a lot fewer, as any ones over the default_pool_size will simply block until an existing connection is closed and can be re-assigned - which won't take long if they are short-lived connections.

> And then what would the pooler then help on my performance - would that
> just be exactly like having the 300 clients connect directly to the
> database?

It would probably be even worse than having 300 clients connected directly. There would be no point in using a pooler under those conditions.

Cheers,

Jeff
Re: [PERFORM] Optimize SELECT * from table WHERE foreign_key_id IN (key1,key2,key3,key4...)
Hi, thanks for answering. See comments inline.

On 05/03/2013 at 15.26, Julien Cigar jci...@ulb.ac.be wrote:
>> Hi, I'm running a rails app, where I have a model called Car that
>> has_many Images. Now when I tell rails to include those images, when
>> querying say 50 cars, then it often decides to use a SELECT * from
>> images WHERE car_id IN (id1,id2,id3,id4...) instead of doing a join.
>
> why do you want a join here ? if you don't need any cars data there is
> no need to JOIN that table.

I need both.

> Now a select ... from ... where id in (id1, id2, ..., idn) isn't very
> scalable. Instead of passing id1, id2, ..., idn you'd be better off
> passing the condition and doing a where id in (select ... ), or where
> exists (select 1 ... where ...), or a join, or ...

I tried this now, and it doesn't seem to make a very big difference, unfortunately...

>> But it's slow, it's very slow. In this case it took 3,323ms
>
> 3ms isn't slow

Sorry, it's 3323 ms!

>> Can I do anything to optimize that query or maybe the index or
>> something?
>
> your index is already used

Okay, this leaves me with - get better hardware, or?

>> The table has 16.000.000 rows
Re: [PERFORM] Optimize SELECT * from table WHERE foreign_key_id IN (key1,key2,key3,key4...)
On 03/05/2013 03:51 PM, Niels Kristian Schjødt wrote:
>> 3ms isn't slow
>
> Sorry, it's 3323ms!
>
>> your index is already used
>
> Okay this leaves me with - get better hardware or?

What does EXPLAIN ANALYZE say versus just EXPLAIN?

JD

--
Command Prompt, Inc. - http://www.commandprompt.com/
PostgreSQL Support, Training, Professional Services and Development
High Availability, Oracle Conversion, Postgres-XC
@cmdpromptinc - 509-416-6579
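For what it's worth, the distinction being asked for here: plain EXPLAIN only shows the planner's cost and row-count estimates, while EXPLAIN ANALYZE actually executes the query and reports measured times and row counts per plan node - which is what is needed to see where the 3323 ms is really going. Using the query from the original post:

```sql
-- Estimates only (does not run the query):
EXPLAIN
SELECT * FROM images
WHERE car_id = ANY ('{7097561,7253541}'::integer[]);

-- Runs the query; each node gains "actual time=... rows=... loops=..."
-- plus a total runtime at the bottom:
EXPLAIN ANALYZE
SELECT * FROM images
WHERE car_id = ANY ('{7097561,7253541}'::integer[]);
```

A large gap between estimated and actual row counts, or most of the actual time concentrated in the index scan node, would point at stale statistics or cold cache/random I/O respectively.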
Re: [PERFORM] sniff test on some PG 8.4 numbers
> Do these basically sniff right?

Well, the read test seems reasonable. I'm impressed by the speed of the write test ... how large is the raid card cache?

And why 8.4? Can you try 9.2?

> (NOTE: with barriers off, I get a slight increase - 10% - in the
> read-write test, and a larger *decrease* - 15% - with the read-only
> test @ 400. No change @ 100)

Oh, interesting. Can you reproduce that? I wonder what would cause read-only to drop without barriers ...

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [PERFORM] sniff test on some PG 8.4 numbers
On Tue, Mar 5, 2013 at 7:02 PM, Josh Berkus j...@agliodbs.com wrote:
>> Do these basically sniff right?
>
> Well, the read test seems reasonable. I'm impressed by the speed of
> the write test ... how large is the raid card cache? And why 8.4? Can
> you try 9.2?

8.4 because it's what I've got, basically. I might be able to try 9.2 later, but I'm targeting 8.4 right now. 512 MB of memory on the card.

>> (NOTE: with barriers off, I get a slight increase - 10% - in the
>> read-write test, and a larger *decrease* - 15% - with the read-only
>> test @ 400. No change @ 100)
>
> Oh, interesting. Can you reproduce that? I wonder what would cause
> read-only to drop without barriers ...

I'll try to test again soon. I know that if I use writethrough instead of writeback mode the performance nosedives.

Does anybody have suggestions for stripe size? (remember: *4* disks)

--
Jon
Re: [PERFORM] sniff test on some PG 8.4 numbers
On Tue, Mar 5, 2013 at 1:35 PM, Jon Nelson jnelson+pg...@jamponi.net wrote:
> pgbench -h BLAH -c 32 -M prepared -t 10 -S
>
> I get 95,000 to 100,000 tps.
>
> pgbench -h BLAH -c 32 -M prepared -t 10
>
> seems to hover around 6,200 tps (size 100) to 13,700 (size 400)

Some followup: the read test goes (up to) 133K tps, and the read-write test to 22K tps when performed over localhost.

--
Jon