Re: [PERFORM] Query improvement
Thanks for the replies. Finally I have used UNION and JOINs, which helped. Mainly the UNION helped a lot. Now the query takes 1 sec max. Thanks a lot.

--
View this message in context: http://postgresql.1045698.n5.nabble.com/Query-improvement-tp4362578p4378163.html
Sent from the PostgreSQL - performance mailing list archive at Nabble.com.

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] indexes ignored when querying the master table
i have around 25 million records of data distributed yearly over 9 child tables (data.logs_20xx) that inherit from the master table data.logs. the tables are partitioned using the field re_timestamp, which has btree indexes defined on all tables.

the query SELECT * FROM data.logs ORDER BY re_timestamp DESC LIMIT 100 uses seq scans on all tables instead of the existing indexes, which takes ages. when issuing the same query against one of the child tables directly (SELECT * FROM data.logs_2011 ORDER BY re_timestamp DESC LIMIT 100) the index is used as expected and the data is returned quickly.

how can i get postgres to use the indexes when querying the master table?

please find below the EXPLAIN ANALYZE output for both queries on my development machine (pgsql 9.0 x64 on windows 7).

thanks in advance, thomas

EXPLAIN ANALYZE SELECT * FROM data.logs ORDER BY re_timestamp DESC LIMIT 100;

Limit  (cost=6331255.90..6331256.15 rows=100 width=1388) (actual time=1592287.794..1592287.808 rows=100 loops=1)
  ->  Sort  (cost=6331255.90..6395928.37 rows=25868986 width=1388) (actual time=1592287.789..1592287.796 rows=100 loops=1)
        Sort Key: data.logs.re_timestamp
        Sort Method:  top-N heapsort  Memory: 217kB
        ->  Result  (cost=0.00..5342561.86 rows=25868986 width=1388) (actual time=0.026..1466420.868 rows=25870101 loops=1)
              ->  Append  (cost=0.00..5342561.86 rows=25868986 width=1388) (actual time=0.020..1417490.892 rows=25870101 loops=1)
                    ->  Seq Scan on logs  (cost=0.00..10.40 rows=40 width=1776) (actual time=0.002..0.002 rows=0 loops=1)
                    ->  Seq Scan on logs_2011 logs  (cost=0.00..195428.00 rows=904800 width=1449) (actual time=0.017..92381.769 rows=904401 loops=1)
                    ->  Seq Scan on logs_2010 logs  (cost=0.00..759610.67 rows=3578567 width=1426) (actual time=23.996..257612.143 rows=3579586 loops=1)
                    ->  Seq Scan on logs_2009 logs  (cost=0.00..841998.35 rows=3987235 width=1423) (actual time=12.921..200385.903 rows=3986473 loops=1)
                    ->  Seq Scan on logs_2008 logs  (cost=0.00..942810.60 rows=4409860 width=1444) (actual time=18.861..226867.499 rows=4406653 loops=1)
                    ->  Seq Scan on logs_2007 logs  (cost=0.00..730863.69 rows=3600569 width=1359) (actual time=14.406..174082.413 rows=3603739 loops=1)
                    ->  Seq Scan on logs_2006 logs  (cost=0.00..620978.29 rows=3089929 width=1348) (actual time=21.647..147244.677 rows=3091214 loops=1)
                    ->  Seq Scan on logs_2005 logs  (cost=0.00..486928.59 rows=2440959 width=1342) (actual time=0.005..126479.314 rows=2438968 loops=1)
                    ->  Seq Scan on logs_2004 logs  (cost=0.00..402991.92 rows=2031092 width=1327) (actual time=23.007..98531.883 rows=2034041 loops=1)
                    ->  Seq Scan on logs_2003 logs  (cost=0.00..360941.35 rows=1825935 width=1325) (actual time=20.220..91773.705 rows=1825026 loops=1)
Total runtime: 1592293.267 ms

EXPLAIN ANALYZE SELECT * FROM data.logs_2011 ORDER BY re_timestamp DESC LIMIT 100;

Limit  (cost=0.00..22.65 rows=100 width=1449) (actual time=59.161..60.226 rows=100 loops=1)
  ->  Index Scan Backward using logs_fts_2011_timestamp_idx on logs_2011  (cost=0.00..204919.30 rows=904800 width=1449) (actual time=59.158..60.215 rows=100 loops=1)
Total runtime: 60.316 ms
Re: [PERFORM] Query improvement
Thanks for the reply. Both UNION and JOINs helped. Mainly the UNION helped a lot. Now the query takes 1 sec max. Thanks a lot.
Re: [PERFORM] Query improvement
Thanks a lot for the reply. Finally I have used UNION, but thanks for your help.
Re: [PERFORM] indexes ignored when querying the master table
* Thomas Hägi:

> how can i get postgres to use the indexes when querying the master table?

I believe that this is a new feature in PostgreSQL 9.1 ("Allow inheritance table queries to return meaningfully-sorted results").

--
Florian Weimer  fwei...@bfk.de
BFK edv-consulting GmbH  http://www.bfk.de/
Kriegsstraße 100  tel: +49-721-96201-1
D-76133 Karlsruhe  fax: +49-721-96201-99
Re: [PERFORM] indexes ignored when querying the master table
On 05/06/2011 05:13 PM, Thomas Hägi wrote:

> the query SELECT * FROM data.logs ORDER BY re_timestamp DESC LIMIT 100 does use seq scans on all tables instead of using the existing indexes which takes ages. when issuing the same query to one of the child tables directly (SELECT * FROM data.logs_2011 ORDER BY re_timestamp DESC LIMIT 100) the index is used as expected and the data returned quickly.

Let's see, cut and paste http://archives.postgresql.org/message-id/4db8ce7d.8030...@2ndquadrant.com and:

This is probably the limitation that's fixed in PostgreSQL 9.1 by this commit (following a few others leading up to it): http://archives.postgresql.org/pgsql-committers/2010-11/msg00028.php

There was a good example showing what didn't work as expected before (along with an earlier patch that didn't do everything the larger 9.1 improvement does) at http://archives.postgresql.org/pgsql-hackers/2009-07/msg01115.php ; "ORDER BY x DESC LIMIT 1" returns the same thing as MAX(x).

It's a pretty serious issue with partitioning in earlier versions. I know of multiple people, myself included, who have been compelled to apply this change to an earlier version of PostgreSQL to make larger partitioned databases work correctly. The other option is to manually decompose the queries into ones that target each of the child tables individually, then combine the results, which is no fun either.

(Am thinking about a documentation backpatch pointing out this limitation)

--
Greg Smith  2ndQuadrant US  g...@2ndquadrant.com  Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
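For readers stuck on pre-9.1 versions, the manual decomposition Greg describes can be sketched roughly as follows (untested, and it assumes the data.logs_20xx child tables from the original post): push the ORDER BY / LIMIT down into each child so each one can use its own btree index, then re-sort the small combined result:

```sql
-- Pre-9.1 workaround sketch (hypothetical, assumes the data.logs_20xx
-- layout from the original post).  Each subquery can use the per-child
-- index on re_timestamp, so at most 100 rows are fetched per child;
-- the outer sort then touches a few hundred rows instead of 25 million.
SELECT * FROM (
    (SELECT * FROM data.logs_2011 ORDER BY re_timestamp DESC LIMIT 100)
    UNION ALL
    (SELECT * FROM data.logs_2010 ORDER BY re_timestamp DESC LIMIT 100)
    UNION ALL
    (SELECT * FROM data.logs_2009 ORDER BY re_timestamp DESC LIMIT 100)
    -- ... repeat for the remaining child tables ...
) AS merged
ORDER BY re_timestamp DESC
LIMIT 100;
```

PostgreSQL 9.1's Merge Append node performs essentially this optimization automatically for inheritance queries.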
[PERFORM] wildcard makes seq scan on prod db but not in test
Dear list,

I have a table with a few million rows and this index:

CREATE INDEX bond_item_common_x7 ON bond_item_common ((lower(original_filename)));

There are about 2M rows on bonddump and 4M rows on bond90. bonddump is on an 8GB RAM machine, bond90 is on a 72GB RAM machine. The table is analyzed properly in both places.

I'm an index hint zealot, but aware of our different stances in the matter. :)

Dropping the wildcard from the LIKE, both databases use the index. Is there a way to convince Postgres to not do a full table scan as much? This is just one of several examples where it happily spends lots of time sequentially going through tables.

Thanks, Marcus

psql (9.0.4)
Type "help" for help.

bonddump=# explain analyze select pic2.objectid
bonddump-# from bond_item_common pic2
bonddump-# where
bonddump-# lower(pic2.original_filename) like 'this is a test%' ;

QUERY PLAN
---------------------------------------------------------------------------
Index Scan using bond_item_common_x7 on bond_item_common pic2  (cost=0.01..8.69 rows=208 width=4) (actual time=26.415..26.415 rows=0 loops=1)
  Index Cond: ((lower((original_filename)::text) >= 'this is a test'::text) AND (lower((original_filename)::text) < 'this is a tesu'::text))
  Filter: (lower((original_filename)::text) ~~ 'this is a test%'::text)
Total runtime: 26.519 ms
(4 rows)

psql (9.0.4)

bond90=# explain analyze select pic2.objectid
bond90-# from bond_item_common pic2
bond90-# where
bond90-# lower(pic2.original_filename) like 'this is a test%' ;

QUERY PLAN
---------------------------------------------------------------------------
Seq Scan on bond_item_common pic2  (cost=0.00..839226.81 rows=475 width=4) (actual time=10599.401..10599.401 rows=0 loops=1)
  Filter: (lower((original_filename)::text) ~~ 'this is a test%'::text)
Total runtime: 10599.425 ms
(3 rows)
Re: [PERFORM] wildcard makes seq scan on prod db but not in test
Marcus Engene <meng...@engene.se> writes:

> There are about 2M rows on bonddump and 4M rows on bond90. bonddump is on an 8GB RAM machine, bond90 is on a 72GB RAM machine. The table is analyzed properly in both places.

I'll bet one database was initialized in C locale and the other not.

regards, tom lane
Re: [PERFORM] wildcard makes seq scan on prod db but not in test
Marcus Engene <meng...@engene.se> wrote:

> I have a table with a few million rows and this index:
> CREATE INDEX bond_item_common_x7 ON bond_item_common ((lower(original_filename)));
> Dropping the wildcard from the LIKE, both databases use the index. Is there a way to convince Postgres to not do a full table scan as much?

That could be a difference in collations. What do you get from the query on this page for each database?: http://wiki.postgresql.org/wiki/Server_Configuration

-Kevin
Re: [PERFORM] wildcard makes seq scan on prod db but not in test
On 5/9/11 8:57, Kevin Grittner wrote:

> That could be a difference in collations. What do you get from the query on this page for each database?: http://wiki.postgresql.org/wiki/Server_Configuration

There's indeed a different collation. Why does this affect the plan? Can I force a column to be ascii?

The (fast) test server:

 version              | PostgreSQL 9.0.4 on x86_64-apple-darwin10.7.0, compiled by GCC i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc. build 5664), 64-bit
 effective_cache_size | 512MB
 lc_collate           | C
 lc_ctype             | UTF-8
 maintenance_work_mem | 128MB
 max_connections      | 100
 max_stack_depth      | 2MB
 port                 | 5435
 server_encoding      | UTF8
 shared_buffers       | 512MB
 temp_buffers         | 8192
 TimeZone             | Europe/Zurich
 wal_buffers          | 1MB
 work_mem             | 128MB
(14 rows)

The (slow) production server:

 version                      | PostgreSQL 9.0.4 on x86_64-unknown-linux-gnu, compiled by GCC gcc (Debian 4.3.2-1.1) 4.3.2, 64-bit
 checkpoint_completion_target | 0.9
 checkpoint_segments          | 64
 effective_cache_size         | 48GB
 lc_collate                   | en_US.UTF-8
 lc_ctype                     | en_US.UTF-8
 listen_addresses             | localhost,10.0.0.3,74.50.57.76
 maintenance_work_mem         | 1GB
 max_connections              | 600
 max_stack_depth              | 2MB
 port                         | 5435
 server_encoding              | UTF8
 shared_buffers               | 8GB
 temp_buffers                 | 32768
 TimeZone                     | UTC
 work_mem                     | 128MB
(16 rows)

Thanks, Marcus
Re: [PERFORM] wildcard makes seq scan on prod db but not in test
Marcus Engene <meng...@engene.se> wrote:
> On 5/9/11 8:57, Kevin Grittner wrote:
>> That could be a difference in collations. What do you get from the query on this page for each database?: http://wiki.postgresql.org/wiki/Server_Configuration
> There's indeed a different collation. Why is this affecting?

If the index isn't sorted in an order which leaves the rows you are requesting near one another, it's not very useful for the query. Try this query on both:

create temp table order_example (val text);
insert into order_example values ('a z'),('ab'),('123'),(' 456');
select * from order_example order by val;

> Can i force a column to be ascii?

You don't need to do that; you can specify an opclass for the index to tell it that you don't want to order by the normal collation, but rather in a way which will allow the index to be useful for pattern matching: http://www.postgresql.org/docs/9.0/interactive/indexes-opclass.html

> The (fast) test server:
>  effective_cache_size | 512MB
>  lc_collate           | C
>  lc_ctype             | UTF-8
>  maintenance_work_mem | 128MB
>  max_connections      | 100
>  server_encoding      | UTF8
>  shared_buffers       | 512MB
>  temp_buffers         | 8192
>  TimeZone             | Europe/Zurich
>  wal_buffers          | 1MB

> The (slow) production server:
>  checkpoint_completion_target | 0.9
>  checkpoint_segments          | 64
>  effective_cache_size         | 48GB
>  lc_collate                   | en_US.UTF-8
>  lc_ctype                     | en_US.UTF-8
>  listen_addresses             | localhost,10.0.0.3,74.50.57.76
>  maintenance_work_mem         | 1GB
>  max_connections              | 600
>  server_encoding              | UTF8
>  shared_buffers               | 8GB
>  temp_buffers                 | 32768
>  TimeZone                     | UTC

As you've discovered, with that many differences, performance tests on one machine may have very little to do with actual performance on the other.

-Kevin
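Kevin's opclass suggestion can be made concrete with a short sketch (untested; the table and column names are taken from earlier in the thread). On a non-C-locale database, an additional index declared with text_pattern_ops lets the planner rewrite a left-anchored LIKE into index range bounds:

```sql
-- Sketch of the opclass fix (assumes the bond_item_common table from
-- the thread).  text_pattern_ops compares values character by character
-- rather than by the en_US.UTF-8 collation, so a left-anchored LIKE can
-- be turned into >= / < index bounds, as seen in the fast server's plan.
CREATE INDEX bond_item_common_x7_pattern
    ON bond_item_common ((lower(original_filename)) text_pattern_ops);

ANALYZE bond_item_common;

-- The query from the original post should now be able to use an index
-- scan on the production server as well:
EXPLAIN ANALYZE
SELECT pic2.objectid
  FROM bond_item_common pic2
 WHERE lower(pic2.original_filename) LIKE 'this is a test%';
```

Note the original index stays useful for ordinary equality and ORDER BY; the pattern-ops index serves only the pattern-matching case, so on a C-locale database a single index covers both.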
[PERFORM] Benchmarking a large server
I've got a fun problem. My employer just purchased some new db servers that are very large. The specs on them are:

4 Intel X7550 CPUs (32 physical cores, HT turned off)
1 TB RAM
1.3 TB Fusion IO (2 1.3 TB Fusion IO Duo cards in a raid 10)
3 TB SAS array (48 15K 146GB spindles)

The issue we are running into is how to benchmark this server, specifically, how do we get valid benchmarks for the Fusion IO card? Normally, to eliminate the cache effect, you run iozone and other benchmark suites at 2x the RAM. However, we can't do that here, since 2 TB > 1.3 TB.

So, does anyone have any suggestions/experiences in benchmarking storage when the storage is smaller than 2x memory?

Thanks, Chris
[PERFORM] good performance benchmark
I'm looking for a good ready-to-run broad-spectrum (tests cpu-bound, i/o-bound, various cases, various sizes) benchmark. I tried dbt5 and got it compiled after some effort, but it immediately fails upon running, so I punted. Anybody have any ideas where I could look?

merlin
Re: [PERFORM] Benchmarking a large server
On Mon, May 9, 2011 at 3:32 PM, Chris Hoover <revo...@gmail.com> wrote:
> I've got a fun problem. My employer just purchased some new db servers that are very large. The specs on them are:
> 4 Intel X7550 CPUs (32 physical cores, HT turned off)
> 1 TB RAM
> 1.3 TB Fusion IO (2 1.3 TB Fusion IO Duo cards in a raid 10)
> 3 TB SAS array (48 15K 146GB spindles)

my GOODNESS! :-D. I mean, just, wow.

> The issue we are running into is how do we benchmark this server, specifically, how do we get valid benchmarks for the Fusion IO card? Normally to eliminate the cache effect, you run iozone and other benchmark suites at 2x the ram. However, we can't do that since 2 TB > 1.3 TB. So, does anyone have any suggestions/experiences in benchmarking storage when the storage is smaller than 2x memory?

hm, if it was me, I'd write a small C program that just jumped around directly on the device and did random writes, assuming it wasn't formatted. For sequential read, just flush caches and dd the device to /dev/null. Probably someone will suggest better tools though.

merlin
Re: [PERFORM] Benchmarking a large server
> hm, if it was me, I'd write a small C program that just jumped directly on the device around and did random writes assuming it wasn't formatted. For sequential read, just flush caches and dd the device to /dev/null. Probably someone will suggest better tools though.

I have a program I wrote years ago for a purpose like this. One of the things it can do is write to the filesystem at the same time as dirtying pages in a large shared or non-shared memory region. The idea was to emulate the behavior of a database reasonably accurately.

Something like bonnie++ would probably be a good starting point these days though.
Re: [PERFORM] Benchmarking a large server
On May 9, 2011, at 1:32 PM, Chris Hoover wrote:

> 1.3 TB Fusion IO (2 1.3 TB Fusion IO Duo cards in a raid 10)

Be careful here. What if the entire card hiccups, instead of just a device on it? (We've had that happen to us before.) Depending on how you've done your raid 10, either all your parity is gone or your data is.
Re: [PERFORM] Benchmarking a large server
On 05/09/2011 03:32 PM, Chris Hoover wrote:

> So, does anyone have any suggestions/experiences in benchmarking storage when the storage is smaller than 2x memory?

We had a similar problem when benching our FusionIO setup. What I did was write a script that cleared out the Linux system cache before every iteration of our pgbench tests. You can do that easily with:

echo 3 > /proc/sys/vm/drop_caches

Executed as root. Then we ran short (10, 20, 30, 40 clients, 10,000 transactions each) pgbench tests, resetting the cache and the DB after every iteration. It was all automated in a script, so it wasn't too much work.

We got (roughly) a 15x speed improvement over a 6x15k RPM RAID-10 setup on the same server, with no other changes. This was definitely corroborated after deployment, when our frequent periods of 100% disk IO utilization vanished and were replaced by occasional 20-30% spikes. Even that's an unfair comparison in favor of the RAID, because we added DRBD to the mix, since you can't share a PCI card between two servers.

If you do have two 1.3TB Duo cards in a 4x640GB RAID-10, you should get even better read times than we did.

--
Shaun Thomas
OptionsHouse | 141 W. Jackson Blvd. | Suite 800 | Chicago IL, 60604
312-676-8870  stho...@peak6.com
Re: [PERFORM] Benchmarking a large server
On Mon, May 9, 2011 at 3:59 PM, David Boreham <david_l...@boreham.org> wrote:
> I have a program I wrote years ago for a purpose like this. One of the things it can do is write to the filesystem at the same time as dirtying pages in a large shared or non-shared memory region. The idea was to emulate the behavior of a database reasonably accurately. Something like bonnie++ would probably be a good starting point these days though.

The problem with bonnie++ is that the results aren't valid, especially the read tests. I think it refuses to even run unless you set special switches.

merlin
Re: [PERFORM] Benchmarking a large server
On 5/9/2011 3:11 PM, Merlin Moncure wrote:

> The problem with bonnie++ is that the results aren't valid, especially the read tests. I think it refuses to even run unless you set special switches.

I only care about writes ;) But definitely, be careful with the tools. I tend to prefer small programs written in-house myself, and of course simply running your application under a synthesized load.
Re: [PERFORM] Postgres refusing to use 1 core
Aren Cambre <a...@arencambre.com> wrote:

> Postgres won't use more than 1 CPU core.

One *connection* to PostgreSQL won't directly use more than one core. As Merlin suggests, perhaps you're really only running one query at a time? The other possibility is that you're somehow acquiring locks which cause one process to block others.

> - Core i7 processor--4 physical cores, but OS sees 8 cores via hyper-threading

Most benchmarks I've seen comparing hyper-threading show that PostgreSQL performs better if you don't try to convince it that one core is actually two different cores. With HT on, you tend to see context switching storms, and performance suffers.

> At first, the app pounds all 8 cores.

You really shouldn't let the marketers get to you like that. You have four cores, not eight.

The most important information for finding your bottleneck is probably going to be in pg_stat_activity and pg_locks.

-Kevin
Re: [PERFORM] Postgres refusing to use 1 core
> Are you sure you are really using 1 connection? While your test is running, log onto postgres with psql and grab the output of pg_stat_activity a few times. What do you see?

Thanks. If a connection corresponds to a process, then this suggests I am using 1 connection for my main thread, and all the threads it spawns are sharing another connection.

Aren
Re: [PERFORM] Postgres refusing to use 1 core
>> Postgres won't use more than 1 CPU core.
> One *connection* to PostgreSQL won't directly use more than one core. As Merlin suggests, perhaps you're really only running one query at a time? The other possibility is that you're somehow acquiring locks which cause one process to block others.

The one-connection theory appears correct per my prior email, if I correctly understood what I was reading. I guess I need to head back over to the Npgsql folks and see what I am doing wrong?

>> - Core i7 processor--4 physical cores, but OS sees 8 cores via hyper-threading
> Most benchmarks I've seen comparing hyper-threading show that PostgreSQL performs better if you don't try to convince it that one core is actually two different cores. With HT on, you tend to see context switching storms, and performance suffers.
>> At first, the app pounds all 8 cores.
> You really shouldn't let the marketers get to you like that. You have four cores, not eight.

I agree. :-) Just trying to express things as my OS sees and reports on them.

Aren
Re: [PERFORM] Postgres refusing to use 1 core
On Mon, May 9, 2011 at 4:50 PM, Aren Cambre <a...@arencambre.com> wrote:
>> Are you sure you are really using 1 connection? While your test is running, log onto postgres with psql and grab the output of pg_stat_activity a few times. What do you see?
> Thanks. If a connection corresponds to a process, then this suggests I am using 1 connection for my main thread, and all the threads it spawns are sharing another connection.

Yes. However I can tell you with absolute certainty that postgres will distribute work across cores. Actually the o/s does it -- each unique connection spawns a single-threaded process on the backend. As long as your o/s of choice supports using more than one process at once, your work will distribute.

So, given that, your problem is one of:
*) your code is actually using only one connection
*) you have contention on the server side (say, an outstanding transaction that is blocking everyone)
*) you have contention on the client side -- a lock in your code or inside npgsql
*) your measurement is not correct.

so follow the advice above. we need to see pg_stat_activity, and/or pg_locks while your test is running (especially take note of pg_locks records with granted=f)

merlin
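A minimal sketch of the diagnostics Merlin asks for, using the standard system views (column names below match the 9.0-era catalogs discussed in the thread; 9.2 later renamed procpid/current_query to pid/query). Run these in psql a few times while the test is going:

```sql
-- Which backends are actually doing work right now?
SELECT procpid, usename, waiting, current_query
  FROM pg_stat_activity
 WHERE current_query <> '<IDLE>';

-- Any lock requests that have not been granted?  Rows with
-- granted = false indicate one backend blocking another.
SELECT locktype, relation::regclass, pid, mode, granted
  FROM pg_locks
 WHERE NOT granted;
```

If the first query keeps showing only one non-idle backend, the client really is funneling everything through a single connection; if the second returns rows, the bottleneck is lock contention on the server side.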
Re: [PERFORM] Postgres refusing to use 1 core
Aren Cambre <a...@arencambre.com> wrote:
>>> - Core i7 processor--4 physical cores, but OS sees 8 cores via hyper-threading
>> Most benchmarks I've seen comparing hyper-threading show that PostgreSQL performs better if you don't try to convince it that one core is actually two different cores. With HT on, you tend to see context switching storms, and performance suffers. You really shouldn't let the marketers get to you like that. You have four cores, not eight.
> I agree. :-) Just trying to express things as my OS sees and reports on them.

Your OS won't *see* eight processors if you turn off HT. :-)

I'm going to pursue this digression just a little further, because it probably will be biting you sooner or later. We make sure to configure the BIOS on our database servers to turn off hyperthreading. It really can make a big difference in performance.

-Kevin
Re: [PERFORM] Benchmarking a large server
On 05/09/2011 04:32 PM, Chris Hoover wrote:

> So, does anyone have any suggestions/experiences in benchmarking storage when the storage is smaller than 2x memory?

If you do the Linux trick to drop its caches already mentioned, you can start a database test with zero information in memory. In that situation, whether or not everything could fit in RAM doesn't matter as much; you're starting with none of it in there. In that case, you can benchmark things without having twice as much disk space.

You just have to recognize that the test becomes less useful the longer you run it, and measure the results accordingly. A test starting from that state will start out showing you random I/O speed on the device, slowly moving toward in-memory cached speeds as the benchmark runs for a while. You really need to capture the latency data for every transaction and graph it over time to make any sense of it.

If you look at "Using and Abusing pgbench" at http://projects.2ndquadrant.com/talks , starting on P33 I have several slides showing such a test, done with pgbench and pgbench-tools. I added a quick hack to pgbench-tools around then to make it easier to run this specific type of test, but to my knowledge no one else has ever used it. (I've had talks about PostgreSQL in my yard that were better attended than that session, for which I blame Jonah Harris for doing a great talk in the room next door concurrent with it.)

--
Greg Smith  2ndQuadrant US  g...@2ndquadrant.com  Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
Re: [PERFORM] Postgres refusing to use 1 core
On 05/09/2011 05:59 PM, Kevin Grittner wrote:

> I'm going to pursue this digression just a little further, because it probably will be biting you sooner or later. We make sure to configure the BIOS on our database servers to turn off hyperthreading. It really can make a big difference in performance.

You're using connection pooling quite aggressively though. The sort of people who do actually benefit from hyperthreading are the ones who don't, where there's lots of CPU time being burnt up in overhead you don't see, and that even a virtual HT processor can help handle.

I'm not a big fan of the current hyperthreading implementation, but it's not nearly as bad as the older ones, and there are situations where it is useful. I am unsurprised you don't ever see them on your workload though; you're well tweaked enough to probably be memory or disk limited much of the time.

--
Greg Smith  2ndQuadrant US  g...@2ndquadrant.com  Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support  www.2ndQuadrant.us
"PostgreSQL 9.0 High Performance": http://www.2ndQuadrant.com/books
Re: [PERFORM] good performance benchmark
On 5/9/11 1:41 PM, Merlin Moncure wrote:

> I'm looking for a good ready-to-run broad spectrum (tests cpu bound, i/o bound, various cases, various sizes) benchmark. I tried dbt5 and got it compiled after some effort but it immediately fails upon running so I punted. Anybody have any ideas where I could look?

I don't know of any real benchmark currently that isn't fairly involved to set up. As in, a week-long debugging session. I wish it were different, but to date nobody has been available to put in the kind of work required to have credible benchmarks which are relatively portable.

DBT2 is a bit more stable than DBT5, though, so you might have a better time with it.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [PERFORM] Postgres 9.0.4 + Hot Standby + FusionIO Drive + Performance = Query failed ERROR: catalog is missing 1 attribute(s) for relid 172226
On 5/5/11 12:47 AM, Sethu Prasad wrote:

> I did the hot standby configured earlier and at that time I started using (querying) the standby database. Maybe something was missed in the archive command.

Most likely, yes. PostgreSQL shouldn't start up under such circumstances, but apparently you fooled it.

--
Josh Berkus
PostgreSQL Experts Inc.
http://pgexperts.com
Re: [PERFORM] Benchmarking a large server
2011/5/9 Chris Hoover <revo...@gmail.com>:
> I've got a fun problem. My employer just purchased some new db servers that are very large. The specs on them are:
> 4 Intel X7550 CPUs (32 physical cores, HT turned off)
> 1 TB RAM
> 1.3 TB Fusion IO (2 1.3 TB Fusion IO Duo cards in a raid 10)
> 3 TB SAS array (48 15K 146GB spindles)
> The issue we are running into is how do we benchmark this server, specifically, how do we get valid benchmarks for the Fusion IO card? Normally to eliminate the cache effect, you run iozone and other benchmark suites at 2x the ram. However, we can't do that since 2 TB > 1.3 TB. So, does anyone have any suggestions/experiences in benchmarking storage when the storage is smaller than 2x memory?
> Thanks, Chris

You can reduce the memory size on server boot. If you use Linux, you can add 'mem=512G' to your boot-time parameters. (Maybe it supports only K or M, so 512*1024...)

--
Cédric Villemain  2ndQuadrant
http://2ndQuadrant.fr/  PostgreSQL : Expertise, Formation et Support
Re: [PERFORM] Benchmarking a large server
2011/5/9 Chris Hoover revo...@gmail.com: I've got a fun problem. My employer just purchased some new db servers that are very large. The specs on them are: 4 Intel X7550 CPU's (32 physical cores, HT turned off) 1 TB RAM 1.3 TB Fusion IO (2 1.3 TB Fusion IO Duo cards in a raid 10) 3 TB SAS array (48 15K 146GB spindles) The issue we are running into is how do we benchmark this server, specifically, how do we get valid benchmarks for the Fusion IO card? Normally, to eliminate the cache effect, you run iozone and other benchmark suites at 2x the RAM. However, we can't do that, since 2x RAM (2 TB) is larger than the 1.3 TB of storage. So, does anyone have any suggestions/experiences in benchmarking storage when the storage is smaller than 2x memory? Maybe this is a dumb question, but why do you care? If you have 1 TB RAM and just a little more actual disk space, it seems like your database will always be cached in memory anyway. If you eliminate the cache effect, won't the benchmark actually give you the wrong real-life results? Craig
Re: [PERFORM] Benchmarking a large server
On Mon, 9 May 2011, David Boreham wrote: On 5/9/2011 6:32 PM, Craig James wrote: Maybe this is a dumb question, but why do you care? If you have 1 TB RAM and just a little more actual disk space, it seems like your database will always be cached in memory anyway. If you eliminate the cache effect, won't the benchmark actually give you the wrong real-life results? The time it takes to populate the cache from a cold start might be important. You may also have other processes that will be contending with the disk buffers for memory (for that matter, Postgres may use a significant amount of that memory as it produces its results). David Lang Also, if it were me, I'd be wanting to check for weird performance behavior at this memory scale. I've seen cases in the past where the VM subsystem went bananas because the designers and testers of its algorithms never considered the physical memory size we deployed. How many times was the kernel tested with this much memory, for example? (never??)
Re: [PERFORM] Postgres refusing to use 1 core
Your OS won't *see* eight processors if you turn off HT. :-) I'm going to pursue this digression just a little further, because it probably will be biting you sooner or later. We make sure to configure the BIOS on our database servers to turn off hyperthreading. It really can make a big difference in performance. OK, OK, I need to admit that this is a Core i7 720QM on an HP Envy 14 laptop. :-) There is no BIOS option to disable HT. I am a doctoral student (but married with kids, about 5-10 years over traditional doctoral student age) and am trying to speed up some of my data analysis with parallelism. Right now the current operation, if run in series, takes 30 hours and only stresses one of the 8 (fake) cores. I'd rather see something that maximizes CPU use, provided that it doesn't overwhelm I/O. Aren
Re: [PERFORM] Postgres refusing to use 1 core
On Mon, May 9, 2011 at 8:15 PM, Aren Cambre a...@arencambre.com wrote: Your OS won't *see* eight processors if you turn off HT. :-) I'm going to pursue this digression just a little further, because it probably will be biting you sooner or later. We make sure to configure the BIOS on our database servers to turn off hyperthreading. It really can make a big difference in performance. OK, OK, I need to admit that this is a Core i7 720QM on an HP Envy 14 laptop. :-) There is no BIOS option to disable HT. I am a doctoral student (but married with kids, about 5-10 years over traditional doctoral student age) and am trying to speed up some of my data analysis with parallelism. Right now the current operation, if run in series, takes 30 hours and only stresses one of the 8 (fake) cores. I'd rather see something that maximizes CPU use, provided that it doesn't overwhelm I/O. The easiest way to use more cores is to just partition the data you want to work on into 4 or more chunks and launch that many multi-threaded processes at once.
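The chunk-and-launch suggestion above can be sketched like this, in Python rather than the poster's C# since the idea is language-agnostic. The chunking helper and the per-chunk query in the comment are hypothetical; a real version would open one database connection per worker process.

```python
from multiprocessing import Pool

def chunk_ranges(min_id, max_id, n_workers):
    """Split [min_id, max_id] into n_workers contiguous, non-overlapping ranges."""
    span = max_id - min_id + 1
    step = (span + n_workers - 1) // n_workers  # ceiling division
    return [(lo, min(lo + step - 1, max_id))
            for lo in range(min_id, max_id + 1, step)]

def process_range(bounds):
    lo, hi = bounds
    # A real worker would open its OWN connection here and run something like:
    #   SELECT * FROM arrests WHERE id BETWEEN %s AND %s
    # then process each row. For this sketch we just report the range size.
    return hi - lo + 1

if __name__ == "__main__":
    ranges = chunk_ranges(1, 1_000_000, 4)       # one chunk per core
    with Pool(4) as pool:                        # 4 OS processes, no GIL contention
        counts = pool.map(process_range, ranges)
    assert sum(counts) == 1_000_000              # every row covered exactly once
```

Because the ranges are disjoint and cover the whole key space, no coordination between workers is needed beyond the initial split.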
Re: [PERFORM] Postgres refusing to use 1 core
On Mon, May 9, 2011 at 10:15 PM, Aren Cambre a...@arencambre.com wrote: Your OS won't *see* eight processors if you turn off HT. :-) I'm going to pursue this digression just a little further, because it probably will be biting you sooner or later. We make sure to configure the BIOS on our database servers to turn off hyperthreading. It really can make a big difference in performance. OK, OK, I need to admit that this is a Core i7 720QM on an HP Envy 14 laptop. :-) There is no BIOS option to disable HT. I am a doctoral student (but married with kids, about 5-10 years over traditional doctoral student age) and am trying to speed up some of my data analysis with parallelism. Right now the current operation, if run in series, takes 30 hours and only stresses one of the 8 (fake) cores. I'd rather see something that maximizes CPU use, provided that it doesn't overwhelm I/O. Aren how are you reading through the table? if you are using OFFSET, you owe me a steak dinner. merlin
Re: [PERFORM] Postgres refusing to use 1 core
how are you reading through the table? if you are using OFFSET, you owe me a steak dinner. Nope. :-) Below is my exact code for the main thread. The C# PLINQ statement is the foreach loop below. Let me know if I can help to explain this.

NpgsqlConnection arrestsConnection = new NpgsqlConnection(Properties.Settings.Default.dbConnectionString);
arrestsConnection.Open();

/// First clear out the geocoding table
NpgsqlCommand geocodingTableClear = new NpgsqlCommand("TRUNCATE raw.\"TxDPS geocoding\"", arrestsConnection);
geocodingTableClear.ExecuteNonQuery();

NpgsqlDataReader arrests = new NpgsqlCommand("SELECT * FROM \"raw\".\"TxDPS all arrests\"", arrestsConnection).ExecuteReader();

/// Based on the pattern defined at
/// http://social.msdn.microsoft.com/Forums/en-US/parallelextensions/thread/2f5ce226-c500-4899-a923-99285ace42ae .
foreach (IDataRecord arrest in from row in arrests.AsParallel().Cast<IDataRecord>() select row)
{
    Geocoder geocodeThis = new Geocoder(arrest);
    geocodeThis.Geocode();
}

arrestsConnection.Close();

Aren
Re: [PERFORM] Benchmarking a large server
How many times was the kernel tested with this much memory, for example? (never??) This is actually *extremely* relevant. Take a look at /proc/sys/vm/dirty_ratio and /proc/sys/vm/dirty_background_ratio if you have an older Linux system, or /proc/sys/vm/dirty_bytes and /proc/sys/vm/dirty_background_bytes with a newer one. On older systems, for instance, those are set to 40 and 20 respectively (recent kernels cut these in half). That's significant because dirty_ratio is the *percentage* of memory that can remain dirty before the kernel forces synchronous writeback, and dirty_background_ratio tells it when it should start writing in the background to avoid hitting that higher and much more disruptive number. This is another source of I/O that can be completely independent of the checkpoint spikes that long plagued PostgreSQL versions prior to 8.3. With that much memory (1 TB!), that's over 100 GB of dirty memory before it starts writing that out to disk, even with the newer, more conservative settings. We had to tweak and test for days to find good settings for these, and our servers only have 96 GB of RAM. You also have to consider that, as fast as the FusionIO drives are, they're still flash storage, which has write-amplification issues. How fast do you think they can commit 100 GB of dirty memory to disk? Even with a background setting of 1%, that's 10 GB on your system. That means you'd need a very new kernel, so you can use the dirty_bytes and dirty_background_bytes settings to force those thresholds down to sane levels and avoid unpredictable, several-minute-long flushes. I'm not sure how much testing Linux sees on massive hardware like that, but that's just one hidden danger of not properly benchmarking the server and just assuming that 1 TB of memory caching the entire dataset can only be an improvement. -- Shaun Thomas Peak6 | 141 W. Jackson Blvd.
| Suite 800 | Chicago, IL 60604 312-676-8870 stho...@peak6.com
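The knobs described above can be pinned down with sysctl; the byte values below are illustrative examples only, not recommendations, and tuning them requires testing against the actual workload.

```shell
# Older kernels: percentage-based knobs (percent of total RAM)
sysctl -w vm.dirty_background_ratio=5
sysctl -w vm.dirty_ratio=10

# Kernels >= 2.6.29: absolute byte limits, much safer on huge-RAM boxes.
# Setting the *_bytes knobs makes the kernel ignore the *_ratio knobs.
sysctl -w vm.dirty_background_bytes=268435456   # 256 MB
sysctl -w vm.dirty_bytes=1073741824             # 1 GB

# Persist across reboots in /etc/sysctl.conf:
#   vm.dirty_background_bytes = 268435456
#   vm.dirty_bytes = 1073741824
```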
Re: [PERFORM] Postgres refusing to use 1 core
On 10/05/11 10:40, Aren Cambre wrote: how are you reading through the table? if you are using OFFSET, you owe me a steak dinner. Nope. :-) Below is my exact code for the main thread. The C# PLINQ statement is highlighted. Let me know if I can help to explain this. Looking at that code, I can't help but wonder why you're not doing it server side in batches. In general, it's really inefficient to use this pattern:

rows = runquery("select * from something");
for (item in rows) {
    // do something with item
}

Adding implicit parallelism within the loop won't help you much if client-side CPU use isn't your limitation. If each computation done on an item is very expensive in client-side CPU, this pattern makes sense, but otherwise it should be avoided in favour of grabbing big chunks of rows and processing them all at once in batch SQL statements that let the database plan I/O efficiently. Even if you're going to rely on client-side looping - say, because of complex or library-based computation that must be done for each record - you must ensure that EACH THREAD HAS ITS OWN CONNECTION, whether that's a new connection established manually or one grabbed from an appropriate pool. Your code below shows no evidence of that at all; you're probably sharing one connection between all the threads, achieving no real parallelism whatsoever. Try limiting your parallel invocation to 4 threads (since that's the number of cores you have) and making sure each has its own connection. In your case, that probably means having each new Geocoder instance grab a connection from a pool that contains at least 5 connections (one per Geocoder, plus the main connection). It also looks - though I don't know C# and Npgsql, so I can't be sure - like you're passing some kind of query result object to the Geocoder.
Avoid that, because the result object might be using the connection to progressively read data behind the scenes, in which case you might end up having locking issues, accidentally serializing your parallel work on the single main connection, etc. Instead, retrieve the contents of the IDataRecord (whatever that is) and pass those to the new Geocoder instance, so the new Geocoder has *absolutely* *no* *link* to arrestsConnection and cannot possibly depend on it accidentally. Even better, use a server-side work queue implementation like pgq, and have each worker use its private connection to ask the server for the next record to process when it's done with the previous one, so you don't need a co-ordinating queue thread on your client side at all. You can also optionally make your client workers independent processes rather than threads that way, which simplifies debugging and resource management. -- Craig Ringer
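The one-connection-per-worker pattern described above, sketched in Python for brevity. FakeConnection is a purely illustrative stand-in for a real per-thread connection (e.g. one NpgsqlConnection or psycopg2 connection per worker, or one drawn from a pool); the point is that no connection object is ever shared between threads.

```python
import threading
import queue

class FakeConnection:
    """Stand-in for a real per-thread DB connection (e.g. psycopg2.connect())."""
    def fetch(self, record_id):
        return ("row", record_id)

def worker(work_queue, results, lock):
    conn = FakeConnection()          # each thread opens its OWN connection
    while True:
        record_id = work_queue.get()
        if record_id is None:        # sentinel: no more work for this thread
            work_queue.task_done()
            return
        row = conn.fetch(record_id)  # geocode/process via the private connection
        with lock:
            results.append(row)
        work_queue.task_done()

def run(n_workers, record_ids):
    work_queue, results, lock = queue.Queue(), [], threading.Lock()
    threads = [threading.Thread(target=worker, args=(work_queue, results, lock))
               for _ in range(n_workers)]
    for t in threads:
        t.start()
    for rid in record_ids:
        work_queue.put(rid)
    for _ in threads:
        work_queue.put(None)         # one sentinel per worker
    for t in threads:
        t.join()
    return results

if __name__ == "__main__":
    processed = run(4, range(100))
```

As the post notes, the same shape also works with independent worker processes pulling from a server-side queue such as pgq instead of a client-side queue.Queue.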
[PERFORM] partition query on multiple cores
I have an 8-core server, and I wanted to ask whether a query can be divided across multiple processors or cores, and if so, how to do that in PostgreSQL. Thanks
Re: [PERFORM] Benchmarking a large server
On 05/09/2011 11:13 PM, Shaun Thomas wrote: Take a look at /proc/sys/vm/dirty_ratio and /proc/sys/vm/dirty_background_ratio if you have an older Linux system, or /proc/sys/vm/dirty_bytes, and /proc/sys/vm/dirty_background_bytes with a newer one. On older systems for instance, those are set to 40 and 20 respectively (recent kernels cut these in half). 1/4 actually; 10% and 5% starting in kernel 2.6.22. The main sources of this on otherwise new servers I see are RedHat Linux RHEL5 systems running 2.6.18. But as you say, even the lower defaults of the newer kernels can be way too much on a system with lots of RAM. The main downside I've seen of addressing this by using a kernel with dirty_bytes and dirty_background_bytes is that VACUUM can slow down considerably. It really relies on the filesystem having a lot of write cache to perform well. In many cases people are happy with VACUUM throttling if it means nasty I/O spikes go away, but the trade-offs here are still painful at times. -- Greg Smith 2ndQuadrant US g...@2ndquadrant.com Baltimore, MD PostgreSQL Training, Services, and 24x7 Support www.2ndQuadrant.us PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books