Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
On Thu, Apr 28, 2011 at 7:23 AM, Kevin Grittner kevin.gritt...@wicourts.gov wrote:

> Sok Ann Yap sok...@gmail.com wrote:
>> Anyway, the overhead of spawning 44 extra queries means that it is
>> still better off for me to stick with the original query and tune
>> PostgreSQL to choose index scan.
>
> Maybe, but what is *best* for you is to tune PostgreSQL so that your
> costs are accurately modeled, at which point it will automatically
> pick the best plan for most or all of your queries without you needing
> to worry about it.
>
> If you set your effective_cache_size to the sum of shared_buffers and
> what your OS reports as cache after you've been running a while, that
> will help the optimizer know what size index fits in RAM, and will
> tend to encourage index use. If the active portion of your data is
> heavily cached, you might want to set random_page_cost and
> seq_page_cost to the same value, and make that value somewhere in the
> 0.1 to 0.05 range. If you have moderate caching, using 1 and 2 can be
> good.
>
> If you're still not getting reasonable plans, please post again with
> more information about your hardware along with the query and its
> EXPLAIN ANALYZE output.
>
> -Kevin

I understand the need to tune PostgreSQL properly for my use case. What I am curious about is: for the data set I have, under what circumstances (hardware/workload/cache status/etc.) would a sequential scan really be faster than an index scan for that particular query?

To simulate a scenario where nothing is cached, I stopped PostgreSQL, dropped all system caches (sync; echo 3 > /proc/sys/vm/drop_caches), restarted PostgreSQL, and ran the query. A sequential scan run took 13.70 seconds, while an index scan run took 0.34 seconds, which is still 40 times faster.

Also, I tried increasing effective_cache_size from 512MB to 3GB (the database size is 2+GB), and it still favors a sequential scan. The estimated costs did not change at all.
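[Editor's note: the observation that the estimated costs did not change is consistent with how the planner's cost model works — effective_cache_size only feeds into index-scan estimation, while a sequential scan is costed purely from page and tuple counts. A minimal Python sketch of that arithmetic follows; the cost functions are heavily simplified and the table sizes are illustrative assumptions, not PostgreSQL's actual code path.]

```python
# Simplified sketch of planner cost arithmetic (NOT PostgreSQL's real
# formulas). It shows why changing effective_cache_size cannot move a
# sequential-scan estimate: seq scan cost depends only on page and
# tuple counts, while random_page_cost dominates the index-scan side.

SEQ_PAGE_COST = 1.0    # default seq_page_cost
CPU_TUPLE_COST = 0.01  # default cpu_tuple_cost

def seq_scan_cost(pages, tuples, seq_page_cost=SEQ_PAGE_COST):
    # sequential page reads plus per-tuple CPU work
    return pages * seq_page_cost + tuples * CPU_TUPLE_COST

def index_scan_cost(heap_pages_fetched, tuples, random_page_cost):
    # every heap page fetch charged at random_page_cost; when the data
    # is actually cached in RAM, the real cost is far lower than this
    return heap_pages_fetched * random_page_cost + tuples * CPU_TUPLE_COST

# Hypothetical numbers: a ~2 GB table (262144 pages of 8 kB, 10M rows)
# versus an index scan touching 50k heap pages for 100k matching rows.
seq = seq_scan_cost(262144, 10_000_000)
idx4 = index_scan_cost(50_000, 100_000, random_page_cost=4.0)
idx1 = index_scan_cost(50_000, 100_000, random_page_cost=1.0)
print(seq, idx4, idx1)
```

Under these made-up sizes the index scan already wins at random_page_cost = 4; lowering it to 1 just widens the gap. Neither estimate contains effective_cache_size for the seq-scan side, which is why raising it to 3GB left the numbers untouched.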
-- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
[PERFORM] VX_CONCURRENT flag on vxfs( 5.1 or later) for performance for postgresql?
Dear all,

When database files are on a VxFS filesystem, performance can be significantly improved by setting the VX_CONCURRENT cache advisory on the file, according to the VxFS documentation. My question is: has anyone tested this?

#include <sys/fs/vx_ioctl.h>

ioctl(fd, VX_SETCACHE, VX_CONCURRENT);

Regards,
HSIEN WEN
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
On Wed, Apr 27, 2011 at 8:40 PM, Kevin Grittner kevin.gritt...@wicourts.gov wrote:

> Sok Ann Yap wrote:
>> Kevin Grittner wrote:
>>> Please show us your overall configuration and give a description of
>>> the hardware (how many of what kind of cores, how much RAM, what
>>> sort of storage system).
>>
>> Here's the configuration (this is just a low end laptop):
>>
>>  version                    | PostgreSQL 9.0.4 on x86_64-pc-linux-gnu, compiled by GCC x86_64-pc-linux-gnu-gcc (Gentoo 4.5.2 p1.0, pie-0.4.5) 4.5.2, 64-bit
>>  checkpoint_segments        | 16
>>  default_statistics_target  | 1
>
> Usually overkill. If this didn't help, you should probably change it
> back.
>
>>  effective_cache_size       | 512MB
>>  lc_collate                 | en_US.UTF-8
>>  lc_ctype                   | en_US.UTF-8
>>  listen_addresses           | *
>>  log_destination            | syslog
>>  log_min_duration_statement | 0
>>  maintenance_work_mem       | 256MB
>>  max_connections            | 100
>
> You probably don't need this many connections.
>
>>  max_stack_depth            | 2MB
>>  port                       | 5432
>>  random_page_cost           | 4
>>  server_encoding            | UTF8
>>  shared_buffers             | 256MB
>>  silent_mode                | on
>>  TimeZone                   | Asia/Kuala_Lumpur
>>  wal_buffers                | 1MB
>>  work_mem                   | 32MB
>> (20 rows)
>
> It's hard to recommend other changes without knowing the RAM on the
> system. How many of what kind of CPUs would help, too.
>
>> The thing is, the query I posted was fairly simple (I think), and
>> PostgreSQL should be able to choose the 3000+ times faster index scan
>> with the default random_page_cost of 4.
>
> It picks the plan with the lowest estimated cost. If it's not picking
> the best plan, that's usually an indication that you need to adjust
> cost factors so that estimates better model the actual costs.
>
>> If I need to reduce it to 2 when using a 5.4k rpm slow disk, what is
>> random_page_cost = 4 good for?
>
> It's good for large databases with a lot of physical disk I/O. In
> fact, in some of those cases, it needs to be higher. In your test,
> the numbers indicate that everything was cached in RAM. That makes
> the effective cost very low.
> Also, the odds are that you have more total cache space between the
> shared_buffers and the OS cache than the effective_cache_size setting,
> so the optimizer doesn't expect the number of cache hits you're
> getting on index usage.
>
> -Kevin

Thanks for the tips and explanation. I wrongly assumed the random_page_cost value is independent of caching.

Now, let's go back to the original query:

SELECT salutations.id, salutations.name,
       EXISTS (
           SELECT 1 FROM contacts
           WHERE salutations.id = contacts.salutation_id
       ) AS in_use
FROM salutations

If I split up the query, i.e. running this once:

SELECT salutations.id, salutations.name FROM salutations

and then running this 44 times, once for each row:

SELECT EXISTS (
    SELECT 1 FROM contacts
    WHERE contacts.salutation_id = ?
) AS in_use

I can see that PostgreSQL will smartly pick the best plan, i.e. for common salutations (Madam, Ms, etc.) it will do a sequential scan, while for salutations that are rarely used or not used at all, it will do an index scan.

Anyway, the overhead of spawning 44 extra queries means that it is still better for me to stick with the original query and tune PostgreSQL to choose the index scan.
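[Editor's note: the round-trip overhead argument can be made concrete with back-of-envelope arithmetic. All three constants in the Python sketch below are assumptions for illustration only — actual client latency and plan times must be measured on the system in question.]

```python
# Back-of-envelope sketch of why 44 extra queries can lose even when
# each one gets a perfect plan: per-statement round-trip and
# parse/plan overhead is paid 44 extra times. All numbers are
# hypothetical assumptions, not measurements.

ROUND_TRIP_MS = 0.3   # assumed client/server latency per statement
PLAN_MS = 0.05        # assumed parse + plan time per statement
PER_EXISTS_MS = 0.02  # assumed cost of one cached EXISTS probe

# One combined query: one round trip, one plan, 44 probes.
single_query_ms = ROUND_TRIP_MS + PLAN_MS + 44 * PER_EXISTS_MS

# Split version: 1 + 44 statements, each paying round trip + planning.
split_ms = 45 * (ROUND_TRIP_MS + PLAN_MS) + 44 * PER_EXISTS_MS

print(single_query_ms, split_ms)
```

Under these assumed numbers the split version is an order of magnitude slower even though each individual probe is optimal — the fixed per-statement overhead dominates once the probes themselves are sub-millisecond.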
Re: [PERFORM] Performance
Just want to share the DBT(25) thing:
http://archives.postgresql.org/pgsql-performance/2011-04/msg00145.php
http://sourceforge.net/mailarchive/forum.php?forum_name=osdldbt-general&max_rows=25&style=nested&viewmonth=201104

On Wed, Apr 27, 2011 at 11:55 PM, Greg Smith g...@2ndquadrant.com wrote:

> Tomas Vondra wrote:
>> Hmmm, just wondering - what would be needed to build such a 'workload
>> library'? Building it from scratch is not feasible IMHO, but I guess
>> people could provide their own scripts (as simple as 'set up a bunch
>> of tables, fill it with data, run some queries'), and there's a pile
>> of such examples in the pgsql-performance list.
>
> The easiest place to start is by re-using the work already done by the
> TPC for benchmarking commercial databases. There are ports of the TPC
> workloads to PostgreSQL available in the DBT-2, DBT-3, and DBT-5
> tests; see http://wiki.postgresql.org/wiki/Category:Benchmarking for
> initial information on those (the page on TPC-H is quite relevant
> too). I'd like to see all three of those DBT tests running regularly,
> as well as two tests it's possible to simulate with pgbench or
> sysbench: an in-cache read-only test, and a write as fast as possible
> test.
>
> The main problem with re-using posts from this list for workload
> testing is getting an appropriately sized data set for them that stays
> relevant. The nature of this sort of benchmark always includes some
> notion of the size of the database, and you get different results
> based on how large things are relative to RAM and the database
> parameters. That said, some sort of systematic collection of hard
> queries would also be a very useful project for someone to take on.
>
> People show up regularly who want to play with the optimizer in some
> way. It's still possible to do that by targeting specific queries you
> want to accelerate, where it's obvious (or, more likely, hard but
> still straightforward) how to do better.
> But I don't think any of these proposed exercises adjusting the
> caching model or default optimizer parameters in the database is going
> anywhere without some sort of benchmarking framework for evaluating
> the results. And the TPC tests are a reasonable place to start.
> They're a good mixed set of queries, and improving results on those
> does turn into a real commercial benefit to PostgreSQL in the future
> too.
>
> --
> Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
> PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us
> PostgreSQL 9.0 High Performance: http://www.2ndQuadrant.com/books
Re: [PERFORM] Order of tables
On 28.04.2011 12:20, Rishabh Kumar Jain wrote:

> How must the tables be ordered in the list of tables in a FROM
> statement?

There is no difference in performance, if that's what you mean. (If not, then the pgsql-novice or pgsql-sql mailing list would've been more appropriate.)

--
Heikki Linnakangas
EnterpriseDB   http://www.enterprisedb.com
Re: [PERFORM] reducing random_page_cost from 4 to 2 to force index scan
On Wed, Apr 27, 2011 at 5:19 PM, Sok Ann Yap sok...@gmail.com wrote:

> I understand the need to tune PostgreSQL properly for my use case.
> What I am curious about is, for the data set I have, under what
> circumstances (hardware/workload/cache status/etc) would a sequential
> scan really be faster than an index scan for that particular query?

The sequential scan on contacts can be terminated as soon as the first matching row is found. If each block of the contacts table contains one example of each salutation, then the inner sequential scan will always be very short, and faster than an index scan.

I can engineer this to be the case by populating the table like this:

insert into contacts select (generate_series%44+1)::int from generate_series (1,100);

Here I get the seq scan being 2.6 ms while the index scan is 5.6 ms.

Predicting how far the inner scan needs to go would be quite difficult, and I don't know how the system will do it. However, when I create and populate simple tables based on your description, I get the index scan being the lower estimated cost. So the tables I built are not sufficient to study the matter in detail.

Cheers,

Jeff
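[Editor's note: the early-termination behavior Jeff describes can be simulated directly. The Python sketch below mirrors his modulo-44 fill pattern with a hypothetical row count; it counts how many rows a stop-at-first-match scan actually reads for a common salutation versus an absent one.]

```python
# Simulate a seq scan under EXISTS, which stops at the first matching
# row. With a modulo-44 fill (as in Jeff's generate_series example),
# every salutation appears within the first 44 rows, so the "full"
# scan touches almost nothing. Row count here is a made-up 100k.

rows = [(i % 44) + 1 for i in range(100_000)]  # salutation_id per row

def rows_scanned_until_match(rows, salutation_id):
    """Rows a stop-at-first-match scan reads before returning true."""
    for n, value in enumerate(rows, start=1):
        if value == salutation_id:
            return n
    return len(rows)  # no match: the entire table must be read

print(rows_scanned_until_match(rows, 1))    # common value: found fast
print(rows_scanned_until_match(rows, 999))  # absent value: full scan
```

This is exactly the asymmetry the original poster observed: for frequent salutations the terminated seq scan beats the index scan, while for rare or absent ones the scan degenerates to reading the whole table, and the index scan wins by orders of magnitude.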
Re: [PERFORM] VX_CONCURRENT flag on vxfs( 5.1 or later) for performance for postgresql?
On 04/27/2011 11:33 PM, HSIEN-WEN CHU wrote:

> When database files are on a VxFS filesystem, performance can be
> significantly improved by setting the VX_CONCURRENT cache advisory on
> the file according to vxfs document,

That won't improve performance, and it's not safe either. VX_CONCURRENT switches the filesystem to use direct I/O. That's usually slower for PostgreSQL. And it introduces some requirements for both block alignment and the application avoiding overlapping writes. PostgreSQL doesn't do either, so I wouldn't expect it to be compatible with VX_CONCURRENT.

--
Greg Smith   2ndQuadrant US   g...@2ndquadrant.com   Baltimore, MD
PostgreSQL Training, Services, and 24x7 Support   www.2ndQuadrant.us