[PERFORM] switchover between index and sequential scans
Hi. I have a table with 1.8M rows on a Postgres 8.1.4 server, and I'm executing a query which looks like:

    select count(*) from header_fields
    where message in (select message from mailbox_messages limit N);

I've found that when N==75, the query uses a fast index scan, but when N==100, it switches to a seqscan instead. Here are the plans, first the fast query (which retrieves 1306 rows):

explain analyse select count(*) from header_fields where message in (select message from mailbox_messages limit 75);

 Aggregate  (cost=84873.57..84873.58 rows=1 width=0) (actual time=940.513..940.516 rows=1 loops=1)
   ->  Nested Loop  (cost=2.25..84812.59 rows=24391 width=0) (actual time=53.235..935.743 rows=1306 loops=1)
         ->  HashAggregate  (cost=2.25..3.00 rows=75 width=4) (actual time=1.351..1.969 rows=75 loops=1)
               ->  Limit  (cost=0.00..1.31 rows=75 width=4) (actual time=0.096..0.929 rows=75 loops=1)
                     ->  Seq Scan on mailbox_messages  (cost=0.00..1912.10 rows=109410 width=4) (actual time=0.087..0.513 rows=75 loops=1)
         ->  Index Scan using header_fields_message_key on header_fields  (cost=0.00..1126.73 rows=325 width=4) (actual time=9.003..12.330 rows=17 loops=75)
               Index Cond: (header_fields.message = "outer".message)
 Total runtime: 942.535 ms

And the slow query (which fetches 1834 rows):

explain analyse select count(*) from header_fields where message in (select message from mailbox_messages limit 100);

 Aggregate  (cost=95175.20..95175.21 rows=1 width=0) (actual time=36670.432..36670.435 rows=1 loops=1)
   ->  Hash IN Join  (cost=3.00..95093.89 rows=32522 width=0) (actual time=27.620..36662.768 rows=1834 loops=1)
         Hash Cond: ("outer".message = "inner".message)
         ->  Seq Scan on header_fields  (cost=0.00..85706.78 rows=1811778 width=4) (actual time=22.505..29281.553 rows=1812184 loops=1)
         ->  Hash  (cost=2.75..2.75 rows=100 width=4) (actual time=1.708..1.708 rows=100 loops=1)
               ->  Limit  (cost=0.00..1.75 rows=100 width=4) (actual time=0.033..1.182 rows=100 loops=1)
                     ->  Seq Scan on mailbox_messages  (cost=0.00..1912.10 rows=109410 width=4) (actual time=0.023..0.633 rows=100 loops=1)
 Total runtime: 36670.732 ms

(If I set enable_seqscan=off, just to see what happens, then it uses the first plan, and executes much faster.)

I'd like to understand why this happens, although the problem doesn't seem to exist with 8.3. The number of rows retrieved in each case is a tiny fraction of the table size, so what causes the decision to change between 75 and 100?

This machine has only 512MB of RAM, and is running FreeBSD 5.4. It has shared_buffers=3072, effective_cache_size=25000, work_mem=sort_mem=2048. Changing the last two doesn't seem to have any effect on the plan.

Thanks.

-- ams

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance
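One way to probe the crossover point is to experiment with the planner's cost parameters at the session level. This is only a diagnostic sketch: random_page_cost is not mentioned in the original post, and the values below are illustrative, not recommendations.

```sql
-- Session-level experiment: does cheapening random page reads keep
-- the planner on the index-scan plan at N=100?
SET effective_cache_size = 25000;  -- as in the original configuration
SET random_page_cost = 2;          -- assumption: default is 4; lower values
                                   -- favour index scans over seq scans
EXPLAIN ANALYSE
SELECT count(*) FROM header_fields
WHERE message IN (SELECT message FROM mailbox_messages LIMIT 100);
```

If the plan flips back to the nested-loop index scan, the switchover is a cost-estimation artifact rather than anything about the data itself.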
Re: [PERFORM] [QUESTION]Concurrent Access
> I want to know if the PostGree has limitations about the concurrent
> access, because a lot of people will access this database at the same
> time.

PostgreSQL has excellent concurrency provided you use it correctly. But what do you mean by concurrent access?

* Number of open Postgres connections at the same time? Each one of those uses a little bit of RAM (see manual), but if idle they don't use CPU.

* Number of open transactions at the same time (between BEGIN and COMMIT)? If your transactions are long and you have many of them at the same time, you can get lock problems: for instance, transaction A updates row X and transaction B updates the same row X; one will have to wait for the other to commit or rollback, of course. If your transactions last 1 ms there is no problem; if they last 5 minutes you will suffer.

* Number of queries executing at the same time? This is different from the above; each query will eat some CPU and IO resources, and memory too.

* Number of concurrent HTTP connections to your website? If you have a website, you will probably use some form of connection pooling, or lighttpd/fastcgi, or a proxy, whatever, so the number of open database connections at the same time won't be that high. Unless you use mod_php without connection pooling, in which case it will suck of course, but that's normal.

* Number of people using your client? See the note on idle connections above, or use a connection pool.

> I want to know about the limitations, like how much memory do i have to use

That depends on what you want to do ;)

> How big could be my database ?

That depends on what you do with it ;) Working set size is more relevant than total database size.
For instance, if your database contains orders from the last 10 years, but only current orders (say, orders from this month) are accessed all the time, with old orders being rarely accessed, you want the last 1-2 months' worth of orders to fit in RAM for fast access (caching), but you don't need RAM to fit your entire database. So, think about working sets, not total sizes.

And there is no limit on the table size (well, there is, but you'll never hit it). People have terabytes in postgres and it seems to work ;)
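The lock-wait scenario described above (two transactions updating the same row) can be sketched as two concurrent sessions. The table and values are hypothetical, purely for illustration:

```sql
-- Hypothetical table: accounts(id int primary key, balance int).

-- Session A:
BEGIN;
UPDATE accounts SET balance = balance - 10 WHERE id = 1;  -- takes a row lock

-- Session B (concurrently):
BEGIN;
UPDATE accounts SET balance = balance + 10 WHERE id = 1;  -- blocks here,
                                                          -- waiting on A's lock
-- Session A:
COMMIT;   -- B's UPDATE now proceeds

-- Session B:
COMMIT;
```

With millisecond-long transactions the wait is invisible; hold the lock for five minutes and every session touching that row queues up behind it.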
[PERFORM] slow delete
I have a table with 29K rows total and I need to delete about 80K out of it. I have a b-tree index on column cola (varchar(255)) for my where clause to use.

My select count(*) from test where cola = 'abc' runs very fast, but my actual delete from test where cola = 'abc'; takes forever, never can finish, and I haven't figured out why.

In my explain output, what is that Bitmap Heap Scan on table? Is it a table scan? Is my index being used?

How does delete work? To delete 80K rows that meet my condition, does Postgres find them all and delete them all together, or one at a time?

By the way, there is a foreign key on another table that references the primary key col0 on table test.

Could someone help me out here?

Thanks a lot,
Jessica

testdb=# select count(*) from test;
 count
--------
 295793    -- total 295,793 rows
(1 row)
Time: 155.079 ms

testdb=# select count(*) from test where cola = 'abc';
 count
-------
 80998     -- need to delete 80,998 rows
(1 row)

testdb=# explain delete from test where cola = 'abc';
                                  QUERY PLAN
-------------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=2110.49..10491.57 rows=79766 width=6)
   Recheck Cond: ((cola)::text = 'abc'::text)
   ->  Bitmap Index Scan on test_cola_idx  (cost=0.00..2090.55 rows=79766 width=0)
         Index Cond: ((cola)::text = 'abc'::text)
(4 rows)
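The foreign key mentioned above is a common culprit for exactly this symptom: for every deleted row, Postgres must check the referencing table, and if the referencing column has no index, each check is a sequential scan. A sketch for finding which tables reference test (the `orders` table and column names in the comment are hypothetical):

```sql
-- List foreign key constraints that reference table "test".
SELECT conrelid::regclass AS referencing_table,
       conname            AS constraint_name
FROM pg_constraint
WHERE confrelid = 'test'::regclass
  AND contype = 'f';

-- If the referencing column (e.g. orders.test_id) has no index,
-- each of the ~80K delete checks scans that table. Something like:
-- CREATE INDEX orders_test_id_idx ON orders (test_id);
```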
[PERFORM] Define all IP's in the world in pg_hba.conf
Hi everybody,

I know that this group deals with performance, but it is the only one I'm subscribed to, so my apologies in advance for the question.

I want to allow everybody in the world, all IP's, to connect to my server. How do I accomplish that? Definitely, it's not a good solution to enter them all manually in pg_hba.conf :).

Currently, if the above question cannot be answered, I want to allow the IP's of the Hamachi network, which are all of the form 5.*.*.* - but in the future it can expand to all IP's.

Thank you,
Danny
Re: [PERFORM] slow delete
Jessica Richard wrote:
> I have a table with 29K rows total and I need to delete about 80K out
> of it.

I assume you meant 290K or something.

> I have a b-tree index on column cola (varchar(255)) for my where clause
> to use.
>
> my select count(*) from test where cola = 'abc' runs very fast, but my
> actual delete from test where cola = 'abc'; takes forever, never can
> finish and I haven't figured why

When you delete, the database server must:

- check all foreign keys referencing the data being deleted;
- update all indexes on the data being deleted;
- and actually flag the tuples as deleted by your transaction.

All of which takes time. It's a much slower operation than a query that just has to find out how many tuples match the search criteria, like your SELECT does.

How many indexes do you have on the table you're deleting from? How many foreign key constraints reference the table you're deleting from?

If you find that it just takes too long, you could drop the indexes and foreign key constraints, do the delete, then recreate the indexes and foreign key constraints. This can sometimes be faster, depending on just what proportion of the table must be deleted.

Additionally, remember to VACUUM ANALYZE the table after that sort of big change. AFAIK you shouldn't really have to if autovacuum is doing its job, but it's not a bad idea anyway.

-- Craig Ringer
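As a sketch of the drop-and-recreate approach suggested above: the constraint, index, and column names below are hypothetical (check \d test and \d on the referencing table for the real ones), and the DDL is wrapped in a transaction so the table is never left unprotected mid-way.

```sql
BEGIN;
-- Hypothetical names; adjust to match your schema.
ALTER TABLE orders DROP CONSTRAINT orders_test_fk;
DROP INDEX test_cola_idx;

DELETE FROM test WHERE cola = 'abc';

CREATE INDEX test_cola_idx ON test (cola);
ALTER TABLE orders
  ADD CONSTRAINT orders_test_fk
  FOREIGN KEY (test_id) REFERENCES test (col0);
COMMIT;

-- VACUUM cannot run inside a transaction block, so it goes last.
VACUUM ANALYZE test;
```

Note that re-adding the foreign key revalidates every referencing row, so whether this wins overall depends on the sizes involved.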
Re: [PERFORM] Define all IP's in the world in pg_hba.conf
idc danny wrote:
> Hi everybody,
>
> I know that this group deals with performance, but it is the only one
> I'm subscribed to, so my apologies in advance for the question.
>
> I want to allow everybody in the world, all IP's, to connect to my
> server. How do I accomplish that? Definitely, it's not a good solution
> to enter them all manually in pg_hba.conf :).

What's wrong with 0.0.0.0/0 ?

> Currently, if the above question cannot be answered, I want to allow
> the IP's of the Hamachi network, which are all of the form 5.*.*.* -
> but in the future it can expand to all IP's.
>
> Thank you,
> Danny
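For concreteness, the corresponding pg_hba.conf entries might look like the following. The authentication method is an assumption: whatever you choose, don't use "trust" on a world-open range, and remember that listen_addresses in postgresql.conf must also allow remote connections.

    # TYPE  DATABASE  USER  ADDRESS      METHOD
    host    all       all   5.0.0.0/8    md5     # the 5.*.*.* Hamachi range
    host    all       all   0.0.0.0/0    md5     # every IPv4 address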