[PERFORM] switchover between index and sequential scans

2008-07-03 Thread Abhijit Menon-Sen
Hi.

I have a table with 1.8M rows on a Postgres 8.1.4 server, and I'm
executing a query which looks like:

select count(*) from header_fields where message in
(select message from mailbox_messages limit N);

I've found that when N==75, the query uses a fast index scan, but when
N==100, it switches to a seqscan instead. Here are the plans, first the
fast query (which retrieves 1306 rows):

 explain analyse select count(*) from header_fields where message in (select 
 message from mailbox_messages limit 75);

 Aggregate  (cost=84873.57..84873.58 rows=1 width=0) (actual time=940.513..940.516 rows=1 loops=1)
   ->  Nested Loop  (cost=2.25..84812.59 rows=24391 width=0) (actual time=53.235..935.743 rows=1306 loops=1)
         ->  HashAggregate  (cost=2.25..3.00 rows=75 width=4) (actual time=1.351..1.969 rows=75 loops=1)
               ->  Limit  (cost=0.00..1.31 rows=75 width=4) (actual time=0.096..0.929 rows=75 loops=1)
                     ->  Seq Scan on mailbox_messages  (cost=0.00..1912.10 rows=109410 width=4) (actual time=0.087..0.513 rows=75 loops=1)
         ->  Index Scan using header_fields_message_key on header_fields  (cost=0.00..1126.73 rows=325 width=4) (actual time=9.003..12.330 rows=17 loops=75)
               Index Cond: (header_fields.message = "outer".message)
 Total runtime: 942.535 ms

And the slow query (which fetches 1834 rows):

 explain analyse select count(*) from header_fields where message in (select 
 message from mailbox_messages limit 100);

 Aggregate  (cost=95175.20..95175.21 rows=1 width=0) (actual time=36670.432..36670.435 rows=1 loops=1)
   ->  Hash IN Join  (cost=3.00..95093.89 rows=32522 width=0) (actual time=27.620..36662.768 rows=1834 loops=1)
         Hash Cond: ("outer".message = "inner".message)
         ->  Seq Scan on header_fields  (cost=0.00..85706.78 rows=1811778 width=4) (actual time=22.505..29281.553 rows=1812184 loops=1)
         ->  Hash  (cost=2.75..2.75 rows=100 width=4) (actual time=1.708..1.708 rows=100 loops=1)
               ->  Limit  (cost=0.00..1.75 rows=100 width=4) (actual time=0.033..1.182 rows=100 loops=1)
                     ->  Seq Scan on mailbox_messages  (cost=0.00..1912.10 rows=109410 width=4) (actual time=0.023..0.633 rows=100 loops=1)
 Total runtime: 36670.732 ms

(If I set enable_seqscan=off, just to see what happens, then it uses the
first plan, and executes much faster.)

I'd like to understand why this happens, although the problem doesn't
seem to exist with 8.3. The number of rows retrieved in each case is a
tiny fraction of the table size, so what causes the decision to change
between 75 and 100?

This machine has only 512MB of RAM, and is running FreeBSD 5.4. It has
shared_buffers=3072, effective_cache_size=25000, work_mem=sort_mem=2048.
Changing the last two doesn't seem to have any effect on the plan.
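For reference, the knob that usually moves this crossover is the planner's random I/O cost estimate; a hedged experiment (the value here is illustrative, not a recommendation) on the slow query:

```sql
-- random_page_cost defaults to 4.0; lower values make index scans look
-- cheaper relative to sequential scans
SET random_page_cost = 2.0;

EXPLAIN ANALYSE
SELECT count(*) FROM header_fields
WHERE message IN (SELECT message FROM mailbox_messages LIMIT 100);
```

If the index plan comes back and stays faster, it's the cost model misjudging your I/O rather than the planner failing to find the plan.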

Thanks.

-- ams

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] [QUESTION]Concurrent Access

2008-07-03 Thread PFC


> I want to know if the PostGree has limitations about the concurrent
> access, because a lot of people will access this database at the same
> time.


PostgreSQL has excellent concurrency provided you use it correctly.

But what do you mean by concurrent access?

* Number of open Postgres connections at the same time?
	Each of those uses a little bit of RAM (see the manual), but idle
connections don't use CPU.


* Number of open transactions at the same time (between BEGIN and COMMIT)?
	If your transactions are long and you have many of them at the same
time, you can get lock problems: for instance, transaction A updates row X
and transaction B updates the same row X, so one has to wait for the other
to commit or roll back. If your transactions last 1 ms there is no problem;
if they last 5 minutes you will suffer.
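	That row-lock scenario is easy to reproduce in two psql sessions; the
table and values here are made up for illustration:

```sql
-- Session A
BEGIN;
UPDATE accounts SET balance = balance - 10 WHERE id = 1;  -- takes a row lock

-- Session B, meanwhile
BEGIN;
UPDATE accounts SET balance = balance + 10 WHERE id = 1;  -- blocks here...

-- Session A
COMMIT;  -- ...and only now does session B's UPDATE proceed
```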


* Number of queries executing at the same time?
	This is different from the above: each executing query eats some CPU
and I/O resources, and memory too.


* Number of concurrent HTTP connections to your website?
	If you have a website, you will probably use some form of connection
pooling, or lighttpd/fastcgi, or a proxy, whatever, so the number of open
database connections at the same time won't be that high. Unless you use
mod_php without connection pooling, in which case it will suck, of course,
but that's normal.


* Number of people using your client?
	See the note on idle connections above, or use a connection pool.
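	A pooler such as PgBouncer can sit between the application and
Postgres; a minimal pgbouncer.ini sketch, where every name, path, and
number is illustrative:

```ini
[databases]
mydb = host=127.0.0.1 port=5432 dbname=mydb

[pgbouncer]
listen_addr = 127.0.0.1
listen_port = 6432
auth_type = md5
auth_file = /etc/pgbouncer/userlist.txt
pool_mode = session        ; transaction pooling also exists, with caveats
max_client_conn = 500      ; many application connections...
default_pool_size = 20     ; ...funneled into a few server connections
```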

> I want to know about the limitations, like how much memory do i have
> to use


That depends on what you want to do ;)


> How big could be my database ?


That depends on what you do with it ;)

Working set size is more relevant than total database size.

	For instance if your database contains orders from the last 10 years, but  
only current orders (say orders from this month) are accessed all the  
time, with old orders being rarely accessed, you want the last 1-2 months'  
worth of orders to fit in RAM for fast access (caching) but you don't need  
RAM to fit your entire database.

So, think about working sets not total sizes.

	And there is no limit on the table size (well, there is, but you'll never  
hit it). People have terabytes in postgres and it seems to work ;)




[PERFORM] slow delete

2008-07-03 Thread Jessica Richard
I have a table with 29K rows total and I need to delete about 80K out of it.

I have a b-tree index on column cola (varchar(255) ) for my where clause to use.

My select count(*) from test where cola = 'abc' runs very fast, but my
actual delete from test where cola = 'abc'; takes forever; it never
finishes, and I haven't figured out why.

In my explain output, what is that Bitmap Heap Scan on the table? Is it a
table scan? Is my index being used?

How does delete work? To delete 80K rows that meet my condition, does
Postgres find them all and delete them all together, or one at a time?


By the way, there is a foreign key on another table that references the
primary key col0 of table test.

Could some one help me out here?

Thanks a lot,
Jessica


testdb=# select count(*) from test;
 count  
--------
 295793    -- total 295,793 rows
(1 row)

Time: 155.079 ms

testdb=# select count(*) from test where cola = 'abc';
 count 
-------
 80998     -- need to delete 80,998 rows
(1 row)



testdb=# explain delete from test where cola = 'abc';
                                   QUERY PLAN                                   
--------------------------------------------------------------------------------
 Bitmap Heap Scan on test  (cost=2110.49..10491.57 rows=79766 width=6)
   Recheck Cond: ((cola)::text = 'abc'::text)
   ->  Bitmap Index Scan on test_cola_idx  (cost=0.00..2090.55 rows=79766 width=0)
         Index Cond: ((cola)::text = 'abc'::text)
(4 rows)

[PERFORM] Define all IP's in the world in pg_hba.conf

2008-07-03 Thread idc danny
Hi everybody,

I know that this group deals with performance, but it is the only one I'm
subscribed to, so my apologies in advance for the question.

I want to allow everybody in the world, all IPs, to connect to my server.
How do I accomplish that? It's definitely not a good solution to enter
them all manually in pg_hba.conf :).

For now, if the question above cannot be answered in general, I want at
least to allow the IPs of the Hamachi network, which are all of the form
5.*.*.*; in the future it may expand to all IPs.

Thank you,
Danny



Re: [PERFORM] slow delete

2008-07-03 Thread Craig Ringer

Jessica Richard wrote:

> I have a table with 29K rows total and I need to delete about 80K out
> of it.


I assume you meant 290K or something.

> I have a b-tree index on column cola (varchar(255)) for my where clause
> to use.


> my select count(*) from test where cola = 'abc' runs very fast,
>
> but my actual delete from test where cola = 'abc'; takes forever,
> never can finish and I haven't figured why


When you delete, the database server must:

- Check all foreign keys referencing the data being deleted
- Update all indexes on the data being deleted
- and actually flag the tuples as deleted by your transaction

All of which takes time. It's a much slower operation than a query that
only has to count how many tuples match the search criteria, as your
SELECT does.


How many indexes do you have on the table you're deleting from? How many 
foreign key constraints are there to the table you're deleting from?


If you find that it just takes too long, you could drop the indexes and 
foreign key constraints, do the delete, then recreate the indexes and 
foreign key constraints. This can sometimes be faster, depending on just 
what proportion of the table must be deleted.
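A sketch of that approach; the constraint and index names here are
invented for illustration (check yours with \d test):

```sql
-- Drop the FK that references test(col0), and the index on cola
ALTER TABLE othertable DROP CONSTRAINT othertable_col0_fkey;
DROP INDEX test_cola_idx;

DELETE FROM test WHERE cola = 'abc';

-- Recreate what was dropped
CREATE INDEX test_cola_idx ON test (cola);
ALTER TABLE othertable
    ADD CONSTRAINT othertable_col0_fkey
    FOREIGN KEY (col0) REFERENCES test (col0);
```

Also worth checking: if the referencing column (othertable.col0 above) has
no index, every deleted row in test forces a scan of the referencing table
for the FK check, which by itself can explain a delete that never finishes.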


Additionally, remember to VACUUM ANALYZE the table after that sort of 
big change. AFAIK you shouldn't really have to if autovacuum is doing 
its job, but it's not a bad idea anyway.


--
Craig Ringer



Re: [PERFORM] Define all IP's in the world in pg_hba.conf

2008-07-03 Thread Russell Smith
idc danny wrote:
> Hi everybody,
>
> I know that this group deals with performance but is the only one on
> which I'm subscribed, so my apologize in advance for the question.
>
> I want to allow everybody in the world, all IP's, to connect to my
> server. How do I accomplish that? Definitely, it's not a good solution
> to enter all them manually in pg_hba.conf :).
   
What's wrong with 0.0.0.0/0?
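In pg_hba.conf form, that looks roughly like the following; the DATABASE
and USER columns here are illustrative, and when opening up to the world
you want a real auth method such as md5, never trust:

```
# TYPE  DATABASE  USER  CIDR-ADDRESS  METHOD
host    all       all   0.0.0.0/0     md5      # any IPv4 address
host    all       all   5.0.0.0/8     md5      # only 5.*.*.* (Hamachi-style)
```

You'll also need listen_addresses = '*' in postgresql.conf so the server
accepts non-local connections at all.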
> Currently, if above question cannot be answered, I want to achieve to
> allow the IP's of Hamachi network, which all are of the form 5.*.*.* -
> but in the future it can expand to all IP's.
>
> Thank you,
> Danny
