[PERFORM] PostgreSQL and Linux 2.6 kernel.

2004-04-01 Thread Gary Doades
As part of my ongoing evaluation of PostgreSQL I have been doing a little stress 
testing. 
I though I would share an interesting result here..

Machine spec:
500 MHz PIII
256MB RAM
old-ish IDE HD (5400RPM)
Linux 2.4.22 kernel (Madrake 9.2)

I have PostgreSQL 7.4.1 installed and have managed to load up a 1.4 GB database 
from MS SQLServer. Vaccum analyzed it.

As a test in PosgreSQL I issued a statement to update a single column of a table 
containing 2.8 million rows with the values of a column in a table with similar 
rowcount. 
Using the above spec I had to stop the server after 17 hours. The poor thing was 
thrashing the hard disk and doing more swapping than useful work.

Having obtained a copy of Mandrake 10.0 with the 2.6 kernal I though I would give it a 
go. Same hardware. Same setup. Same database loaded up. Same postgresql.conf file 
to make sure all the settings were the same.  Vaccum analyzed it.

same update statement COMPLETED in 2 hours 50 minutes. I'm impressed.

I could see from vmstat that the system was achieving much greater IO thoughput than 
the 2.4 kernel. Although the system was still swapping there seems to be a completely 
different memory management pattern that suits PostgreSQL very well.

Just to see that this wasn't a coincidence I am repeating the test. It is now into the 
14th 
hour using the old 2.4 kernel. I'm going to give up.

Has anyone else done any comparative testing with the 2.6 kernel?

Cheers,
Gary.


---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


[PERFORM] single index on more than two coulumns a bad thing?

2004-04-01 Thread Palle Girgensohn
Hi,

I remember reading a post ages ago, maybe from Vadim, about the fact that 
people creating indices on more than two columns will be the first to be 
put againts the wall when the revolution comes... sort of...

Is it always bad to create index xx on yy (field1, field2, field3);

I guess the problem is that the index might often grow bigger than the 
table, or at least big enough not to speed up the queries?

/Palle

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] single index on more than two coulumns a bad thing?

2004-04-01 Thread Josh Berkus
Palle,

 Is it always bad to create index xx on yy (field1, field2, field3);

No, it seldom bad, in fact.I have some indexes that run up to seven 
columns, becuase they are required for unique keys.

Indexes of 3-4 columns are often *required* for many-to-many join tables.

I'm afraid that you've been given some misleading advice.

 I guess the problem is that the index might often grow bigger than the 
 table, or at least big enough not to speed up the queries?

Well, yes ... a 4-column index on a 5-column table could be bigger than the 
table if allowed to bloat and not re-indexed.   But that's just a reason for 
better maintainence.

-- 
-Josh Berkus
 Aglio Database Solutions
 San Francisco


---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] single index on more than two coulumns a bad thing?

2004-04-01 Thread Tom Lane
Josh Berkus [EMAIL PROTECTED] writes:
 Is it always bad to create index xx on yy (field1, field2, field3);

 I'm afraid that you've been given some misleading advice.

I'd say it's a matter of getting your optimizations straight.

If you have a query that can make use of that index, and the query is
executed often enough to make it worth maintaining the index during
table updates, then by all means make the index.

The standard advice is meant to warn you against creating a zillion
indexes without any thought to what you'll be paying in update costs.
Indexes with more than a couple of columns are usually of only narrow
applicability, and so you have to be sure that they'll really pay for
themselves...

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Index Performance Help

2004-04-01 Thread Greg Stark

Damien Dougan [EMAIL PROTECTED] writes:

 Sample analyze output for an initial query:
 
 hydradb=# explain analyze select * from pvsubscriber where actorid =
 'b3432-asdas-232-Subscriber793500';

I take it pvsubscriber is a view? What's the definition of your view?

 -  Index Scan using mc_actor_key on mc_actor
(cost=0.00..4.08 rows=1 width=69)
(actual time=39.497..39.499 rows=1 loops=1)

Is this table regularly vacuumed? Is it possible it has lots of dead records
with this value for actorid? Try running vacuum full, or better vacuum full
verbose and keep the output, it might explain.

What version of postgres is this? You might try reindexing all your indexes
(but particularly this one). Older versions of postgres were prone to index
bloat problems.


-- 
greg


---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] single index on more than two coulumns a bad thing?

2004-04-01 Thread Greg Stark

Palle Girgensohn [EMAIL PROTECTED] writes:

 Is it always bad to create index xx on yy (field1, field2, field3);

All generalisations are false...

Seriously, it's true that as the length of your index key gets longer the
harder and harder it is to justify it. That doesn't mean they're always wrong,
but you should consider whether a shorter key would perform just as well.

The other problem with long index keys is that they often show up in the same
place as having dozens of indexes on the same table. Usually in shops where
the indexes were created after the fact looking at specific queries.

-- 
greg


---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match


Re: [PERFORM] PostgreSQL and Linux 2.6 kernel.

2004-04-01 Thread Tom Lane
Gary Doades [EMAIL PROTECTED] writes:
 As a test in PosgreSQL I issued a statement to update a single column
 of a table containing 2.8 million rows with the values of a column in
 a table with similar rowcount.  Using the above spec I had to stop the
 server after 17 hours. The poor thing was thrashing the hard disk and
 doing more swapping than useful work.

This statement is pretty much content-free, since you did not show us
the table schemas, the query, or the EXPLAIN output for the query.
(I'll forgive you the lack of EXPLAIN ANALYZE, but you could easily
have provided all the other hard facts.)  There's really no way to tell
where the bottleneck is.  Maybe it's a kernel-level issue, but I would
not bet on that without more evidence.  I'd definitely not bet on it
without direct confirmation that the same query plan was used in both
setups.

regards, tom lane

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] PostgreSQL and Linux 2.6 kernel.

2004-04-01 Thread Gary Doades
The post was not intended to be content-rich, just my initial feedback 
after only just switching to 2.6. Since I had largely given up on this 
particular line of attack using 2.4 I didn't think to do a detailed analysis 
at this time. I was also hoping that others would add to the discussion. 

As this could become important I will be doing more analysis, but due to 
the nature of the issue and trying to keep as many factors constant as 
possible, this may take some time.

Cheers,
Gary.

On 2 Apr 2004 at 1:32, Tom Lane wrote:

 Gary Doades [EMAIL PROTECTED] writes:
  As a test in PosgreSQL I issued a statement to update a single column
  of a table containing 2.8 million rows with the values of a column in
  a table with similar rowcount.  Using the above spec I had to stop the
  server after 17 hours. The poor thing was thrashing the hard disk and
  doing more swapping than useful work.
 
 This statement is pretty much content-free, since you did not show us
 the table schemas, the query, or the EXPLAIN output for the query.
 (I'll forgive you the lack of EXPLAIN ANALYZE, but you could easily
 have provided all the other hard facts.)  There's really no way to tell
 where the bottleneck is.  Maybe it's a kernel-level issue, but I would
 not bet on that without more evidence.  I'd definitely not bet on it
 without direct confirmation that the same query plan was used in both
 setups.
 
   regards, tom lane
 
 
 -- 
 Incoming mail is certified Virus Free.
 Checked by AVG Anti-Virus (http://www.grisoft.com).
 Version: 7.0.230 / Virus Database: 262.6.5 - Release Date: 31/03/2004
 



---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match