Re: [PERFORM] PostgreSQL scalability on Sun UltraSparc T1

2006-07-31 Thread Jignesh Shah

Hi Arjen,

I am curious about your Sun Studio compiler options as well.
Can you send those too?

Any other tweakings that you did on Solaris?

Thanks.

Regards,
Jignesh


Arjen van der Meijden wrote:

On 29-7-2006 19:01, Joshua D. Drake wrote:
Well I would be curious about the postgresql.conf and how much ram 
etc... it had.


It was the 8-core version with 16GB memory... but actually that's just 
overkill; the active portion of the database easily fits in 8GB, and a 
test on another machine with just 2GB didn't even show much improvement 
when going to 7GB (6x 1GB, 2x 512MB). It was mostly in the range of 10% 
improvement or less.


Anyway, the differences to the default postgresql.conf:
shared_buffers = 30000
Tests with 40k, 50k and 60k didn't really show improvements.

work_mem = 2048
This probably could've been set higher with the sheer amount of 
not-really-used memory.


maintenance_work_mem = 65535
Not really important of course

max_fsm_pages = 500000
Somehow it needed to be set quite high, probably because we only 
cleaned up after doing over 200k requests.


effective_cache_size = 350000
As said, the database fitted in 8GB of memory, so I didn't see a need 
to set this higher than for the 8GB machines (x4200 and another T2000 
we had).


default_statistics_target = 200
For a few columns on the largest tables I manually raised it to 1000

log_min_duration_statement = 1000
I'm not sure if this has much overhead? Stats logging was turned/left 
on as well.

Turning that off improved it a few percent.

I understand, I just have a feeling that we could do even better :) I 
do appreciate all your efforts.


Well, I'll keep that in mind :)
What makes it even worse for MySQL is that it had (on another machine) 
about 8M hits on the query cache for 4M inserts, i.e. half of the 
queries weren't even executed on it.


Best regards,

Arjen

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] PostgreSQL scalability on Sun UltraSparc T1

2006-07-31 Thread Arjen van der Meijden

Hi Jignesh,

It was a cvs-checkout of 8.2 devel, compiled using:
CPPFLAGS=-fast -xtarget=ultraT1 -xnolibmopt CC=/opt/SUNWspro/bin/cc 
./configure --without-readline


We'd gotten a specially adjusted Solaris version from Sun Holland for 
the T2000. It was a dvd with a Solaris flar archive from 11 april 2006 
and patches from 25 april 2006. It also had the preferred Solaris System 
settings already applied. If you need more details about that dvd, I 
think your best option is to contact Hans Nijbacker or Bart Muijzer, 
since we're no Solaris-experts :)


Apart from that, we did no extra tuning of the OS, nor did Hans for the 
MySQL optimizations (afaik, but then again, he knows best).


Best regards,

Arjen van der Meijden

Jignesh Shah wrote:

Hi Arjen,

I am curious about your Sun Studio compiler options as well.
Can you send those too?

Any other tweakings that you did on Solaris?

Thanks.

Regards,
Jignesh


Arjen van der Meijden wrote:

On 29-7-2006 19:01, Joshua D. Drake wrote:
Well I would be curious about the postgresql.conf and how much ram 
etc... it had.


It was the 8-core version with 16GB memory... but actually that's just 
overkill; the active portion of the database easily fits in 8GB, and a 
test on another machine with just 2GB didn't even show much improvement 
when going to 7GB (6x 1GB, 2x 512MB). It was mostly in the range of 10% 
improvement or less.


Anyway, the differences to the default postgresql.conf:
shared_buffers = 30000
Tests with 40k, 50k and 60k didn't really show improvements.

work_mem = 2048
This probably could've been set higher with the sheer amount of 
not-really-used memory.


maintenance_work_mem = 65535
Not really important of course

max_fsm_pages = 500000
Somehow it needed to be set quite high, probably because we only 
cleaned up after doing over 200k requests.


effective_cache_size = 350000
As said, the database fitted in 8GB of memory, so I didn't see a need 
to set this higher than for the 8GB machines (x4200 and another T2000 
we had).


default_statistics_target = 200
For a few columns on the largest tables I manually raised it to 1000

log_min_duration_statement = 1000
I'm not sure if this has much overhead? Stats logging was turned/left 
on as well.

Turning that off improved it a few percent.

I understand, I just have a feeling that we could do even better :) I 
do appreciate all your efforts.


Well, I'll keep that in mind :)
What makes it even worse for MySQL is that it had (on another machine) 
about 8M hits on the query cache for 4M inserts, i.e. half of the 
queries weren't even executed on it.


Best regards,

Arjen

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


---(end of broadcast)---
TIP 6: explain analyze is your friend



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] sub select performance due to seq scans

2006-07-31 Thread Richard Huxton

H Hale wrote:
I am testing a query that has a sub-select. The query performance is very poor, as shown below, due to the use of sequential scans. The actual row count of both tables is also shown. It appears the row count shown by explain analyze does not match the actual count. Columns dstobj, srcobj and objectid are all indexed, yet postgres insists on using seq scans. Vacuum analyze makes no difference. I am using 8.1.3 on linux.
 
 This is a very simple query with a relatively small amount of data, and the query is taking 101482 ms. Queries with sub-selects on each table individually are very fast (8 ms).
 
 How do I prevent the use of seq scans?


Hmm - something strange here.


 capsa=# explain analyze select name from capsa.flatomfilesysentry where 
objectid in ( select dstobj from capsa.flatommemberrelation where srcobj = 
'c1c7304a-1fe1-11db-8af7-001143214409');

                                QUERY PLAN
-----------------------------------------------------------------------------
  Nested Loop IN Join  (cost=0.00..1386.45 rows=5809 width=14) (actual 
time=2.933..101467.463 rows=5841 loops=1)
    Join Filter: ("outer".objectid = "inner".dstobj)
    ->  Seq Scan on flatomfilesysentry  (cost=0.00..368.09 rows=5809 width=30) 
(actual time=0.007..23.451 rows=5844 loops=1)
    ->  Seq Scan on flatommemberrelation  (cost=0.00..439.05 rows=5842 
width=16) (actual time=0.007..11.790 rows=2922 loops=5844)
          Filter: (srcobj = 
'c1c7304a-1fe1-11db-8af7-001143214409'::capsa_sys.uuid)
  Total runtime: 101482.256 ms


Look at that second seq-scan (on flatommemberrelation) - it's looping 
5844 times (once for each row in flatomfilesysentry). I'd expect PG to 
materialise the seq-scan once and then join (unless I'm missing 
something, the subselect just involves the one test against a constant).


I'm guessing something in your configuration is pushing your cost 
estimates far away from reality. Could you try issuing a "set 
enable_seqscan=off" and then running explain analyse again. That will 
show us alternatives.
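[For reference, a minimal sketch of the suggested diagnostic session, reusing the query from the original post; the setting change only affects the current session:]

```sql
-- Disable seq scans for this session only, then re-run the problem query
-- to see which index-based plan the planner would otherwise choose.
SET enable_seqscan = off;

EXPLAIN ANALYZE
SELECT name
FROM capsa.flatomfilesysentry
WHERE objectid IN (SELECT dstobj
                   FROM capsa.flatommemberrelation
                   WHERE srcobj = 'c1c7304a-1fe1-11db-8af7-001143214409');

SET enable_seqscan = on;  -- restore the default afterwards
```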


Also, what performance-related configuration values have you changed? 
Could you post them with a brief description of your hardware?


--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] How to increase performance?

2006-07-31 Thread Richard Huxton

Hristo Markov wrote:

Hello,
   
  My name is Hristo Markov. I am a software developer.
  I am developing software systems (in C/C++) that run on the Windows operating system and use an ODBC driver and an Access database. I want to switch the database to PostgreSQL.
  The systems work without problems with PostgreSQL and ODBC, but the performance of updating and reading data is very low.
  I ran a test program on a single computer under the Windows XP operating system, working with identical data (I use around 10 tables at the same time). The only differences are the databases and ODBC drivers.
 
  The results from the test program are:

Around 10 seconds under the Access database.
Around 40 seconds under the PostgreSQL database.
  
Please help me to increase the speed and performance of PostgreSQL.
  (I am new to PostgreSQL and I think I may need to adjust some settings.)


Are there specific queries you're having problems with? How many 
transactions does this 40 seconds represent? What is the bottleneck - 
CPU/disk/memory?


You might find this a good place to start reading about configuration 
settings, and then follow that with the manuals.

http://www.powerpostgresql.com/PerfList

--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Query 200x slower on server [PART 2]

2006-07-31 Thread Richard Huxton

NbForYou wrote:

See Query 200x slower on server [PART 1] before reading any further


Can't find it. Sorry.



QUERY PLAN ON MY HOME SERVER

[snip]

Total runtime: 125.432 ms

This means that the Query is 200 times slower on the webhost!

How can I resolve this?


First - what is different between the two plans and why? PostgreSQL will 
be choosing a different plan because:

1. It's estimating different numbers of rows for one or more steps
2. It's estimating a different cost for one or more steps
3. It's triggering the genetic optimiser which means you're not 
necessarily going to get the same plan each time.

4. You've got different versions of PG on the different machines.

--
  Richard Huxton
  Archonet Ltd

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


[PERFORM] directory tree query with big planner variation

2006-07-31 Thread Axel Rau

Hi group,

this is a directory tree query for a backup system 
(http://sourceforge.net/projects/bacula).
You provide a path and get back the names of the children, plus a 
boolean telling whether the child itself has children.

The %@ stands for the initial path:
---
explain analyze  SELECT X.name AS name, COUNT(CH) > 1 AS children
  FROM
( SELECT  RTRIM( REPLACE( NLPC.path, '%@/', ''),'/') AS name,
 FN.name AS CH
FROM
  ( SELECT P.path,P.pathid
  FROM bacula.path P
  WHERE P.path ~ '^%@/[^/]*/$' ) AS NLPC
  LEFT OUTER JOIN bacula.file F
ON
  NLPC.pathid = F.pathid
  LEFT OUTER JOIN bacula.filename FN
ON
  F.filenameid = FN.filenameid
GROUP BY NLPC.path, FN.name
  UNION
  SELECT  FN.name AS name, FN.name AS CH
FROM
  bacula.path P, bacula.file F, bacula.filename FN
WHERE
  P.path = '%@/'   AND
  P.pathid = F.pathid   AND
  F.filenameid = FN.filenameid
) AS X
  WHERE X.name <> ''
  GROUP BY X.name
---
The 1st part of the union takes care of directories, the 2nd one of  
flat files.
Application allows user navigation in a browser (clicking on a name  
in one column triggers the query and fills the next browser column).

Initial path of /Users/axel/Library/Preferences results in:
---
 Sort  (cost=1295.24..1295.47 rows=92 width=64) (actual time=818.987..819.871 rows=527 loops=1)
   Sort Key: upper(x.name)
   ->  GroupAggregate  (cost=1288.56..1292.24 rows=92 width=64) (actual time=784.069..814.059 rows=527 loops=1)
         ->  Unique  (cost=1288.56..1289.25 rows=92 width=112) (actual time=783.931..809.708 rows=684 loops=1)
               ->  Sort  (cost=1288.56..1288.79 rows=92 width=112) (actual time=783.921..793.150 rows=5350 loops=1)
                     Sort Key: name, ch
                     ->  Append  (cost=642.03..1285.55 rows=92 width=112) (actual time=335.134..723.917 rows=5350 loops=1)
                           ->  Subquery Scan "*SELECT* 1"  (cost=642.03..643.18 rows=46 width=112) (actual time=335.130..338.564 rows=184 loops=1)
                                 ->  HashAggregate  (cost=642.03..642.72 rows=46 width=112) (actual time=335.121..337.843 rows=184 loops=1)
                                       ->  Nested Loop Left Join  (cost=2.00..641.80 rows=46 width=112) (actual time=39.293..326.831 rows=1685 loops=1)
                                             ->  Nested Loop Left Join  (cost=0.00..502.63 rows=46 width=97) (actual time=21.026..202.206 rows=1685 loops=1)
                                                   ->  Index Scan using path_name_idx on path p  (cost=0.00..3.02 rows=1 width=97) (actual time=15.480..56.935 rows=27 loops=1)
                                                         Index Cond: ((path >= '/Users/axel/Library/Preferences/'::text) AND (path < '/Users/axel/Library/Preferences0'::text))
                                                         Filter: ((path ~ '^/Users/axel/Library/Preferences/[^/]*/$'::text) AND (rtrim(replace(path, '/Users/axel/Library/Preferences/'::text, ''::text), '/'::text) <> ''::text))
                                                   ->  Index Scan using file_path_idx on file f  (cost=0.00..493.28 rows=506 width=8) (actual time=0.473..5.119 rows=62 loops=27)
                                                         Index Cond: ("outer".pathid = f.pathid)
                                             ->  Bitmap Heap Scan on filename fn  (cost=2.00..3.01 rows=1 width=23) (actual time=0.058..0.061 rows=1 loops=1685)
                                                   Recheck Cond: ("outer".filenameid = fn.filenameid)
                                                   ->  Bitmap Index Scan on filename_pkey  (cost=0.00..2.00 rows=1 width=0) (actual time=0.030..0.030 rows=1 loops=1685)
                                                         Index Cond: ("outer".filenameid = fn.filenameid)
                           ->  Nested Loop  (cost=2.00..641.91 rows=46 width=19) (actual time=3.349..377.758 rows=5166 loops=1)
                                 ->  Nested Loop  (cost=0.00..502.62 rows=46 width=4) (actual time=3.118..97.375 rows=5200 loops=1)
                                       ->  Index Scan using path_name_idx on path p  (cost=0.00..3.01 rows=1 width=4) (actual time=0.045..0.052 rows=1 loops=1)
                                             Index Cond: (path = '/Users/axel/Library/Preferences/'::text)
                                       ->  Index Scan using file_path_idx on file f  (cost=0.00..493.28 rows=506 width=8) (actual time=3.058..76.014 rows=5200 loops=1)

Re: [PERFORM] directory tree query with big planner variation

2006-07-31 Thread Michael Stone

On Mon, Jul 31, 2006 at 12:48:11PM +0200, Axel Rau wrote:

  WHERE P.path ~ '^%@/[^/]*/$' ) AS NLPC


This can't be indexed. You might try something like 
WHERE P.path LIKE '[EMAIL PROTECTED]' AND P.path ~ '^%@/[^/]*/$'


The schema could be a lot more intelligent here. (E.g., store the path 
separately from the file/directory name, store the type (file or directory) 
separately, etc.) Without improving the schema I don't think this will 
ever be a speed demon.
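[A sketch of the rewrite being suggested, with a hypothetical concrete path standing in for the %@ placeholder. The idea is that a left-anchored LIKE prefix can use the btree index on path (with a C locale, or a text_pattern_ops index otherwise), while the regex then filters exactly:]

```sql
-- The LIKE condition is index-assisted; the regex does the exact filtering
-- (keeping only immediate subdirectories, i.e. exactly one more path level).
SELECT P.path, P.pathid
FROM bacula.path P
WHERE P.path LIKE '/Users/axel/%'         -- indexable prefix condition
  AND P.path ~ '^/Users/axel/[^/]*/$';    -- exact filter on top of it
```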


Mike Stone

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] directory tree query with big planner variation

2006-07-31 Thread Axel Rau


Am 31.07.2006 um 13:15 schrieb Michael Stone:


On Mon, Jul 31, 2006 at 12:48:11PM +0200, Axel Rau wrote:

  WHERE P.path ~ '^%@/[^/]*/$' ) AS NLPC


This can't be indexed. You might try something like WHERE P.path  
LIKE '[EMAIL PROTECTED]' AND P.path ~ '^%@/[^/]*/$'

Why does it do quite well in this case:
---
->  Index Scan using path_name_idx on path p  (cost=0.00..3.02 rows=1 
width=97) (actual time=15.480..56.935 rows=27 loops=1)
      Index Cond: ((path >= '/Users/axel/Library/Preferences/'::text) AND 
(path < '/Users/axel/Library/Preferences0'::text))
      Filter: ((path ~ '^/Users/axel/Library/Preferences/[^/]*/$'::text) AND 
(rtrim(replace(path, '/Users/axel/Library/Preferences/'::text, ''::text), 
'/'::text) <> ''::text))

---
as compared to this case(ignoring the index on path):
---
->  Index Scan using path_pkey on path p  (cost=0.00..2567.57 
rows=1941 width=97) (actual time=527.805..1521.911 rows=69 loops=1)
      Filter: ((path ~ '^/Users/axel/[^/]*/$'::text) AND 
(rtrim(replace(path, '/Users/axel/'::text, ''::text), '/'::text) <> 
''::text))

---
? With all longer path names, I get the above (good)result.
Should I put the rtrim/replace on the client side?


The schema could be a lot more intelligent here. (E.g., store path  
seperately from file/directory name, store type (file or directory)  
seperately, etc.) Without improving the schema I don't think this  
will ever be a speed demon.
PATH holds complete pathnames of directories, FILENAME holds  
filenames and pathname components.
Currently the schema is the lowest common denominator between SQLite,  
MySQL and pg and the bacula people will stay with that (-;).

Axel
Axel Rau, ☀Frankfurt , Germany   +49-69-951418-0



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


[PERFORM] Partitioning / constrain exlusion not working with %-operator

2006-07-31 Thread Martin Lesser
I am trying to partition a large table (~120 million rows) into 50 smaller
tables, but constraint exclusion does not work as expected with the (IMO
immutable) % function:

CREATE TABLE tt_m (id1 int, cont varchar);
CREATE TABLE tt_0 (check (id1 % 50 = 0)) INHERITS (tt_m);
CREATE TABLE tt_1 (check (id1 % 50 = 1)) INHERITS (tt_m);

CREATE RULE ins_tt_0 AS ON INSERT TO tt_m WHERE id1 % 50 = 0 DO INSTEAD INSERT 
INTO tt_0 VALUES (new.*);
CREATE RULE ins_tt_1 AS ON INSERT TO tt_m WHERE id1 % 50 = 1 DO INSTEAD INSERT 
INTO tt_1 VALUES (new.*);
...
INSERT INTO tt_m (id1,cont) VALUES (0,'Test1');
INSERT INTO tt_m (id1,cont) VALUES (1,'Test2');

EXPLAIN SELECT * FROM tt_m WHERE id1=1;
                            QUERY PLAN
------------------------------------------------------------------
 Result  (cost=0.00..73.50 rows=18 width=36)
   ->  Append  (cost=0.00..73.50 rows=18 width=36)
         ->  Seq Scan on tt_m  (cost=0.00..24.50 rows=6 width=36)
               Filter: (id1 = 1)
         ->  Seq Scan on tt_0 tt_m  (cost=0.00..24.50 rows=6 width=36)
               Filter: (id1 = 1)
         ->  Seq Scan on tt_1 tt_m  (cost=0.00..24.50 rows=6 width=36)
               Filter: (id1 = 1)
 ...

Only adding an explicit %-call to the query results in the expected plan:

EXPLAIN SELECT * FROM tt_m WHERE id1=1 AND id1 % 50 = 1;
                            QUERY PLAN
------------------------------------------------------------------
 Result  (cost=0.00..60.60 rows=2 width=36)
   ->  Append  (cost=0.00..60.60 rows=2 width=36)
         ->  Seq Scan on tt_m  (cost=0.00..30.30 rows=1 width=36)
               Filter: ((id1 = 1) AND ((id1 % 50) = 1))
         ->  Seq Scan on tt_1 tt_m  (cost=0.00..30.30 rows=1 width=36)
               Filter: ((id1 = 1) AND ((id1 % 50) = 1))

Did I miss something and/or how could I force the planner to use
constraint exclusion without adding the explicit second condition above?

TIA, Martin

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] sub select performance due to seq scans

2006-07-31 Thread Rod Taylor
 capsa=# explain analyze select name from capsa.flatomfilesysentry
 where objectid in ( select dstobj from capsa.flatommemberrelation
 where srcobj = 'c1c7304a-1fe1-11db-8af7-001143214409');
 
 
QUERY PLAN
 -
  Nested Loop IN Join  (cost=0.00..1386.45 rows=5809 width=14) (actual
 time=2.933..101467.463 rows=5841 loops=1)
    Join Filter: ("outer".objectid = "inner".dstobj)
    ->  Seq Scan on flatomfilesysentry  (cost=0.00..368.09 rows=5809
 width=30) (actual time=0.007..23.451 rows=5844 loops=1)
    ->  Seq Scan on flatommemberrelation  (cost=0.00..439.05 rows=5842
 width=16) (actual time=0.007..11.790 rows=2922 loops=5844)

A loop for an IN indicates that you are using a very old version of
PostgreSQL (7.2 or earlier).  Please double check that the server is
8.1.3 as you indicated and not just the client.

From psql:
select version();

Hmm... Perhaps it is an 8.1.3 server with mergejoin and hashjoin
disabled?
show enable_mergejoin;
show enable_hashjoin;

You can try this query syntax:

select name from capsa.flatomfilesysentry join
capsa.flatommemberrelation on (objectid = dstobj) where srcobj =
'c1c7304a-1fe1-11db-8af7-001143214409';


  Filter: (srcobj =
 'c1c7304a-1fe1-11db-8af7-001143214409'::capsa_sys.uuid)
  Total runtime: 101482.256 ms
 (6 rows)
 
 capsa=# select count(*) from capsa.flatommemberrelation ;
  count
 ---
  11932
 (1 row)
 
 capsa=# select count(*) from capsa.flatomfilesysentry ;
  count
 ---
   5977
 
 
 
 


---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Partitioning / constrain exlusion not working with %-operator

2006-07-31 Thread Tom Lane
Martin Lesser [EMAIL PROTECTED] writes:
 I try to partition a large table (~ 120 mio. rows) into 50 smaller
 tables but using the IMO immutable %-function constraint exclusion
 does not work as expected:

The constraint exclusion mechanism is not as bright as you think.
There are some very limited cases where it can make a deduction that
a WHERE clause implies a CHECK constraint that's not an exact textual
equivalent ... but all those cases have to do with related b-tree
operators, and % is not one.

It's usually better to use partitioning rules that have something to
do with the WHERE-clauses you'd be using anyway.  For instance, try
to partition on ranges of id1 instead of id1 % 50.  That works because
the CHECK clauses will be like "id1 >= x AND id1 < y", and those
operators are btree-related to the "id1 = z" clauses you'll have in
the query.
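[A minimal sketch of the range-based partitioning being suggested, using the tt_m example from the original post and assuming ids can be bucketed by range rather than round-robin; the boundary values are illustrative:]

```sql
CREATE TABLE tt_m (id1 int, cont varchar);
-- Range CHECK constraints use btree-comparable operators (>=, <),
-- which constraint exclusion can reason about.
CREATE TABLE tt_0 (CHECK (id1 >= 0       AND id1 < 1000000)) INHERITS (tt_m);
CREATE TABLE tt_1 (CHECK (id1 >= 1000000 AND id1 < 2000000)) INHERITS (tt_m);

-- With constraint exclusion enabled, a plain equality WHERE clause is
-- enough for the planner to skip the non-matching partitions.
SET constraint_exclusion = on;
EXPLAIN SELECT * FROM tt_m WHERE id1 = 1500000;  -- only tt_m and tt_1 scanned
```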

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] sub select performance due to seq scans

2006-07-31 Thread H Hale
Richard Huxton wrote:
> Look at that second seq-scan (on flatommemberrelation) - it's looping
> 5844 times (once for each row in flatomfilesysentry). I'd expect PG to
> materialise the seq-scan once and then join (unless I'm missing
> something, the subselect just involves the one test against a constant).
>
> I'm guessing something in your configuration is pushing your cost
> estimates far away from reality. Could you try issuing a "set
> enable_seqscan=off" and then running explain-analyse again. That will
> show us alternatives.
>
> Also, what performance-related configuration values have you changed?
> Could you post them with a brief description of your hardware?

The hardware is a XEON 3GHz P4 with 2GB memory and an 80GB SATA drive.
kernel.shmmax = 128MB

The following config changes have been made from the defaults:

shared_buffers = 8000                # min 16 or max_connections*2, 8KB each
max_fsm_pages = 50000                # min max_fsm_relations*16, 6 bytes each
vacuum_cost_delay = 10               # 0-1000 milliseconds
stats_start_collector = on
stats_row_level = on
autovacuum = on                      # enable autovacuum subprocess?
autovacuum_naptime = 20              # time between autovacuum runs, in secs
autovacuum_vacuum_threshold = 500    # min # of tuple updates before vacuum
autovacuum_analyze_threshold = 250   # min # of tuple updates before analyze

Here is the query plan...

capsa=# set enable_seqscan=off;
SET
Time: 0.478 ms
capsa=# explain analyze select name from capsa.flatomfilesysentry where
objectid in ( select dstobj from capsa.flatommemberrelation where srcobj =
'c1c7304a-1fe1-11db-8af7-001143214409');

                                QUERY PLAN
-----------------------------------------------------------------------------
 Nested Loop  (cost=873.32..1017581.78 rows=6476 width=14) (actual time=80.402..241.881 rows=6473 loops=1)
   ->  Unique  (cost=871.32..903.68 rows=3229 width=16) (actual time=80.315..113.282 rows=6473 loops=1)
         ->  Sort  (cost=871.32..887.50 rows=6473 width=16) (actual time=80.310..94.279 rows=6473 loops=1)
               Sort Key: flatommemberrelation.dstobj
               ->  Bitmap Heap Scan on flatommemberrelation  (cost=56.66..461.57 rows=6473 width=16) (actual time=2.613..14.229 rows=6473 loops=1)
                     Recheck Cond: (srcobj = 'c1c7304a-1fe1-11db-8af7-001143214409'::capsa_sys.uuid)
                     ->  Bitmap Index Scan on capsa_flatommemberrelation_srcobj_idx  (cost=0.00..56.66 rows=6473 width=0) (actual time=2.344..2.344 rows=6473 loops=1)
                           Index Cond: (srcobj = 'c1c7304a-1fe1-11db-8af7-001143214409'::capsa_sys.uuid)
   ->  Bitmap Heap Scan on flatomfilesysentry  (cost=2.00..274.38 rows=3238 width=30) (actual time=0.011..0.013 rows=1 loops=6473)
         Recheck Cond: (flatomfilesysentry.objectid = "outer".dstobj)
         ->  Bitmap Index Scan on flatomfilesysentry_pkey  (cost=0.00..2.00 rows=3238 width=0) (actual time=0.007..0.007 rows=1 loops=6473)
               Index Cond: (flatomfilesysentry.objectid = "outer".dstobj)
 Total runtime: 251.611 ms
(13 rows)

Time: 252.825 ms

I went back to the stock conf settings, did a vacuum full analyze and
still get the same results.

Background: we have spikes of activity where both tables get rows
inserted and have many updates. During this time performance drops. I
have been experimenting with autovacuum settings, as vacuuming was
helping, although query performance did not return to normal until
after the activity spike. In this case (and I am not sure why yet)
vacuuming made no difference.

Re: [PERFORM] sub select performance due to seq scans

2006-07-31 Thread Tom Lane
H Hale [EMAIL PROTECTED] writes:
 ->  Bitmap Heap Scan on flatomfilesysentry  (cost=2.00..274.38 rows=3238
 width=30) (actual time=0.011..0.013 rows=1 loops=6473)
       Recheck Cond: (flatomfilesysentry.objectid = "outer".dstobj)
       ->  Bitmap Index Scan on flatomfilesysentry_pkey  (cost=0.00..2.00
 rows=3238 width=0) (actual time=0.007..0.007 rows=1 loops=6473)
             Index Cond: (flatomfilesysentry.objectid = "outer".dstobj)

Well, there's our estimation failure: 3238 rows expected, one row
actual.

What is the data distribution of flatomfilesysentry.objectid?
It looks from this example like it is unique or nearly so,
but the planner evidently does not think that.
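[When the planner misestimates distinct values like this, a common first step is to raise the column's statistics target and re-analyze; a sketch using the table from this thread (1000 is the maximum target in 8.1):]

```sql
ALTER TABLE capsa.flatomfilesysentry
  ALTER COLUMN objectid SET STATISTICS 1000;
ANALYZE capsa.flatomfilesysentry;

-- Check what the planner now believes about the column's distribution;
-- n_distinct = -1 means it thinks the column is unique.
SELECT n_distinct
FROM pg_stats
WHERE schemaname = 'capsa'
  AND tablename = 'flatomfilesysentry'
  AND attname = 'objectid';
```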

regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] directory tree query with big planner variation

2006-07-31 Thread Axel Rau


Am 31.07.2006 um 15:30 schrieb Michael Stone:


If I understand the intent of this SQL,

Let me show the tables first:
              Table bacula.path  (65031 rows)
 Column |  Type   |                       Modifiers
--------+---------+-------------------------------------------------------
 pathid | integer | not null default nextval('path_pathid_seq'::regclass)
 path   | text    | not null   (complete pathnames of all directories)
Indexes:
    "path_pkey" PRIMARY KEY, btree (pathid)
    "path_name_idx" btree (path)

              Table bacula.file  (3021903 rows)
   Column   |  Type   |                       Modifiers
------------+---------+-------------------------------------------------------
 fileid     | integer | not null default nextval('file_fileid_seq'::regclass)
 fileindex  | integer | not null default 0
 jobid      | integer | not null
 pathid     | integer | not null (FK)
 filenameid | integer | not null (FK)
 markid     | integer | not null default 0
 lstat      | text    | not null
 md5        | text    | not null
Indexes:
    "file_pkey" PRIMARY KEY, btree (fileid)
    "file_fp_idx" btree (filenameid, pathid)
    "file_jobid_idx" btree (jobid)
    "file_path_idx" btree (pathid)

              Table bacula.filename  (160559 rows)
   Column   |  Type   |                       Modifiers
------------+---------+-----------------------------------------------------------------
 filenameid | integer | not null default nextval('filename_filenameid_seq'::regclass)
 name       | text    | not null
Indexes:
    "filename_pkey" PRIMARY KEY, btree (filenameid)
    "filename_name_idx" btree (name)

And now the query;

Task: Return the names of subdirectories and files immediately below a 
given path. For each non-empty subdirectory, return children = true.
The 1st part of the union selects all subdirectories (per regex) and the 
flatfiles contained in them, plus one entry for the subdirectory itself 
(left outer joins). More than one joined filename means the subdirectory 
has children.
The 2nd part of the union returns all flatfiles contained in the given 
path.
The surrounding SELECT removes the given path and the trailing /, 
keeping only the subdirectory names from the pathnames so they can be 
merged with the flatfile names.



you're pulling all the entries
in a directory in two parts. The first

(second)

part (files) is fairly straightforward. The second

(first)
part (directories) consists of pulling any file whose parent is a  
subdirectory of the directory you're looking for (this is *all*  
children of the directory, since you have to retrieve every element  
that begins with the directory, then discard those that have an  
additional / in their name), counting how many of these there are  
for each subdirectory, and discarding those results except for a  
binary (yes there are children or no there aren't). This is a lot  
of useless work to go through, and is going to be slow if you've  
got a lot of stuff in a subdirectory.

I agree, but did not yet find another way.
An alternative approach would be, for each directory, to store all  
its children (files and subdirectories) along with a flag  
indicating which it is. This would allow you to create the  
collapsed tree view without walking all the children of a  
subdirectory.

Perhaps in a temporary table?


Assuming you can't make changes to the schema, what about the query?

Can be changed.

You've got this:

Please reconsider your proposals with the above

It's hard to say without knowing what's actually *in* the tables, 
but the existing query definitely doesn't scale well for what I 
think it's trying to do.


Mike Stone

Axel
Axel Rau, ☀Frankfurt , Germany   +49-69-951418-0



---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] directory tree query with big planner variation

2006-07-31 Thread Michael Stone

On Mon, Jul 31, 2006 at 05:06:00PM +0200, Axel Rau wrote:

Please reconsider your proposals with the above


I'm not sure what you're getting at; could you be more specific?

Mike Stone

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Performances with new Intel Core* processors

2006-07-31 Thread Merlin Moncure

On 7/31/06, Jonathan Ballet [EMAIL PROTECTED] wrote:

Hello,

I've read a lot of mails here saying how good is the Opteron with PostgreSQL,
and a lot of people seems to recommend it (instead of Xeon).



I am a huge fan of the opteron but intel certainly seems to have a
winner for workstations. from my research on a per core basis the c2d
is a stronger chip with the 4mb cache version but it is unclear which
is a better choice for pg on 4 and 8 core platforms.  I have direct
personal experience with pg on dual (4 core) and quad (8 core) opteron
and the performance is fantastic, especially on a 64 bit o/s with > 2gb
memory (vs 32 bit xeon).

also opteron is 64 bit and mature so i think is a better choice for
server platform at the moment, especially for databases.  my mind
could be changed but it is too soon right now.  consider how long it
took for the opteron to prove itself in the server world.

merlin

---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] directory tree query with big planner variation

2006-07-31 Thread Axel Rau


Am 31.07.2006 um 17:21 schrieb Michael Stone:


On Mon, Jul 31, 2006 at 05:06:00PM +0200, Axel Rau wrote:

Please reconsider your proposals with the above


I'm not sure what you're getting at; could you be more specific?

Let's see...


Am 31.07.2006 um 15:30 schrieb Michael Stone:
And then what happens if you try something like SELECT  
X.name,X.children

FROM   (SELECT [rtrim]P.path,(SELECT count(*) FROM bacula.file F
The file table is the biggest one, because it contains one row per  
backup job and file (see my column description).

You need the filename table here.

  WHERE F.pathid = P.pathid
 LIMIT 2) > 1
 FROM bacula.path P
 WHERE P.path ~ '^%@/[^/]*/$'
   UNION
   SELECT FN.name,0
 FROM bacula.path P, bacula.file F, bacula.filename FN
 WHERE
   P.path = '%@/'   AND
   P.pathid = F.pathid   AND
   F.filenameid = FN.filenameid
   ) AS X
WHERE X.name <> ''
GROUP BY X.name


Tweaking your query and omitting the RTRIM/REPLACE stuff, I get:
---
SELECT X.path,X.children
FROM   (SELECT P.path,(SELECT count(*) FROM bacula.file F, bacula.filename FN
                       WHERE F.pathid = P.pathid AND F.filenameid = FN.filenameid
                       LIMIT 2) > 1 AS children
        FROM bacula.path P
        WHERE P.path ~ '^/Users/axel/ports/[^/]*/$'
        UNION
        SELECT FN.name,0=1
        FROM bacula.path P, bacula.file F, bacula.filename FN
        WHERE
          P.path = '/Users/axel/ports/'   AND
          P.pathid = F.pathid   AND
          F.filenameid = FN.filenameid
       ) AS X
WHERE X.path <> ''
GROUP BY X.path, X.children ;
             path             | children
------------------------------+----------
 .cvsignore                   | f
 /Users/axel/ports/CVS/       | t
 /Users/axel/ports/archivers/ | t
 INDEX                        | f
 Makefile                     | f
 README                       | f
(6 rows)

Time: 35.221 ms
---
While my version returns:
---
    name    | children
------------+----------
 .cvsignore | f
 archivers  | t
 CVS        | t
 INDEX      | f
 Makefile   | f
 README     | f
(6 rows)

Time: 30.263 ms
How would you complete your version?

Axel
Axel Rau, ☀Frankfurt , Germany   +49-69-951418-0



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] PostgreSQL scalability on Sun UltraSparc T1

2006-07-31 Thread Merlin Moncure

On 7/29/06, Jochem van Dieten [EMAIL PROTECTED] wrote:

Tweakers.net has done a database performance test between a Sun T2000 (8
core T1) and a Sun X4200 (2 dual core Opteron 280). The database
benchmark is developed inhouse and represents the average query pattern
from their website. It is MySQL centric because Tweakers.net runs on
MySQL, but Arjen van der Meijden has ported it to PostgreSQL and has
done basic optimizations like adding indexes.


anandtech did a comparison of opteron/xeon/sun t1 not too long ago and
published some mysql/postgresql results.  however, they were careful
not to publish the quad core data for pg to compare vs. mysql, which
in my opinion would have shown a blowout victory for pg. (also was pg
8.0).

the fact is, postgresql is often faster than mysql under real
workloads, especially when utilizing features such as stored
procedures and such.

merlin

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] sub select performance due to seq scans

2006-07-31 Thread H Hale
Tom,

It is unique.
Indexes:
    "flatomfilesysentry_pkey" PRIMARY KEY, btree (objectid)
    "capsa_flatomfilesysentry_name_idx" btree (name)
Foreign-key constraints:
    "objectid" FOREIGN KEY (objectid) REFERENCES capsa_sys.master(objectid) ON DELETE CASCADE

Tom Lane [EMAIL PROTECTED] wrote:

H Hale writes:
-  Bitmap Heap Scan on flatomfilesysentry  (cost=2.00..274.38 rows=3238 width=30) (actual time=0.011..0.013 rows=1 loops=6473)
     Recheck Cond: (flatomfilesysentry.objectid = "outer".dstobj)
     -  Bitmap Index Scan on flatomfilesysentry_pkey  (cost=0.00..2.00 rows=3238 width=0) (actual time=0.007..0.007 rows=1 loops=6473)
           Index Cond: (flatomfilesysentry.objectid = "outer".dstobj)

Well, there's our estimation failure: 3238 rows expected, one row actual.
What is the data distribution of flatomfilesysentry.objectid?
It looks from this example like it is unique or nearly so,
but the planner evidently does not think that.

   regards, tom lane

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings

Re: [PERFORM] directory tree query with big planner variation

2006-07-31 Thread Axel Rau


Am 31.07.2006 um 17:54 schrieb Axel Rau:


Tweaking your query and omitting the RTRIM/REPLACE stuff, I get:
My example did not cover the case of empty subdirectories, in which  
case your simplified query fails:

---
 path | children
--+--
.DS_Store| f
/Users/axel/Projects/ADMIN/  | t
/Users/axel/Projects/DB/ | t
/Users/axel/Projects/HW/ | t
/Users/axel/Projects/JETSEC/ | t
/Users/axel/Projects/MISC/   | t
/Users/axel/Projects/NET/| t
/Users/axel/Projects/SW/ | t
/Users/axel/Projects/TOOLS/  | t
(9 rows)
---
Where it should be:
---
   name| children
---+--
.DS_Store | f
ADMIN | t
DB| t
HW| f
JETSEC| f
MISC  | f
NET   | t
SW| t
TOOLS | t
(9 rows)
---
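Outside SQL, the listing both queries are trying to produce -- the immediate
children of one directory, each flagged as to whether it has content of its
own -- can be sketched like this. This is a hypothetical helper, not Bacula
code; it only assumes Bacula's convention (mentioned later in the thread) of
a '' filename entry standing for the directory itself:

```python
# Sketch: given flat path strings and per-path file names, list the
# immediate children of `parent` with a has-children flag.

def immediate_children(paths, files_by_path, parent):
    """Return sorted (name, has_children) pairs for entries directly
    under `parent`. `paths` is an iterable of directory path strings
    (each ending in '/'); `files_by_path` maps a path to its file names."""
    result = {}
    # Subdirectories: paths exactly one level below the parent.
    for p in paths:
        if p.startswith(parent) and p != parent:
            rest = p[len(parent):].rstrip('/')
            if rest and '/' not in rest:
                # A subdirectory "has children" if it holds any entry
                # besides the '' placeholder for the directory itself,
                # or if deeper directories hang below it.
                entries = [f for f in files_by_path.get(p, []) if f != '']
                result[rest] = bool(entries) or any(
                    q.startswith(p) and q != p for q in paths)
    # Plain files stored directly in the parent directory.
    for name in files_by_path.get(parent, []):
        if name != '':
            result.setdefault(name, False)
    return sorted(result.items(), key=lambda kv: kv[0].lower())
```

The SQL versions compute the same thing, with the has-children flag coming
from the `(SELECT count(*) ...) > 1` subquery.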
Axel
Axel Rau, ☀Frankfurt , Germany   +49-69-951418-0



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] directory tree query with big planner variation

2006-07-31 Thread Axel Rau


Am 31.07.2006 um 15:53 schrieb Mark Lewis:

It seems like you might be able to avoid the expensive directory  
lookups

entirely without changing the schema by defining an immutable function
dir_depth(path), which would just count the number of forward slashes.
Then create a functional index on dir_depth(path) and in the query  
do a

check for directories with a given prefix and the expected dir_depth.

Still I must check for flatfiles in those subdirectories...
See my clarification here
http://archives.postgresql.org/pgsql-performance/2006-07/msg00311.php
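For what it's worth, Mark's dir_depth() suggestion can be sketched outside
SQL as follows -- a rough illustration of the idea, not the actual function:
depth is just the number of '/' separators, so "direct subdirectory of a
prefix" becomes a cheap equality test (indexable via a functional index on
dir_depth(path)) instead of a regex scan:

```python
# Sketch of the dir_depth() idea from the thread.

def dir_depth(path: str) -> int:
    # Depth of a directory path is simply its number of '/' separators.
    return path.count('/')

def is_direct_subdir(path: str, prefix: str) -> bool:
    # Equivalent to the pattern '^<prefix>[^/]*/$' used in the queries:
    # same prefix, trailing slash, exactly one level deeper.
    return (path.startswith(prefix)
            and path.endswith('/')
            and dir_depth(path) == dir_depth(prefix) + 1)
```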

Axel
Axel Rau, ☀Frankfurt , Germany   +49-69-951418-0



---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Performances with new Intel Core* processors

2006-07-31 Thread Arjen van der Meijden

On 31-7-2006 17:52, Merlin Moncure wrote:

On 7/31/06, Jonathan Ballet [EMAIL PROTECTED] wrote:

Hello,

I've read a lot of mails here saying how good is the Opteron with 
PostgreSQL,

and a lot of people seems to recommend it (instead of Xeon).


I am a huge fan of the opteron but intel certainly seems to have a
winner for workstations. from my research on a per core basis the c2d
is a stronger chip with the 4mb cache version but it is unclear which
is a better choice for pg on 4 and 8 core platforms.  I have direct
personal experience with pg on dual (4 core) and quad (8 core) opteron
and the performance is fantastic, especially on 64 bit o/s with > 2gb
memory (vs 32 bit xeon).


As far as I know there is no support for more than two Woodcrest 
processors (Core 2 version of the Xeon) in a system. So when using a 
scalable application (like postgresql) and you need more than four 
cores, Opteron is still the only option in the x86 world.


The Woodcrest however is faster than a comparably priced Opteron using 
Postgresql. In a benchmark we did (and have yet to publish) a Woodcrest 
system outperforms a comparable Sun Fire x4200. And even if you'd adjust 
it to a clock-by-clock comparison, Woodcrest would still beat the 
Opteron. If you'd adjust it to a price/performance comparison (I 
configured a HP DL 380G5-system which is similar to what we tested on 
their website), the x4200 would lose as well. Mind you, an Opteron 280 
2.4Ghz or 285 2.6Ghz costs more than a Woodcrest 5150 2.66Ghz or 5160 
3Ghz (resp.), but the FB-DIMM memory for the Xeons is more expensive 
than the DDR or DDR2 ECC REG memory you need in an Opteron.



also opteron is 64 bit and mature so i think is a better choice for
server platform at the moment, especially for databases.  my mind
could be changed but it is too soon right now.  consider how long it
took for the opteron to prove itself in the server world.


Intel Woodcrest can do 64-bit as well. As can all recent Xeons. Whether 
Opteron does a better job at 64-bit than a Xeon, I don't know (our test 
was in 64-bit though). I have not seen our Xeon 64-bits production 
servers be any less stable than our Opteron 64-bit servers.
For a database system, however, processors hardly ever are the main 
bottleneck, are they? So you should probably go for a set of fast 
processors from your favorite supplier and focus mainly on lots of 
memory and fast disks. Whether that employs Opterons or Xeon Woodcrest 
(no other Xeons are up to that competition, imho) doesn't really matter.


We'll be publishing the article in the near future, and I'll give a 
pointer to it (even though it will be in Dutch, you can still read the 
graphs).


Best regards,

Arjen van der Meijden
Tweakers.net

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] directory tree query with big planner variation

2006-07-31 Thread Michael Stone

On Mon, Jul 31, 2006 at 05:54:41PM +0200, Axel Rau wrote:
The file table is the biggest one, because it contains one row per  
backup job and file (see my column description).


I never saw a column description--that would certainly help. :) I saw a 
schema, but not an explanation of what the elements do. From what I can 
understand of what you're saying, it is sounding as though the 
bacula.file table contains an entry for the subdirectory itself as well 
as entries for each file in the subdirectory? And the reason you need to 
join back to the filename table is that there may be multiple copies of 
the filename from multiple backups? Does the subdirectory itself have an 
entry in the filename table? What is the content of the lstat column; can 
it be used to distinguish a file from a directory? Similarly for the md5 
column--what would it contain for a directory?


Mike Stone

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] directory tree query with big planner variation

2006-07-31 Thread Axel Rau


Am 31.07.2006 um 19:08 schrieb Michael Stone:

I never saw a column description--that would certainly help. :) I  
saw a schema, but not an explanation of what the elements do. From  
what I can understand of what you're saying, it is sounding as  
though the bacula.file table contains an entry for the subdirectory  
itself as well as entries for each file in the subdirectory?
It is the junction relation between path and filename and job and  
describes
1. which files (identified by bacula.filename) are in a directory  
(identified by bacula.path)
2. For each of those files they record a snapshot with  
characteristics (lstat [base64 encoded], md5-checksum and a backup- 
job [via jobid], which itself has backup-time etc.)
And the reason you need to join back to the filename table is that  
there may be multiple copies of the filename from multiple backups?
One entry per backup(job) for each bacula.path/bacula.filename pair  
in bacula.file.

Does the subdirectory itself have an entry in the filename table?
Yes. Directories reference an entry containing '' in  
bacula.filename.name.

What is the content of the lstat column

File status info -- see stat(2).

; can it be used to distinguish a file from a directory?
Yes, the S_IFDIR bit identifies directories, but the whole lstat 
column is base64-encoded.

Similarly for the md5 column--what would it contain for a directory?

It seems to contain 0.

Axel
Axel Rau, ☀Frankfurt , Germany   +49-69-951418-0



---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

  http://www.postgresql.org/docs/faq


Re: [PERFORM] Performances with new Intel Core* processors

2006-07-31 Thread Bruce Momjian

Good to know.  We have been waiting for performance comparisons on
the new Intel CPUs.

---

Arjen van der Meijden wrote:
 On 31-7-2006 17:52, Merlin Moncure wrote:
  On 7/31/06, Jonathan Ballet [EMAIL PROTECTED] wrote:
  Hello,
 
  I've read a lot of mails here saying how good is the Opteron with 
  PostgreSQL,
  and a lot of people seems to recommend it (instead of Xeon).
  
  I am a huge fan of the opteron but intel certainly seems to have a
  winner for workstations. from my research on a per core basis the c2d
  is a stronger chip with the 4mb cache version but it is unclear which
  is a better choice for pg on 4 and 8 core platforms.  I have direct
  personal experience with pg on dual (4 core) and quad (8 core) opteron
  and the performance is fantastic, especially on 64 bit o/s with > 2gb
  memory (vs 32 bit xeon).
 
 As far as I know there is no support for more than two Woodcrest 
 processors (Core 2 version of the Xeon) in a system. So when using a 
 scalable application (like postgresql) and you need more than four 
 cores, Opteron is still the only option in the x86 world.
 
 The Woodcrest however is faster than a comparably priced Opteron using 
 Postgresql. In a benchmark we did (and have yet to publish) a Woodcrest 
 system outperforms a comparable Sun Fire x4200. And even if you'd adjust 
 it to a clock-by-clock comparison, Woodcrest would still beat the 
 Opteron. If you'd adjust it to a price/performance comparison (I 
 configured a HP DL 380G5-system which is similar to what we tested on 
 their website), the x4200 would lose as well. Mind you, an Opteron 280 
 2.4Ghz or 285 2.6Ghz costs more than a Woodcrest 5150 2.66Ghz or 5160 
 3Ghz (resp.), but the FB-DIMM memory for the Xeons is more expensive 
 than the DDR or DDR2 ECC REG memory you need in an Opteron.
 
  also opteron is 64 bit and mature so i think is a better choice for
  server platform at the moment, especially for databases.  my mind
  could be changed but it is too soon right now.  consider how long it
  took for the opteron to prove itself in the server world.
 
 Intel Woodcrest can do 64-bit as well. As can all recent Xeons. Whether 
 Opteron does a better job at 64-bit than a Xeon, I don't know (our test 
 was in 64-bit though). I have not seen our Xeon 64-bits production 
 servers be any less stable than our Opteron 64-bit servers.
 For a database system, however, processors hardly ever are the main 
 bottleneck, are they? So you should probably go for a set of fast 
 processors from your favorite supplier and focus mainly on lots of 
 memory and fast disks. Whether that employs Opterons or Xeon Woodcrest 
 (no other Xeons are up to that competition, imho) doesn't really matter.
 
 We'll be publishing the article in the near future, and I'll give a 
 pointer to it (even though it will be in Dutch, you can still read the 
 graphs).
 
 Best regards,
 
 Arjen van der Meijden
 Tweakers.net
 
 ---(end of broadcast)---
 TIP 6: explain analyze is your friend

-- 
  Bruce Momjian   [EMAIL PROTECTED]
  EnterpriseDBhttp://www.enterprisedb.com

  + If your life is a hard drive, Christ can be your backup. +

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Performances with new Intel Core* processors

2006-07-31 Thread Scott Marlowe
On Mon, 2006-07-31 at 11:30, Arjen van der Meijden wrote:
 On 31-7-2006 17:52, Merlin Moncure wrote:

 For a database system, however, processors hardly ever are the main 
 bottleneck, are they? So you should probably go for a set of fast 
 processors from your favorite supplier and focus mainly on lots of 
 memory and fast disks. Whether that employs Opterons or Xeon Woodcrest 
 (no other Xeons are up to that competition, imho) doesn't really matter.

Just making a quick comment here.  While the CPU core itself nowadays
certainly is not the most common bottleneck for a fast db server, the
ability of the CPU/Memory combo to act as a datapump IS often a limit.

In that case, you want to go with whichever setup gives you the fastest
access to memory.

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Performances with new Intel Core* processors

2006-07-31 Thread Vivek Khera


On Jul 31, 2006, at 12:30 PM, Arjen van der Meijden wrote:

For a database system, however, processors hardly ever are the main  
bottleneck, are they? So you should probably go for a set of fast  
processors from your favorite supplier and focus mainly on lots of  
memory and fast disks. Whether that employs Opterons or Xeon  
Woodcrest (no other Xeons are up to that


No, but it *does* matter how fast said processor can sling the memory  
around, and in my experience, the opterons have been much better at  
that due to the efficiency of the memory transport layer.



[PERFORM] Is there any performance penalty for opposite-endian platform combinations....

2006-07-31 Thread Guoping Zhang
Hello,

I apologize if similar questions have already been asked and answered
before. Here is a go:

a) We have application clients running on a Solaris 10/SPARC box and a
database server running on a Solaris 10 x86 box; further, we have a few
tables in which we split an integer field into several of our own
defined bitmap segments, upon which we have a set of logic operations
implemented in our applications. MY question is: could the different
endianness (SPARC is big-endian and x86 is the opposite) possibly slow
down the application?

In fact, it is a general question: Is it good practice to avoid running
the application server and the database server on platforms of opposite
endianness, or does it simply not matter?

b) In the same direction: if using Slony-I, where the master server is
running on a big-endian host but the slave is running on a little-endian
host, is there any performance loss due to the endianness difference?
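For what it's worth, client and server endianness generally don't interact
at the application level, because wire protocols fix a byte order
(PostgreSQL's frontend/backend protocol uses network byte order for its
integer fields) and each side converts on the way in and out. A rough,
non-PostgreSQL-specific illustration:

```python
# Illustration: a wire protocol picks one byte order -- typically
# big-endian "network order" -- and each side converts. The conversion
# is a trivial byte swap, so mixed-endian client/server pairs pay
# essentially nothing for it.
import struct

value = 0x12345678

big = struct.pack('>I', value)     # big-endian / network order
little = struct.pack('<I', value)  # little-endian (x86-native) order

# Same integer, opposite byte layouts on the wire:
assert big == b'\x12\x34\x56\x78'
assert little == b'\x78\x56\x34\x12'

# Each side recovers the identical value by honoring the declared order,
# and bit operations on the decoded integer (e.g. your bitmap segments)
# behave the same on both architectures:
assert struct.unpack('>I', big)[0] == struct.unpack('<I', little)[0]
assert (value & 0x00FF0000) >> 16 == 0x34
```

The bitmap logic in your application operates on decoded integers, so it
should be unaffected by which architecture produced the bytes.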

Thanks in advance for your opinions.

Regards,
Guoping Zhang


---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster