Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-24 Thread Luke Lonergan
Mark,

See the results below and analysis - the pure HeapScan gets 94.1% of the max
available read bandwidth (cool!).  Nothing wrong with heapscan in the
presence of large readahead, which is good news.

That says it's something else in the path.  As you probably know, there is a
page lock taken, a copy of the tuple made from the page, the lock removed, and a
count incremented for every iteration of the agg node on a count(*).  Is the same
true of a count(1)?

I recall that the profile is full of memcpy and memory context calls.

It would be nice to put some tracers into the executor and see where the
time is going.  I'm also curious about the impact of the new 8.1 virtual
tuples in reducing the executor overhead.  In this case my bet's on the agg
node itself, what do you think?

- Luke

On 11/21/05 9:10 PM, Mark Kirkwood [EMAIL PROTECTED] wrote:

 Luke Lonergan wrote:
 
 So that leaves the question - why not more than 64% of the I/O scan rate?
 And why is it a flat 64% as the I/O subsystem increases in speed from
 333-400MB/s?
 
 
 It might be interesting to see what effect reducing the cpu consumption
   entailed by the count aggregation has - by (say) writing a little bit
 of code to heap scan the desired relation (sample attached).
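(The sample itself was an attachment and isn't reproduced in this digest; for
illustration only, a small C heap-scan counter like the fastcount() used below
would be declared to the server roughly as follows, with a hypothetical file
path:)

   CREATE FUNCTION fastcount(relname text) RETURNS bigint
       AS '/path/to/fastcount', 'fastcount' LANGUAGE C STRICT;

   -- usage: SELECT fastcount('lineitem');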

OK - here are results for a slightly smaller (still bigger than RAM)
lineitem on the same machine, using the same xfs filesystem that achieved
407MB/s:


12.9GB of DBT-3 data from the lineitem table

llonergan=# select relpages from pg_class where relname='lineitem';
 relpages 
--
  1579270
(1 row)

1579270*8192/1000000
12937 Million Bytes or 12.9GB

llonergan=# \timing
Timing is on.
llonergan=# select count(1) from lineitem;
  count   
--
 59986052
(1 row)

Time: 197870.105 ms
llonergan=# select count(1) from lineitem;
  count   
--
 59986052
(1 row)

Time: 49912.164 ms
llonergan=# select count(1) from lineitem;
  count   
--
 59986052
(1 row)

Time: 49218.739 ms

llonergan=# select fastcount('lineitem');
 fastcount 
---
  59986052
(1 row)

Time: 33752.778 ms
llonergan=# select fastcount('lineitem');
 fastcount 
---
  59986052
(1 row)

Time: 34543.646 ms
llonergan=# select fastcount('lineitem');
 fastcount 
---
  59986052
(1 row)

Time: 34528.053 ms


Analysis:

            Bandwidth   Percent of max
dd Read     407MB/s     100%
Count(1)    263MB/s     64.6%
HeapScan    383MB/s     94.1%

Wow - looks like the HeapScan gets almost all of the available bandwidth!

- Luke





Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-24 Thread Mark Kirkwood

Luke Lonergan wrote:



12.9GB of DBT-3 data from the lineitem table

llonergan=# select relpages from pg_class where relname='lineitem';
 relpages 
--

  1579270
(1 row)

1579270*8192/100
12937 Million Bytes or 12.9GB

llonergan=# \timing
Timing is on.
llonergan=# select count(1) from lineitem;
  count   
--

 59986052
(1 row)

Time: 197870.105 ms


So 198 seconds is the uncached read time with count (Just for clarity, 
did you clear the Pg and filesystem caches or unmount / remount the 
filesystem?)



llonergan=# select count(1) from lineitem;
  count   
--

 59986052
(1 row)

Time: 49912.164 ms
llonergan=# select count(1) from lineitem;
  count   
--

 59986052
(1 row)

Time: 49218.739 ms



and ~50 seconds is the (partially) cached read time with count


llonergan=# select fastcount('lineitem');
 fastcount 
---

  59986052
(1 row)

Time: 33752.778 ms
llonergan=# select fastcount('lineitem');
 fastcount 
---

  59986052
(1 row)

Time: 34543.646 ms
llonergan=# select fastcount('lineitem');
 fastcount 
---

  59986052
(1 row)

Time: 34528.053 ms



so ~34 seconds is the (partially) cached read time for fastcount -
I calculate this to give ~362MB/s effective IO rate (I'm dividing by
1024*1024, not 1000*1000) FWIW.
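
(Spelling that arithmetic out: 1579270 pages * 8192 bytes = 12,937,379,840
bytes, or about 12,338 MiB; dividing by the ~34-34.5 s run times gives
roughly 358-366 MB/s, hence the ~362MB/s figure.)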


While this is interesting, you probably want to stop Pg, unmount the 
filesystem, and restart Pg to get the uncached time for fastcount too 
(and how does this compare to uncached read with dd using the same block 
size?).


But at this stage it certainly looks like the heapscan code is pretty 
efficient - great!


Oh - and do you want to try out a 32K block size? I'm interested to see 
what level of improvement you get (as my system is hopelessly cpu bound...)!




Analysis:

            Bandwidth   Percent of max
dd Read     407MB/s     100%
Count(1)    263MB/s     64.6%
HeapScan    383MB/s     94.1%



Cheers

Mark



Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-24 Thread Mark Kirkwood

Luke Lonergan wrote:

Mark,


It would be nice to put some tracers into the executor and see where the
time is going.  I'm also curious about the impact of the new 8.1 virtual
tuples in reducing the executor overhead.  In this case my bet's on the agg
node itself, what do you think?



Yeah - it's pretty clear that the count aggregate is fairly expensive 
wrt cpu - However, I am not sure if all agg nodes suffer this way (guess 
we could try a trivial aggregate that does nothing for all tuples bar 
the last and just reports the final value it sees).


Cheers

Mark




Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-24 Thread Luke Lonergan
Mark,

 Time: 197870.105 ms
 
 So 198 seconds is the uncached read time with count (Just for clarity,
 did you clear the Pg and filesystem caches or unmount / remount the
 filesystem?)

Nope - the longer time is due to the known "second write" issue with
Postgres: it writes the data to the table, but all of the pages are marked
dirty, so on the first scan after loading they are all written again.
This is clear as you watch vmstat - the pattern on the first seq scan is
half read / half write.
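
(An aside: the standard way to keep this effect out of a timing run is to
force those writes once before measuring - a sketch, assuming the table was
freshly loaded:)

   VACUUM lineitem;   -- first full pass writes the dirtied pages out once
   CHECKPOINT;        -- flush, so subsequent scans are pure reads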

 Time: 49218.739 ms
 
 
 and ~50 seconds is the (partially) cached read time with count

Again - the pattern here is pure read and completely non-cached.  You see a
very nearly constant I/O rate when watching vmstat for the entire scan.

 Time: 34528.053 ms

 so ~34 seconds is the (partially) cached read time for fastcount -
 I calculate this to give ~362MB/s effective IO rate (I'm dividing by
 1024*1024, not 1000*1000) FWIW.

The dd number uses 1000*1000, so I maintained it for the percentage of max.
 
 While this is interesting, you probably want to stop Pg, unmount the
 filesystem, and restart Pg to get the uncached time for fastcount too
 (and how does this compare to uncached read with dd using the same block
 size?).

I'll do it again sometime, but I've already deleted the file.  I've done the
following in the past to validate this though:

- Reboot machine
- Rerun scan

And we get identical results.
  
 But at this stage it certainly looks like the heapscan code is pretty
 efficient - great!

Yep.
 
 Oh - and do you want to try out a 32K block size? I'm interested to see
 what level of improvement you get (as my system is hopelessly cpu bound...)!

Yah - done so in the past and not seen any improvement - was waiting for Alan
to post his results.
 
 
 Analysis:
 
             Bandwidth   Percent of max
 dd Read     407MB/s     100%
 Count(1)    263MB/s     64.6%
 HeapScan    383MB/s     94.1%

Note these are all in consistent 1000x1000 units.

Thanks for the test - neat trick!  We'll use it to do some more profiling
some time soon...

- Luke





Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-24 Thread Mark Kirkwood

Luke Lonergan wrote:


That says it's something else in the path.  As you probably know there is a
page lock taken, a copy of the tuple from the page, lock removed, count
incremented for every iteration of the agg node on a count(*).  Is the same
true of a count(1)?



Sorry Luke - message 3 - I seem to be suffering from a very small 
working memory buffer myself right now, I think it's after a day of 
working with DB2 ... :-)


Anyway, as I read src/backend/parser/gram.y:6542 - count(*) is 
transformed into count(1), so these two are identical.
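
(A quick way to convince yourself of this from psql - both forms should
produce identical plans and costs:)

   EXPLAIN SELECT count(*) FROM lineitem;
   EXPLAIN SELECT count(1) FROM lineitem;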


Cheers (last time tonight, promise!)

Mark



Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-24 Thread Mark Kirkwood

Luke Lonergan wrote:

Mark,



Time: 197870.105 ms


So 198 seconds is the uncached read time with count (Just for clarity,
did you clear the Pg and filesystem caches or unmount / remount the
filesystem?)



Nope - the longer time is due to the known "second write" issue with
Postgres: it writes the data to the table, but all of the pages are marked
dirty, so on the first scan after loading they are all written again.
This is clear as you watch vmstat - the pattern on the first seq scan is
half read / half write.



Ah - indeed - first access after a COPY no? I should have thought of 
that, sorry!





Re: [PERFORM] 8.1 count(*) distinct: IndexScan/SeqScan

2005-11-24 Thread Pailloncy Jean-Gerard
THIS MAY SEEM SILLY but vacuum is misspelled below and presumably
there was never any ANALYZE done.




postgres=# vaccum full verbose analyze;

I did do the vacUUm full verbose analyze; I just copied/pasted the wrong line.

Cordialement,
Jean-Gérard Pailloncy




Re: [PERFORM] 8.1 count(*) distinct: IndexScan/SeqScan

2005-11-24 Thread Pailloncy Jean-Gerard

Pailloncy Jean-Gerard [EMAIL PROTECTED] writes:

Why the stupid indexscan plan on the whole table ?


Pray tell, what are you using for the planner cost parameters?
The only way I can come close to duplicating your numbers is
by setting random_page_cost to somewhere around 0.01 ...



I did not change the costs.

 grep cost postgresql.conf
# note: increasing max_connections costs ~400 bytes of shared memory per
# note: increasing max_prepared_transactions costs ~600 bytes of shared memory

#vacuum_cost_delay = 0  # 0-1000 milliseconds
#vacuum_cost_page_hit = 1   # 0-1 credits
#vacuum_cost_page_miss = 10 # 0-1 credits
#vacuum_cost_page_dirty = 20# 0-1 credits
#vacuum_cost_limit = 200# 0-1 credits
#random_page_cost = 4   # units are one sequential page fetch cost
#cpu_tuple_cost = 0.01  # (same)
#cpu_index_tuple_cost = 0.001   # (same)
#cpu_operator_cost = 0.0025 # (same)
#autovacuum_vacuum_cost_delay = -1  # default vacuum cost delay for
# vacuum_cost_delay
#autovacuum_vacuum_cost_limit = -1  # default vacuum cost limit for
# vacuum_cost_limit
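
(All of those lines are commented out, so the built-in defaults apply. A
quick way to confirm what the server is actually using:)

   SHOW random_page_cost;
   SHOW cpu_tuple_cost;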


Cordialement,
Jean-Gérard Pailloncy




[PERFORM] xlog flush request error

2005-11-24 Thread Vipul . Gupta

Hi,

I get the following error on doing anything with the database after starting it.
Can anyone suggest how I can fix this?

xlog flush request 7/7D02338C is not satisfied --- flushed only to 3/2471E324

Vipul Gupta


[PERFORM] Very slow queries - please help.

2005-11-24 Thread Bealach-na Bo

Hi Folks,

I'm new to Postgresql.

I'm having great difficulties getting the performance I had hoped for
from Postgresql 8.0. The typical query below takes ~20 minutes !!

I hope an expert out there will tell me what I'm doing wrong - I hope
*I* am doing something wrong.

Hardware

Single processor, Intel Xeon 3.06 GHz machine running Red Hat
Ent. 4. with 1.5 GB of RAM.

The machine is dedicated to running Postgresql 8.0 and Apache/mod_perl
etc. The database is being accessed for report generation via a web
form. The web server talks to Pg over TCP/IP (I know, that I don't
need to do this if they are all on the same machine, but I have good
reasons for this and don't suspect that this is where my problems are
- I have the same poor performance when running from psql on the
server.)

Database

Very simple, not fully normalized set of two tables. The first table is
very small (2000 lines of 4 cols, with very few chars and integers in
each col). The other is quite a bit larger (50 lines with 15
cols, with the largest fields ~ 256 chars).

Typical query


SELECT n.name
FROM node n
WHERE n.name
LIKE '56x%'
AND n.type='H'
AND n.usage='TEST'
AND n.node_id
NOT IN
(select n.node_id
FROM job_log j
INNER JOIN node n
ON j.node_id = n.node_id
WHERE n.name
LIKE '56x%'
AND n.type='H'
AND n.usage='TEST'
AND j.job_name = 'COPY FILES'
AND j.job_start >= '2005-11-14 00:00:00'
AND (j.job_stop <= '2005-11-22 09:31:10' OR j.job_stop IS NULL))
ORDER BY n.name


The node table is the small table and the job_log table is the large
table.


I've tried all the basic things that I found in the documentation, like
VACUUM ANALYZE, EXPLAIN etc., but I suspect there is something
terribly wrong with what I'm doing and these measures will not shave
19 min and 50 seconds off the query time.

Any help and comments would be very much appreciated.


Bealach





Re: [PERFORM] Very slow queries - please help.

2005-11-24 Thread Claus Guttesen
 Typical query
 

 SELECT n.name
 FROM node n
 WHERE n.name
 LIKE '56x%'
 AND n.type='H'
 AND n.usage='TEST'
 AND n.node_id
 NOT IN
 (select n.node_id
 FROM job_log j
 INNER JOIN node n
 ON j.node_id = n.node_id
 WHERE n.name
 LIKE '56x%'
 AND n.type='H'
 AND n.usage='TEST'
 AND j.job_name = 'COPY FILES'
 AND j.job_start >= '2005-11-14 00:00:00'
 AND (j.job_stop <= '2005-11-22 09:31:10' OR j.job_stop IS NULL))
 ORDER BY n.name

Do you have any indexes?

regards
Claus



Re: [PERFORM] High context switches occurring

2005-11-24 Thread Sven Geisler

Hi Anjan,

I can support Scott. You should turn off HT if you see high values for CS.

I have a few customers running a web-based 3-tier application with 
PostgreSQL. We had to turn off HT to get better overall performance.
The issue is the behavior under high load: I have noticed that with HT on, 
the system collapses faster.


Just a question: which version of Xeon do you have, and what memory 
architecture does the server have?


I think Dual-Core Xeons are no issue. One of our customers has been using a 
4-way Dual-Core Opteron 875 for a few months. We have Pg 8.0.3 and it 
runs perfectly. I have to say that we use a special patch from Tom which 
fixes an issue with the locking of shared buffers on the Opteron.

I notice that this patch is also useful for Xeons with EM64T.

Best regards
Sven.

Anjan Dave schrieb:

Yes, it's turned on, unfortunately it got overlooked during the setup,
and until now...!

It's mostly a 'read' application, I increased the vm.max-readahead to
2048 from the default 256, after which I've not seen the CS storm,
though it could be incidental.

Thanks,
Anjan

-Original Message-
From: Scott Marlowe [mailto:[EMAIL PROTECTED] 
Sent: Tuesday, November 22, 2005 3:38 PM

To: Anjan Dave
Cc: Tom Lane; Vivek Khera; Postgresql Performance
Subject: Re: [PERFORM] High context switches occurring

On Tue, 2005-11-22 at 14:33, Anjan Dave wrote:


Is there any way to get a temporary relief from this Context Switching
storm? Does restarting postmaster help?

It seems that I can recreate the heavy CS with just one SELECT
statement...and then when multiple such SELECT queries are coming in,
things just get hosed up until we cancel a bunch of queries...



Is your machine a hyperthreaded one?  Some folks have found that turning
off hyperthreading helps.  I know it made my servers better behaved in
the past.





Sven Geisler [EMAIL PROTECTED] Tel +49.30.5362.1627 Fax .1638
Senior Developer,AEC/communications GmbHBerlin,   Germany



Re: [PERFORM] Very slow queries - please help.

2005-11-24 Thread Bealach-na Bo


Hi,

Thanks for your comments. I haven't explicitly made any indexes, but the
default ones are:



[EMAIL PROTECTED] \di
  List of relations
Schema  |  Name   | Type  |  Owner  |  Table
-+-+---+-+-
user | job_log_id_pkey | index | user | job_log
user | node_id_pkey| index | user | node
user | node_name_key   | index | user | node
(3 rows)



I'm also sending the EXPLAIN outputs.





 explain SELECT n.name,n.type,
n.usage, j.status,
j.job_start,j.job_stop,
j.nfiles_in_job,j.job_name
 FROM job_log j
 INNER JOIN node n
 ON j.node_id = n.node_id
 WHERE n.name
 LIKE '56x%'
 AND n.type = 'K'
 AND n.usage = 'LIVE'
 AND j.job_name = 'COPY FILES'
 AND j.job_start >= '2005-11-14 00:00:00'
 AND (j.job_stop <= '2005-11-14 05:00:00' OR j.job_stop IS NULL)
 ORDER BY n.name;



 QUERY PLAN

--
Nested Loop  (cost=0.00..75753.31 rows=1 width=461)
  Join Filter: (inner.node_id = outer.node_id)
  ->  Index Scan using node_name_key on node n  (cost=0.00..307.75 rows=1 width=181)
        Filter: ((name ~~ '56x%'::text) AND (type = 'K'::bpchar) AND (usage = 'LIVE'::bpchar))
  ->  Seq Scan on job_log j  (cost=0.00..75445.54 rows=1 width=288)
        Filter: ((job_name = 'COPY FILES'::bpchar) AND (job_start >= '2005-11-14 00:00:00'::timestamp without time zone) AND ((job_stop <= '2005-11-14 05:00:00'::timestamp without time zone) OR (job_stop IS NULL)))
(6 rows)


 explain SELECT n.name, n.type, n.usage
 FROM node n
 WHERE n.name
 LIKE '56x%'
 AND n.type  = 'K'
 AND n.usage = 'LIVE'
 AND n.node_id
 NOT IN
 (SELECT n.node_id
 FROM job_log j
 INNER JOIN node n
 ON j.node_id = n.node_id
 WHERE n.name
 LIKE '56x%'
 AND n.type  = 'K'
 AND n.usage = 'LIVE'
 AND j.job_name = 'COPY FILES'
 AND j.job_start >= '2005-11-14 00:00:00'
 AND (j.job_stop <= '2005-11-14 05:00:00' OR j.job_stop IS NULL))
 ORDER BY n.name;







 QUERY PLAN

--
Index Scan using node_name_key on node n  (cost=75451.55..75764.94 rows=1 width=177)
  Filter: ((name ~~ '56x%'::text) AND (type = 'K'::bpchar) AND (usage = 'LIVE'::bpchar) AND (NOT (hashed subplan)))
  SubPlan
    ->  Nested Loop  (cost=0.00..75451.54 rows=1 width=4)
          ->  Seq Scan on job_log j  (cost=0.00..75445.54 rows=1 width=4)
                Filter: ((job_name = 'COPY FILES'::bpchar) AND (job_start >= '2005-11-14 00:00:00'::timestamp without time zone) AND ((job_stop <= '2005-11-14 05:00:00'::timestamp without time zone) OR (job_stop IS NULL)))
          ->  Index Scan using node_id_pkey on node n  (cost=0.00..5.99 rows=1 width=4)
                Index Cond: (outer.node_id = n.node_id)
                Filter: ((name ~~ '56x%'::text) AND (type = 'K'::bpchar) AND (usage = 'LIVE'::bpchar))



Yours,

Bealach



From: Claus Guttesen [EMAIL PROTECTED]
To: Bealach-na Bo [EMAIL PROTECTED]
CC: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Very slow queries - please help.
Date: Thu, 24 Nov 2005 14:23:38 +0100

 Typical query
 

 SELECT n.name
 FROM node n
 WHERE n.name
 LIKE '56x%'
 AND n.type='H'
 AND n.usage='TEST'
 AND n.node_id
 NOT IN
 (select n.node_id
 FROM job_log j
 INNER JOIN node n
 ON j.node_id = n.node_id
 WHERE n.name
 LIKE '56x%'
 AND n.type='H'
 AND n.usage='TEST'
 AND j.job_name = 'COPY FILES'
 AND j.job_start >= '2005-11-14 00:00:00'
 AND (j.job_stop <= '2005-11-22 09:31:10' OR j.job_stop IS NULL))
 ORDER BY n.name

Do you have any indexes?

regards
Claus






Re: [PERFORM] Very slow queries - please help.

2005-11-24 Thread Guillaume Smet

Hi,


I'm also sending the EXPLAIN outputs.


Please provide EXPLAIN ANALYZE outputs instead of EXPLAIN. You will have 
more information.


Indexes on your tables are obviously missing. You should try to add:

CREATE INDEX idx_node_filter ON node(name, type, usage);
CREATE INDEX idx_job_log_filter ON job_log(job_name, job_start, job_stop);

I'm not so sure it's a good idea to add job_stop to this index, as you 
have an IS NULL in your query, so I'm not sure it can be used. You should 
try it anyway and remove it if not needed.
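
(If the job_stop IS NULL case does turn out to matter, a partial index is
one way to cover it - a sketch, with a hypothetical index name, to be
verified with EXPLAIN ANALYZE like everything else:)

   CREATE INDEX idx_job_log_open ON job_log (job_name, job_start)
       WHERE job_stop IS NULL;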


I added all your search fields to the indexes, but it depends a lot on 
the selectivity of your conditions. I don't know your data, but I think 
you understand the idea.


HTH

--
Guillaume



Re: [PERFORM] Very slow queries - please help.

2005-11-24 Thread Tom Lane
Bealach-na Bo [EMAIL PROTECTED] writes:
 I'm having great difficulties getting the performance I had hoped for
 from Postgresql 8.0. The typical query below takes ~20 minutes !!

You need to show us the table definition (including indexes) and the
EXPLAIN ANALYZE results for the query.

It seems likely that the NOT IN is the source of your problems,
but it's hard to be sure without EXPLAIN results.

regards, tom lane



Re: [PERFORM] xlog flush request error

2005-11-24 Thread Tom Lane
[EMAIL PROTECTED] writes:
 Can anyone suggest how do i fix this

  xlog flush request 7/7D02338C is not satisfied --- flushed only to 
 3/2471E324

This looks like corrupt data to me --- specifically, garbage in the LSN
field of a page header.  Is that all you get?  PG 7.4 and up should tell
you the problem page number in a CONTEXT: line.

regards, tom lane



Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-24 Thread Greg Stark
Mark Kirkwood [EMAIL PROTECTED] writes:

 Yeah - it's pretty clear that the count aggregate is fairly expensive wrt cpu 
 -
 However, I am not sure if all agg nodes suffer this way (guess we could try a
 trivial aggregate that does nothing for all tuples bar the last and just
 reports the final value it sees).

As you mention, count(*) and count(1) are the same thing.

Last I heard the reason count(*) was so expensive was because its state
variable was a bigint. That means it doesn't fit in a Datum and has to be
alloced and stored as a pointer. And because of the Aggregate API that means
it has to be allocated and freed for every tuple processed.
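
(This is visible in the catalogs - on a 32-bit build int8 is 8 bytes wide
but not pass-by-value:)

   SELECT typname, typlen, typbyval FROM pg_type WHERE typname = 'int8';
   -- expect typlen = 8, typbyval = f on a 32-bit installation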

There was some talk of having a special case API for count(*) and maybe
sum(...) to avoid having to do this.

There was also some talk of making Datum 8 bytes wide on platforms where that
was natural (I guess AMD64, Sparc64, Alpha, Itanic).

Afaik none of these items have happened but I don't know for sure.

-- 
greg




Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-24 Thread Tom Lane
Greg Stark [EMAIL PROTECTED] writes:
 Last I heard the reason count(*) was so expensive was because its state
 variable was a bigint. That means it doesn't fit in a Datum and has to be
 alloced and stored as a pointer. And because of the Aggregate API that means
 it has to be allocated and freed for every tuple processed.

There's a hack in 8.1 to avoid the palloc overhead (courtesy of Neil
Conway IIRC).

regards, tom lane



Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-24 Thread Greg Stark
Tom Lane [EMAIL PROTECTED] writes:

 Greg Stark [EMAIL PROTECTED] writes:
  Last I heard the reason count(*) was so expensive was because its state
  variable was a bigint. That means it doesn't fit in a Datum and has to be
  alloced and stored as a pointer. And because of the Aggregate API that means
  it has to be allocated and freed for every tuple processed.
 
 There's a hack in 8.1 to avoid the palloc overhead (courtesy of Neil
 Conway IIRC).

ah, cool, missed that.

-- 
greg




Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-24 Thread Luke Lonergan
The same 12.9GB distributed across 4 machines using Bizgres MPP fits into
I/O cache.  The interesting result is that the query select count(1) is
limited in speed to 280 MB/s per CPU when run on the lineitem table.  So
when I run it spread over 4 machines, one CPU per machine I get this:

==
Bizgres MPP, 4 data segments, 1 per 2 CPUs
==
llonergan=# explain select count(1) from lineitem;
  QUERY PLAN

--
 Aggregate  (cost=582452.00..582452.00 rows=1 width=0)
   ->  Gather Motion  (cost=582452.00..582452.00 rows=1 width=0)
         ->  Aggregate  (cost=582452.00..582452.00 rows=1 width=0)
               ->  Seq Scan on lineitem  (cost=0.00..544945.00 rows=15002800 width=0)
(4 rows)

llonergan=# \timing
Timing is on.
llonergan=# select count(1) from lineitem;
  count   
--
 59986052
(1 row)

Time: 12191.435 ms
llonergan=# select count(1) from lineitem;
  count   
--
 59986052
(1 row)

Time: 11986.109 ms
llonergan=# select count(1) from lineitem;
  count   
--
 59986052
(1 row)

Time: 11448.941 ms
==

That's 12,937 MB in 11.45 seconds, or 1,130 MB/s.  When you divide out the
number of Postgres instances (4), that's 283MB/s per Postgres instance.

To verify that this has nothing to do with MPP, I ran it in a special
internal mode on one instance and got the same result.

So - we should be able to double this rate by running one segment per CPU,
or two per host:

==
Bizgres MPP, 8 data segments, 1 per CPU
==
llonergan=# select count(1) from lineitem;
  count   
--
 59986052
(1 row)

Time: 6484.594 ms
llonergan=# select count(1) from lineitem;
  count   
--
 59986052
(1 row)

Time: 6156.729 ms
llonergan=# select count(1) from lineitem;
  count   
--
 59986052
(1 row)

Time: 6063.416 ms
==
That's 12,937 MB in 6.06 seconds, or 2,134 MB/s.  When you divide out the
number of Postgres instances (8), that's 267MB/s per Postgres instance.

So, if you want to select count(1), using more CPUs is a good idea!  For
most complex queries, having lots of CPUs + MPP is a good combo.

Here is an example of a sorting plan - this should probably be done with a
hash aggregation, but using 8 CPUs makes it go 8x faster:


- Luke





Re: [PERFORM] Very slow queries - please help

2005-11-24 Thread Bealach-na Bo

OK.

The consensus seems to be that I need more indexes and I also need to
look into the NOT IN statement as a possible bottleneck. Introducing the
indexes has led to a DRAMATIC change in response time. Now I have to
experiment with INNER JOIN / OUTER JOIN variations and SET
ENABLE_SEQSCAN=OFF.
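
(For reference, one standard reshaping of the earlier NOT IN query into an
outer-join form - a sketch assembled from the query posted before; whether
it wins depends on the data and should be checked with EXPLAIN ANALYZE:)

   SELECT n.name
   FROM node n
   LEFT JOIN job_log j
     ON j.node_id = n.node_id
    AND j.job_name = 'COPY FILES'
    AND j.job_start >= '2005-11-14 00:00:00'
    AND (j.job_stop <= '2005-11-22 09:31:10' OR j.job_stop IS NULL)
   WHERE n.name LIKE '56x%'
     AND n.type = 'H'
     AND n.usage = 'TEST'
     AND j.node_id IS NULL
   ORDER BY n.name;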

Forgive me for not mentioning each person individually and by name.
You have all contributed to confirming what I had suspected (and
hoped): that *I* have a lot to learn!

I'm attaching table descriptions, the first few lines of top output
while the queries were running, index lists, sample queries and
EXPLAIN ANALYSE output BEFORE and AFTER the introduction of the
indexes. As I said, DRAMATIC :) I notice that the CPU usage does not
vary very much, it's nearly 100% anyway, but the memory usage drops
markedly, which is another very nice result of the index introduction.

Any more comments and tips would be very welcome.

Thank you all for your input.

Bealach.




[EMAIL PROTECTED] \d job_log
Table blouser.job_log
Column |Type |Modifiers
+-+--
job_log_id | integer | not null default 
nextval('job_log_id_seq'::text)

first_registry | timestamp without time zone |
blogger_name   | character(50)   |
node_id| integer |
job_type   | character(50)   |
job_name   | character(256)  |
job_start  | timestamp without time zone |
job_timeout| interval|
job_stop   | timestamp without time zone |
nfiles_in_job  | integer |
status | integer |
error_code | smallint|
Indexes:
   job_log_id_pkey PRIMARY KEY, btree (job_log_id)
Check constraints:
   job_log_status_check CHECK (status = 0 OR status = 1 OR status = 8 OR 
status = 9)

Foreign-key constraints:
   legal_node FOREIGN KEY (node_id) REFERENCES node(node_id)





[EMAIL PROTECTED] \d node
 Table blouser.node
Column  | Type  |   Modifiers
-+---+---
node_id | integer   | not null default nextval('node_id_seq'::text)
name| character(50) |
type| character(1)  |
usage   | character(4)  |
Indexes:
   node_id_pkey PRIMARY KEY, btree (node_id)
   node_name_key UNIQUE, btree (name)
Check constraints:
   node_type_check CHECK (type = 'B'::bpchar OR type = 'K'::bpchar OR 
type = 'C'::bpchar OR type = 'T'::bpchar OR type = 'R'::bpchar)
   node_usage_check CHECK (usage = 'TEST'::bpchar OR usage = 
'LIVE'::bpchar)



#before new indexes were created


Tasks: 114 total,   2 running, 112 sleeping,   0 stopped,   0 zombie
Cpu(s): 25.7% us, 24.5% sy,  0.0% ni, 49.4% id,  0.3% wa,  0.0% hi,  0.0% si
Mem:   1554788k total,  1513576k used,41212k free,31968k buffers
Swap:  1020024k total,27916k used,   992108k free,   708728k cached

 PID USER  PR  NI  VIRT  RES  SHR S %CPU %MEMTIME+  COMMAND
25883 postgres  25   0 20528  12m  11m R 99.7  0.8   4:54.91 postmaster





[EMAIL PROTECTED] \di
  List of relations
Schema  |  Name   | Type  |  Owner  |  Table
-+-+---+-+-
blouser | job_log_id_pkey | index | blouser | job_log
blouser | node_id_pkey| index | blouser | node
blouser | node_name_key   | index | blouser | node
(3 rows)


 EXPLAIN ANALYSE SELECT n.name,n.type,
n.usage, j.status,
j.job_start,j.job_stop,
j.nfiles_in_job,j.job_name
 FROM job_log j
 INNER JOIN node n
 ON j.node_id = n.node_id
 WHERE n.name
 LIKE '711%'
 AND n.type = 'K'
 AND n.usage = 'LIVE'
 AND j.job_name = 'COPY FILES'
 AND j.job_start >= '2005-11-14 00:00:00'
 AND (j.job_stop <= '2005-11-14 05:00:00' OR j.job_stop IS NULL)
 ORDER BY n.name;


 QUERY PLAN

---
Nested Loop  (cost=0.00..75753.31 rows=1 width=461) (actual time=270486.692..291662.350 rows=3 loops=1)
  Join Filter: (inner.node_id = outer.node_id)
  ->  Index Scan using node_name_key on node n  (cost=0.00..307.75 rows=1 width=181) (actual time=0.135..11.034 rows=208 loops=1)
        Filter: ((name ~~ '711%'::text) AND (type = 'K'::bpchar) AND (usage = 'LIVE'::bpchar))
  ->  Seq Scan on job_log j  (cost=0.00..75445.54 rows=1 width=288) (actual time=273.374..1402.089 rows=22 loops=208)
        Filter: ((job_name = 'COPY FILES'::bpchar) AND (job_start >= '2005-11-14 00:00:00'::timestamp without time zone) AND ((job_stop <= '2005-11-14 

[PERFORM] Very slow queries - please help

2005-11-24 Thread Bealach-na Bo

A quick note to say that I'm very grateful for Tom Lane's input also.
Tom, I did put you on the list of recipients for my last posting to
pgsql-performance, but got:


cut here
This is an automatically generated Delivery Status Notification.

Delivery to the following recipients failed.

  [EMAIL PROTECTED]



Many regards,

Bealach





Re: [PERFORM] Hardware/OS recommendations for large databases (

2005-11-24 Thread Mark Kirkwood

Tom Lane wrote:

Greg Stark [EMAIL PROTECTED] writes:


Last I heard the reason count(*) was so expensive was because its state
variable was a bigint. That means it doesn't fit in a Datum and has to be
alloced and stored as a pointer. And because of the Aggregate API that means
it has to be allocated and freed for every tuple processed.



There's a hack in 8.1 to avoid the palloc overhead (courtesy of Neil
Conway IIRC).



It certainly makes quite a difference as I measure it:

doing select count(1) from a 181000 page table (completely uncached) on my PIII:

8.0 : 32 s
8.1 : 25 s

Note that the 'fastcount()' function takes 21 s in both cases - so all 
the improvement seems to be from the count overhead reduction.


Cheers

Mark










Re: [PERFORM] 8.1 count(*) distinct: IndexScan/SeqScan

2005-11-24 Thread Pailloncy Jean-Gerard

I redid the test with a freshly installed data directory. Same result.

Note: this is the full log. I have only suppressed my typing mistakes, like
"sl" for "ls".


Jean-Gérard Pailloncy


Last login: Thu Nov 24 12:52:32 2005 from 192.168.0.1
OpenBSD 3.8 (WDT) #2: Tue Nov  8 00:52:38 CET 2005

Welcome to OpenBSD: The proactively secure Unix-like operating system.

Please use the sendbug(1) utility to report bugs in the system.
Before reporting a bug, please try to reproduce it with the latest
version of the code.  With bug reports, please try to ensure that
enough information to reproduce the problem is enclosed, and if a
known fix for it exists, include that as well.

Terminal type? [xterm-color]
# cd /mnt2/pg/install/bin/
# mkdir /mnt2/pg/data
# chown -R _pgsql:_pgsql /mnt2/pg/data
# su _pgsql
$ ls
clusterdb    droplang    pg_config        pg_resetxlog   reindexdb
createdb     dropuser    pg_controldata   pg_restore     vacuumdb
createlang   ecpg        pg_ctl           postgres
createuser   initdb      pg_dump          postmaster
dropdb       ipcclean    pg_dumpall       psql
$ ./initdb -D /mnt2/pg/data
The files belonging to this database system will be owned by user  
_pgsql.

This user must also own the server process.

The database cluster will be initialized with locale C.

fixing permissions on existing directory /mnt2/pg/data ... ok
creating directory /mnt2/pg/data/global ... ok
creating directory /mnt2/pg/data/pg_xlog ... ok
creating directory /mnt2/pg/data/pg_xlog/archive_status ... ok
creating directory /mnt2/pg/data/pg_clog ... ok
creating directory /mnt2/pg/data/pg_subtrans ... ok
creating directory /mnt2/pg/data/pg_twophase ... ok
creating directory /mnt2/pg/data/pg_multixact/members ... ok
creating directory /mnt2/pg/data/pg_multixact/offsets ... ok
creating directory /mnt2/pg/data/base ... ok
creating directory /mnt2/pg/data/base/1 ... ok
creating directory /mnt2/pg/data/pg_tblspc ... ok
selecting default max_connections ... 100
selecting default shared_buffers ... 1000
creating configuration files ... ok
creating template1 database in /mnt2/pg/data/base/1 ... ok
initializing pg_authid ... ok
enabling unlimited row size for system tables ... ok
initializing dependencies ... ok
creating system views ... ok
loading pg_description ... ok
creating conversions ... ok
setting privileges on built-in objects ... ok
creating information schema ... ok
vacuuming database template1 ... ok
copying template1 to template0 ... ok
copying template1 to postgres ... ok

WARNING: enabling trust authentication for local connections
You can change this by editing pg_hba.conf or using the -A option the
next time you run initdb.

Success. You can now start the database server using:

./postmaster -D /mnt2/pg/data
or
./pg_ctl -D /mnt2/pg/data -l logfile start

$ ./pg_ctl -D /mnt2/pg/data -l /mnt2/pg/data/logfile start
postmaster starting
$ ./psql postgres
Welcome to psql 8.1.0, the PostgreSQL interactive terminal.

Type:  \copyright for distribution terms
   \h for help with SQL commands
   \? for help with psql commands
   \g or terminate with semicolon to execute query
   \q to quit

postgres=#  create table test (id serial, val integer);
NOTICE:  CREATE TABLE will create implicit sequence test_id_seq for  
serial column test.id

CREATE TABLE
postgres=# create unique index testid on test (id);
CREATE INDEX
postgres=# create index testval on test (val);
CREATE INDEX
postgres=# insert into test (val) values (round(random() 
*1024*1024*1024));

INSERT 0 1
postgres=# vacuum full analyze;
VACUUM
postgres=# select count(1) from test;
count
---
 1
(1 row)

postgres=# explain select count(*) from (select distinct on (val) *  
from test) as foo;

  QUERY PLAN
--
Aggregate  (cost=1.04..1.05 rows=1 width=0)
   ->  Unique  (cost=1.02..1.03 rows=1 width=8)
         ->  Sort  (cost=1.02..1.02 rows=1 width=8)
               Sort Key: test.val
               ->  Seq Scan on test  (cost=0.00..1.01 rows=1 width=8)
(5 rows)

postgres=# insert into test (val) select round(random() 
*1024*1024*1024) from test;

INSERT 0 1
postgres=# insert into test (val) select round(random() 
*1024*1024*1024) from test;

INSERT 0 2
postgres=# insert into test (val) select round(random() 
*1024*1024*1024) from test;

INSERT 0 4
postgres=# insert into test (val) select round(random() 
*1024*1024*1024) from test;

INSERT 0 8
postgres=# insert into test (val) select round(random() 
*1024*1024*1024) from test;

INSERT 0 16
postgres=# insert into test (val) select round(random() 
*1024*1024*1024) from test;

INSERT 0 32
postgres=# vacuum full analyze;
VACUUM
postgres=# explain select count(*) from (select distinct on (val) *  
from test) as foo;

  QUERY PLAN
---
Aggregate  (cost=4.68..4.69 

Re: [PERFORM] 8.1 count(*) distinct: IndexScan/SeqScan

2005-11-24 Thread Tom Lane
Pailloncy Jean-Gerard [EMAIL PROTECTED] writes:
 I redo the test, with a freshly installed data directory. Same result.

What same result?  You only ran it up to 2K rows, not 2M.  In any
case, EXPLAIN without ANALYZE is pretty poor ammunition for complaining
that the planner made the wrong choice.  I ran the same test case,
and AFAICS the indexscan is the right choice at 2K rows:

regression=# explain analyze select count(*) from (select distinct on (val) *   
from test) as foo;
QUERY PLAN  
  
--
 Aggregate  (cost=105.24..105.25 rows=1 width=0) (actual time=41.561..41.565 rows=1 loops=1)
   ->  Unique  (cost=0.00..79.63 rows=2048 width=8) (actual time=0.059..32.459 rows=2048 loops=1)
         ->  Index Scan using testval on test  (cost=0.00..74.51 rows=2048 width=8) (actual time=0.049..13.197 rows=2048 loops=1)
 Total runtime: 41.683 ms
(4 rows)

regression=# set enable_indexscan TO 0;
SET
regression=# explain analyze select count(*) from (select distinct on (val) *   
from test) as foo;
  QUERY PLAN
   
---
 Aggregate  (cost=179.96..179.97 rows=1 width=0) (actual time=59.567..59.571 rows=1 loops=1)
   ->  Unique  (cost=144.12..154.36 rows=2048 width=8) (actual time=21.438..50.434 rows=2048 loops=1)
         ->  Sort  (cost=144.12..149.24 rows=2048 width=8) (actual time=21.425..30.589 rows=2048 loops=1)
               Sort Key: test.val
               ->  Seq Scan on test  (cost=0.00..31.48 rows=2048 width=8) (actual time=0.014..9.902 rows=2048 loops=1)
 Total runtime: 60.265 ms
(6 rows)


regards, tom lane



Re: [PERFORM] 8.1 count(*) distinct: IndexScan/SeqScan

2005-11-24 Thread Kyle Cordes

Tom Lane wrote:


What same result?  You only ran it up to 2K rows, not 2M.  In any
case, EXPLAIN without ANALYZE is pretty poor ammunition for complaining
that the planner made the wrong choice.  I ran the same 



Hello, sorry to jump in mid-stream, but this reminded me of something.

I have hit cases where I have a query for which there is a somewhat 
obvious (to a human...) query plan that should make it possible to get 
a query answer pretty quickly.  Yet the query never finishes (or 
rather, after hours of waiting I finally kill it).  I assume this is 
because of a sub-optimal query plan.  But, it appears that an EXPLAIN 
ANALYZE runs the actual query, so it takes as long as the actual query.


In such a case, how can I go about tracking down the issue, up to and 
including a complaint about the query planner?   :-)
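
(One practical approach: put an upper bound on the experiment and compare
plans under different enable_* settings - a sketch:)

   SET statement_timeout = 60000;   -- in ms; cancels a runaway EXPLAIN ANALYZE
   SET enable_nestloop = off;       -- or enable_seqscan etc., to test the "obvious" plan
   EXPLAIN ANALYZE SELECT ... ;     -- the problem query goes here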


(Overall, I'm pretty pleased with the PG query planner; it often gets 
better results than another, popular commercial DBMS we use here. That 
is just a general impression, not the result of setting up the same 
schema in each for a comparison.)


Kyle Cordes
www.kylecordes.com





Re: [PERFORM] xlog flush request error

2005-11-24 Thread Vipul . Gupta

Hi Tom,

basically when I run any query against the database, say,

select count(*) from table1;

It gives me the following error trace: 
WARNING: could not write block 297776 of 1663/2110743/2110807
DETAIL: Multiple failures --- write error may be permanent.
ERROR: xlog flush request 7/7D02338C is not satisfied --- flushed only to 3/2471E324
writing block 297776 of relation 1663/2110743/2110807
xlog flush request 7/7D02338C is not satisfied --- flushed only to 3/2471E324
xlog flush request 7/7D02338C is not satisfied --- flushed only to 3/2471E324\q

I tried using pg_resetxlog but so far have not been able to solve this problem.

Regards,
Vipul Gupta







Tom Lane [EMAIL PROTECTED]
11/24/2005 09:07 PM


To:[EMAIL PROTECTED]
cc:pgsql-performance@postgresql.org
Subject:Re: [PERFORM] xlog flush request error


[EMAIL PROTECTED] writes:
 Can anyone suggest how do i fix this

 xlog flush request 7/7D02338C is not satisfied --- flushed only to 
 3/2471E324

This looks like corrupt data to me --- specifically, garbage in the LSN
field of a page header. Is that all you get? PG 7.4 and up should tell
you the problem page number in a CONTEXT: line.

  regards, tom lane





Re: [PERFORM] xlog flush request error

2005-11-24 Thread Tom Lane
[EMAIL PROTECTED] writes:
 ERROR:  xlog flush request 7/7D02338C is not satisfied --- flushed only to 
 3/2471E324
  writing block 297776 of relation 1663/2110743/2110807

You need to fix or zero out that data block ...

regards, tom lane
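
(One concrete, destructive way to do the zeroing - any rows stored in the
damaged block are lost, so take a file-level backup first. The
zero_damaged_pages setting exists in 7.4 and later, but it only fires when
the damage is detected at read time; failing that, the block has to be
zeroed at the filesystem level. A sketch:)

   SET zero_damaged_pages = on;    -- superuser only
   SELECT count(*) FROM table1;    -- full read; pages with invalid headers are zeroed
   SET zero_damaged_pages = off;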
