[PERFORM] Horribly slow query/ sequential scan

2007-01-09 Thread Gregory S. Williamson
This is a query migrated from Informix to PostgreSQL. In PostgreSQL it runs about 10,000 times 
*slower* than on Informix, despite somewhat newer hardware. The problem is entirely 
due to the planner. This is PostgreSQL 8.1.4 on Linux, with 2 gigs of RAM.

The table:
  Table reporting.bill_rpt_work
    Column     |         Type          | Modifiers
---------------+-----------------------+-----------
 report_id | integer   |
 client_id | character varying(10) |
 contract_id   | integer   | not null
 rate  | numeric   | not null
 appid | character varying(10) | not null
 userid| text  | not null
 collection_id | integer   | not null
 client_name   | character varying(60) |
 use_sius  | integer   | not null
 is_subscribed | integer   | not null
 hits  | numeric   | not null
 sius  | numeric   | not null
 total_amnt| numeric   | not null
 royalty_total | numeric   |
Indexes:
    "billrptw_ndx" UNIQUE, btree (report_id, client_id, contract_id, rate, appid, userid, collection_id)
    "billrpt_cntrct_ndx" btree (report_id, contract_id, client_id)
    "billrpt_collid_ndx" btree (report_id, collection_id, client_id, contract_id)
Foreign-key constraints:
    "$1" FOREIGN KEY (report_id) REFERENCES billing_reports(report_id)
    "$2" FOREIGN KEY (client_id) REFERENCES work.clients(client_id)


The query:
explain analyze select
w.appid,w.rate,w.is_subscribed,sum(w.hits) AS Hits ,sum(w.sius) AS IUs,
sum(w.total_amnt) AS Total,sum(w.hits) * w.rate AS ByHits,
sum(w.sius) * w.rate AS BYIUS
from bill_rpt_work w, billing_reports b
where w.report_id in
(select b.report_id from billing_reports where b.report_s_date = '2006-09-30')
and (w.client_id = '22741' or w.client_id = '227410')
group by 1,2,3
order by 1,2,3;
                                                                QUERY PLAN
------------------------------------------------------------------------------------------------------------------------------------------
 GroupAggregate  (cost=298061335.44..298259321.62 rows=26 width=58) (actual time=372213.673..372213.689 rows=2 loops=1)
   ->  Sort  (cost=298061335.44..29808.83 rows=8799354 width=58) (actual time=372213.489..372213.503 rows=37 loops=1)
         Sort Key: w.appid, w.rate, w.is_subscribed
         ->  Nested Loop  (cost=0.00..296121313.45 rows=8799354 width=58) (actual time=286628.486..372213.053 rows=37 loops=1)
               Join Filter: (subplan)
               ->  Seq Scan on bill_rpt_work w  (cost=0.00..85703.20 rows=11238 width=62) (actual time=1.239..1736.746 rows=61020 loops=1)
                     Filter: (((client_id)::text = '22741'::text) OR ((client_id)::text = '227410'::text))
               ->  Seq Scan on billing_reports b  (cost=0.00..29.66 rows=1566 width=8) (actual time=0.001..0.879 rows=1566 loops=61020)
               SubPlan
                 ->  Result  (cost=0.00..29.66 rows=1566 width=0) (actual time=0.000..0.002 rows=1 loops=95557320)
                       One-Time Filter: ($1 = '2006-09-30'::date)
                       ->  Seq Scan on billing_reports  (cost=0.00..29.66 rows=1566 width=0) (actual time=0.001..0.863 rows=1565 loops=61020)
 Total runtime: 372214.085 ms


Informix uses report id/client id as an index, thus eliminating a huge number 
of rows. The table has 2280545 rows currently; slightly fewer when the above 
analyze was run. Informix has about 5 times as much data.

select count(*) from bill_rpt_work where report_id in (select report_id from 
billing_reports where report_s_date = '2006-09-30') and (client_id = 
'22741' or client_id = '227410');
 count
---
37
(1 row)

So scanning everything seems particularly senseless.

I had some success adding client id and report id to the initial select list, 
but that causes all sorts of problems in calling procedures that expect 
different data grouping.

Any suggestion would be welcome because this is a horrible show stopper.

Thanks,

Greg Williamson
DBA
GlobeXplorer LLC




---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Horribly slow query/ sequential scan

2007-01-09 Thread Gregory S. Williamson
Voila ! You da man !

 other expressions of awe and appreciation ...

Having burdened others with my foolishness too often, I hesitate to ask, but 
could someone either point me to a reference or explain what the difference 
might be ... I can see it with the eyes but I am having trouble understanding 
what Informix might have been doing to my (bad?) SQL to fix the query. 
Seeing a redundancy and eliminating it?

The explain analyze for db's sql (slightly faster than Informix on an older 
Sun machine ... about 20%):
 GroupAggregate  (cost=64.35..64.75 rows=8 width=58) (actual time=0.612..0.629 rows=2 loops=1)
   ->  Sort  (cost=64.35..64.37 rows=8 width=58) (actual time=0.463..0.476 rows=37 loops=1)
         Sort Key: w.appid, w.rate, w.is_subscribed
         ->  Nested Loop  (cost=8.11..64.23 rows=8 width=58) (actual time=0.130..0.211 rows=37 loops=1)
               Join Filter: ("inner".report_id = "outer".report_id)
               ->  HashAggregate  (cost=3.95..3.96 rows=1 width=4) (actual time=0.035..0.035 rows=1 loops=1)
                     ->  Index Scan using billrpt_sdate_ndx on billing_reports b  (cost=0.00..3.94 rows=1 width=4) (actual time=0.021..0.023 rows=1 loops=1)
                           Index Cond: (report_s_date = '2006-09-30'::date)
               ->  Bitmap Heap Scan on bill_rpt_work w  (cost=4.17..59.92 rows=28 width=62) (actual time=0.084..0.111 rows=37 loops=1)
                     Recheck Cond: (((w.report_id = "outer".report_id) AND ((w.client_id)::text = '22741'::text)) OR ((w.report_id = "outer".report_id) AND ((w.client_id)::text = '227410'::text)))
                     ->  BitmapOr  (cost=4.17..4.17 rows=28 width=0) (actual time=0.078..0.078 rows=0 loops=1)
                           ->  Bitmap Index Scan on billrptw_ndx  (cost=0.00..2.08 rows=14 width=0) (actual time=0.053..0.053 rows=22 loops=1)
                                 Index Cond: ((w.report_id = "outer".report_id) AND ((w.client_id)::text = '22741'::text))
                           ->  Bitmap Index Scan on billrptw_ndx  (cost=0.00..2.08 rows=14 width=0) (actual time=0.024..0.024 rows=15 loops=1)
                                 Index Cond: ((w.report_id = "outer".report_id) AND ((w.client_id)::text = '227410'::text))
 Total runtime: 6.110 ms
(16 rows)

Thanks again (and sorry for the top-posting but this particular interface is 
ungainly)

G

-Original Message-
From:   [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent:   Tue 1/9/2007 4:35 AM
To: Gregory S. Williamson
Cc: pgsql-performance@postgresql.org
Subject:Re: [PERFORM] Horribly slow query/ sequential scan

I don't think I understand the idea behind this query. Do you really need
billing_reports twice?

 The query:
 explain analyze select
 w.appid,w.rate,w.is_subscribed,sum(w.hits) AS Hits ,sum(w.sius) AS IUs,
 sum(w.total_amnt) AS Total,sum(w.hits) * w.rate AS ByHits,
 sum(w.sius) * w.rate AS BYIUS
 from bill_rpt_work w, billing_reports b
 where w.report_id in
 (select b.report_id from billing_reports where b.report_s_date =
 '2006-09-30')
 and (w.client_id = '22741' or w.client_id = '227410')
 group by 1,2,3
 order by 1,2,3;

Maybe this is the query you want instead?

select w.appid,
   w.rate,
   w.is_subscribed,
   sum(w.hits) AS Hits,
   sum(w.sius) AS IUs,
   sum(w.total_amnt) AS Total,
   sum(w.hits) * w.rate AS ByHits,
   sum(w.sius) * w.rate AS BYIUS
  from bill_rpt_work w
 where w.report_id in
   (select b.report_id from billing_reports b where b.report_s_date =
'2006-09-30')
   and (w.client_id = '22741' or w.client_id = '227410')
group by 1,2,3
order by 1,2,3;

/Dennis








---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Horribly slow query/ sequential scan

2007-01-09 Thread Gregory S. Williamson
Thanks for the suggestion ... I will try it when I've had some sleep and the 
server is quiet again ... the IN seems to have improved markedly since the 7.4 
release, as advertised, so I will be interested in trying this.

GSW

-Original Message-
From:   Nörder-Tuitje, Marcus [mailto:[EMAIL PROTECTED]
Sent:   Tue 1/9/2007 4:50 AM
To: [EMAIL PROTECTED]; Gregory S. Williamson
Cc: pgsql-performance@postgresql.org
Subject:AW: [PERFORM] Horribly slow query/ sequential scan

Forget about IN. It's horribly slow.

try :

select w.appid,
   w.rate,
   w.is_subscribed,
   sum(w.hits) AS Hits,
   sum(w.sius) AS IUs,
   sum(w.total_amnt) AS Total,
   sum(w.hits) * w.rate AS ByHits,
   sum(w.sius) * w.rate AS BYIUS
  from bill_rpt_work w
 where exists (select b.report_id from billing_reports b where b.report_s_date = 
'2006-09-30' and w.report_id = b.report_id)
   and w.client_id IN ('22741','227410')
group by 1,2,3
order by 1,2,3;



should be faster; 

assuming : index on report_id in b; index on report_id, client_id in w

to enforce usage of indexes on grouping (depends on result size), consider 
extending w with cols 1,2,3.
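
A rough sketch of the indexes being described (billrpt_sdate_ndx already shows up in the 
faster plan earlier in the thread; the other index names here are made up):

create index billrpt_sdate_ndx on billing_reports (report_s_date);
create index billrptw_rid_client_ndx on bill_rpt_work (report_id, client_id);

-- "extending w with cols 1,2,3" would mean also carrying the GROUP BY columns:
create index billrptw_rid_client_grp_ndx
    on bill_rpt_work (report_id, client_id, appid, rate, is_subscribed);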


regards, 
marcus

-Ursprüngliche Nachricht-
Von: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Auftrag von
[EMAIL PROTECTED]
Gesendet: Dienstag, 9. Januar 2007 13:36
An: Gregory S. Williamson
Cc: pgsql-performance@postgresql.org
Betreff: Re: [PERFORM] Horribly slow query/ sequential scan


I don't think I understand the idea behind this query. Do you really need
billing_reports twice?

 The query:
 explain analyze select
 w.appid,w.rate,w.is_subscribed,sum(w.hits) AS Hits ,sum(w.sius) AS IUs,
 sum(w.total_amnt) AS Total,sum(w.hits) * w.rate AS ByHits,
 sum(w.sius) * w.rate AS BYIUS
 from bill_rpt_work w, billing_reports b
 where w.report_id in
 (select b.report_id from billing_reports where b.report_s_date =
 '2006-09-30')
 and (w.client_id = '22741' or w.client_id = '227410')
 group by 1,2,3
 order by 1,2,3;

Maybe this is the query you want instead?

select w.appid,
   w.rate,
   w.is_subscribed,
   sum(w.hits) AS Hits,
   sum(w.sius) AS IUs,
   sum(w.total_amnt) AS Total,
   sum(w.hits) * w.rate AS ByHits,
   sum(w.sius) * w.rate AS BYIUS
  from bill_rpt_work w
 where w.report_id in
   (select b.report_id from billing_reports b where b.report_s_date =
'2006-09-30')
   and (w.client_id = '22741' or w.client_id = '227410')
group by 1,2,3
order by 1,2,3;

/Dennis


---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

http://www.postgresql.org/about/donate










---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] New to PostgreSQL, performance considerations

2006-12-13 Thread Gregory S. Williamson
(Re)-Design it to do both, unless there's reason to believe that doing one 
after the other would skew the results.

Then old results are available, new results are also visible and useful for 
future comparisons. And seeing them side by side might be an interesting 
exercise as well, at least for a while.

(sorry for top-posting -- web based interface that doesn't do proper quoting)

Greg Williamson
DBA
GlobeXplorer LLC

-Original Message-
From:   [EMAIL PROTECTED] on behalf of Michael Glaesemann
Sent:   Wed 12/13/2006 10:11 PM
To: Tom Lane
Cc: Joshua D. Drake; Josh Berkus; pgsql-performance@postgresql.org; Bruce 
Momjian; Alvaro Herrera; Alexander Staubo; Michael Stone
Subject:Re: [PERFORM] New to PostgreSQL, performance considerations 


On Dec 14, 2006, at 14:44 , Tom Lane wrote:

 The pgbench app itself becomes the bottleneck at high transaction
 rates.  Awhile back I rewrote it to improve its ability to issue
 commands concurrently, but then desisted from submitting the
 changes --- if we change the app like that, future numbers would
 be incomparable to past ones, which sort of defeats the purpose of a
 benchmark no?

At the same time, if the current pgbench isn't the tool we want to  
use, is this kind of backward comparison going to hinder any move to  
improve it? It sounds like there's quite a bit of room for  
improvement in pg_bench, and in my opinion we should move forward to  
make an improved tool, one that measures what we want to measure. And  
while comparison with past results might not be possible, there  
remains the possibility of rerunning the improved pgbench on previous  
systems, I should think.

Michael Glaesemann
grzm seespotcode net



---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster








---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Hardware advice

2006-12-06 Thread Gregory S. Williamson
If your data is valuable I'd recommend against RAID5 ... see 
http://www.miracleas.com/BAARF/RAID5_versus_RAID10.txt

Performance aside, I'd advise against RAID5 in almost all circumstances. Why 
take chances?

Greg Williamson
DBA
GlobeXplorer LLC

-Original Message-
From:   [EMAIL PROTECTED] on behalf of Sven Geisler
Sent:   Wed 12/6/2006 1:09 AM
To: Alex Turner
Cc: Alexandru Coseru; pgsql-performance@postgresql.org
Subject:Re: [PERFORM] Hardware advice

Hi Alex,

Please check out http://www.powerpostgresql.com/PerfList before you
use RAID 5 for PostgreSQL.

Anyhow, at larger scale you end up bound by the response time of the I/O
system for a read or write. In modern RAID and SAN environments the read is
the part to focus on when you want to tune your system, because most RAID
and SAN systems can buffer writes.
PostgreSQL also uses the Linux file system cache, which is normally much
larger than the RAID or SAN cache, for reading. This means whenever a
PostgreSQL read goes to the RAID or SAN subsystem, the response time of
the hard disk becomes interesting.
I guess you can imagine that multiple reads to the same spindles cause a
delay in the response time.


Alexandru,

You should have two XEONs, whatever your core count is.
This would use the full benefit of the memory architecture:
you know, two FSBs and two memory channels.

Cheers
Sven

Alex Turner schrieb:
 The test that I did - which was somewhat limited - showed no benefit from
 splitting disks into separate partitions for large bulk loads.
 
 The program read from one very large file and wrote the input out to two
 other large files.
 
 The total throughput on a single partition was close to the theoretical
 maximum for that logical drive, even though the process was reading
 and writing to three separate places on the disk.  I don't know what
 this means for postgresql setups directly, but I would postulate that
 the benefit from splitting pg_xlog onto a separate spindle is not as
 great as it might once have been for large bulk transactions.  I am
 therefore going to be going to a single 6 drive RAID 5 for my data
 warehouse application because I want the read speed to be available.  I
 can benefit from fast reads when I want to do large data scans at the
 expense of slightly slower insert speed.
 
 Alex.
 
 On 12/5/06, *Alexandru Coseru* [EMAIL PROTECTED]
 mailto:[EMAIL PROTECTED] wrote:
 
 Hello..
 
 Thanks for the advices..
 
 Actually , i'm waiting for the clovertown  to show up on the market...
 
 Regards
 Alex
 
 - Original Message -
 From: Sven Geisler [EMAIL PROTECTED] mailto:[EMAIL PROTECTED]
 To: Alexandru Coseru  [EMAIL PROTECTED]
 mailto:[EMAIL PROTECTED]
 Cc: pgsql-performance@postgresql.org
 mailto:pgsql-performance@postgresql.org
 Sent: Tuesday, December 05, 2006 11:57 AM
 Subject: Re: [PERFORM] Hardware advice
 
 
  Hi Alexandru,
 
  Alexandru Coseru schrieb:
  [...]
  Question 1:
 The RAID layout should be:
 a)  2 hdd in raid 1 for system and pg_xlog  and 6 hdd in
  raid10 for data ?
 b)  8 hdd in raid10  for all ?
 c)  2 hdd in raid1 for system  , 2 hdd in raid1 for
 pg_xlog ,
  4 hdd in raid10 for data ?
 Obs: I'm going for setup a)  , but i want to hear your
 thoughts as
  well.
 
   This depends on your data size. I think options a and c are good.
   The potential bottleneck may be the RAID 1 for pg_xlog if you have a huge
   amount of updates and inserts.
   What about another setup:
 
  4 hdd in RAID 10 for System and pg_xlog - System partitions are
 normally
  not in heavy use and pg_xlog should be fast for writing.
  4 hdd in RAID 10 for data.
 
 
 
  Question 2:  (Don't want to start a flame here. but here is goes)
 What filesystem should i run for data ?  ext3 or xfs ?
 The tables have ~ 15.000 rel_pages each.  The biggest
 table has
  now over 30.000 pages.
 
  We have a database running with 60,000+ tables. The tables size is
  between a few kByte for the small tables and up to 30 GB for the
 largest
  one. We had no issue with ext3 in the past.
 
 
  Question 3:
 The block size in postgresql is 8kb.  The strip size
 in the
  raid ctrl is 64k.
  Should i increase the pgsql block size to 16 or 32 or
 even 64k ?
 
  You should keep in mind that the file system has also a block
 size. Ext3
  has as maximum 4k.
  I would set up the partitions aligned to the stripe size to prevent
  unaligned reads. I guess, you can imagine that a larger block size of
  postgresql may also end up in unaligned reads because the file system
  has a smaller block size.
 
  RAID Volume and File system set up
  1. Make all partitions aligned to 

Re: [PERFORM] Slow SELECT on three or more clients

2006-11-15 Thread Gregory S. Williamson
Operating system and some of the basic PostgreSQL config settings would be 
helpful, plus any info you have on your disks, the size of the relevant tables, 
their structure and indexes, vacuum/analyze status ... plus what others have 
said:

Upgrade!

There are considerable improvements in, well, *everything*!, since 7.3 (we 
have some databases at 7.4.x and I consider them out-of-date). Hopefully this 
list can provide help to get you through whatever your immediate crisis is, but 
do consider planning for this as soon as time and resources permit.

Data integrity is a _good_ thing!

Greg Williamson
DBA
GlobeXplorer LLC


-Original Message-
From:   [EMAIL PROTECTED] on behalf of Russell Smith
Sent:   Wed 11/15/2006 5:31 AM
To: AMIR FRANCO D. JOVEN
Cc: pgsql-performance@postgresql.org
Subject:Re: [PERFORM] Slow SELECT on three or more clients

AMIR FRANCO D. JOVEN wrote:
 Hi!

 Im new to PostgreSQL.

 My current project uses PostgreSQL 7.3.4.
Upgrading your version of PostgreSQL to 8.1 will give you significant 
benefits to performance.

 the problem is like this:

 I have a table with 94 fields and a select with only one resultset in 
 only one client consumes about 0.86 seconds.
 The client executes three 'select' statements to perform the task 
 which consumes 2.58 seconds.
 With only one client this is acceptable, but the real problem is as i 
 add more clients, it goes more and more slower.

 for a single select with one field in one resultset, is 0.86 seconds 
 normal?
You will need to attach the query.
EXPLAIN ANALYZE SELECT ...

where SELECT ... is your query.  That will help us work out what the 
problem is. 

0.86 seconds might be slow for a query that returns 1 row, it might be 
fast for a query that returns a large set with complex joins and where 
conditions.  Fast and slow are not objective terms.  They are very 
dependent on the query.


 I tried vacuuming and reindexing but to no avail.
 the total record count in that particular table is 456,541.

456,541 is not all that many records.  But again you will need to post 
more information for us to be able to assist.
 Thanks in advance.



---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings








---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Context switch storm

2006-11-03 Thread Gregory S. Williamson
Based on what other people have posted, hyperthreading seems not to be 
beneficial for postgres -- try searching through the archives of this list. 
(And then turn it off and see if it helps.)

You might also post a few details:

config settings (shared_buffers, work_mem, maintenance_work_mem, wal and 
checkpoint settings, etc.)

are you using autovacuum ?

all tables are vacuumed and analyzed regularly ? How big are they ? Do they and 
indexes fit in RAM ?

any particular queries that are running and might be related (explain analyze 
results of them would be useful)

disk configuration

Other processes on this box ?

# of connections to it (I've seen this alone push servers over the edge)
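
If it helps, the settings above can be pulled from a running server in one go (pg_settings 
is a standard system view; the list of names here is just the ones mentioned in this note):

select name, setting, source
  from pg_settings
 where name in ('shared_buffers', 'work_mem', 'maintenance_work_mem',
                'wal_buffers', 'checkpoint_segments', 'checkpoint_timeout',
                'autovacuum', 'max_connections');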

HTH,

Greg Williamson
DBA
GlobeXplorer LLC

-Original Message-
From:   [EMAIL PROTECTED] on behalf of [EMAIL PROTECTED]
Sent:   Fri 11/3/2006 2:32 AM
To: pgsql-performance@postgresql.org
Cc: 
Subject:[PERFORM] Context switch storm

Hi,
 
We've migrated one of our servers from pg 7.4 to 8.1 and from time to time (roughly every 4 
hours) the server starts doing a lot of context switching and all transactions 
become very slow.
 
The average context switching for this server as vmstat shows is 1 but when the 
problem occurs it goes to 25.
 
CPU and memory usage are ok.
 
What is producing this context switching storms?
 
It is a box with 16GB RAM and 4 XEON processors running RedHat Enterprise Linux 
AS.
 
Should I disable Hyperthreading?
 
Thank you in advance!
 
Reimer







---(end of broadcast)---
TIP 2: Don't 'kill -9' the postmaster


Re: [PERFORM] Optimization of this SQL sentence

2006-10-17 Thread Gregory S. Williamson
Off hand I can't recommend anything, but perhaps you could post the details of 
the tables (columns, indexes), and some info on what version of postgres you are 
using.

Are the tables recently analyzed? How many rows are in them?

Greg Williamson
DBA
GlobeXplorer LLC


-Original Message-
From:   [EMAIL PROTECTED] on behalf of Ruben Rubio
Sent:   Tue 10/17/2006 2:05 AM
To: pgsql-performance@postgresql.org
Cc: 
Subject:[PERFORM] Optimization of this SQL sentence


This SQL sentence is very simple. I need to get better results. I have
tried some possibilities and I didn't get good results.

SELECT max(idcomment)
 FROM ficha vf
 INNER JOIN comment c ON (vf.idficha=c.idfile AND (idestado=3 OR
idestado=4))
 WHERE idstatus=3
 AND ctype=1


                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------
 Aggregate  (cost=2730.75..2730.76 rows=1 width=4) (actual time=188.463..188.469 rows=1 loops=1)
   ->  Hash Join  (cost=1403.44..2730.72 rows=11 width=4) (actual time=141.464..185.404 rows=513 loops=1)
         Hash Cond: ("outer".idfile = "inner".idficha)
         ->  Seq Scan on comment c  (cost=0.00..1321.75 rows=1083 width=8) (actual time=0.291..36.112 rows=642 loops=1)
               Filter: ((idstatus = 3) AND (ctype = 1))
         ->  Hash  (cost=1403.00..1403.00 rows=178 width=4) (actual time=141.004..141.004 rows=6282 loops=1)
               ->  Seq Scan on ficha vf  (cost=0.00..1403.00 rows=178 width=4) (actual time=0.071..97.885 rows=6282 loops=1)
                     Filter: (((idestado)::text = '3'::text) OR ((idestado)::text = '4'::text))
 Total runtime: 188.809 ms


Thanks in advance,
Ruben Rubio

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq








---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Is postgresql ca do the job for software deployed in

2006-07-05 Thread Gregory S. Williamson
A sodden late night idea ... schemas don't need to have names that are 
meaningful to outsiders.

Still, the point about political aspects is an important one. OTOH, schemas 
provide an elegant way of segregating data.

My $0.02 (not worth what it was)

Greg Williamson
DBA
GlobeXplorer LLC

-Original Message-
From:   [EMAIL PROTECTED] on behalf of Markus Schaber
Sent:   Wed 7/5/2006 3:38 AM
To: pgsql-performance@postgresql.org
Cc: 
Subject:Re: [PERFORM] Is postgresql ca do the job for software deployed 
in

Hi, Mikael,

Just my 2 cents:

Mikael Carneholm wrote:
 Do you really need to create one *DB* per client - that is, is one
 schema (in the same DB) per client out of the question?

Sometimes, schemas would work _technically_, but not politically, as a
postgresql user cannot be prevented from listing all schemas (or even
all databases in the same cluster), regardless of whether he/she has access
rights.

But it is not always acceptable that a customer knows which other
customers one has.

This forces the use of the one cluster per customer paradigm.


Thanks,
Markus
-- 
Markus Schaber | Logical Tracking & Tracing International AG
Dipl. Inf. | Software Development GIS

Fight against software patents in EU! www.ffii.org www.nosoftwarepatents.org

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org






---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal

2006-05-18 Thread Gregory S. Williamson
1.451 ms = 1.451 milliseconds
1451.0 ms = 1.451 seconds ...

so 32.918 ms for a commit seems perhaps reasonable ?
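
For what it's worth, a rough way to reproduce that kind of per-statement timing is psql's 
\timing, replaying the same statements quoted below (table and column names are taken 
from the quoted test, so treat this purely as a sketch):

\timing
begin;
create temporary table tmp (
    a integer   not null,
    b integer   not null,
    c timestamp not null,
    d integer   not null
) on commit drop;
insert into tmp select * from bookmarks order by annonce_id desc limit 20;
analyze tmp;
select * from tmp;
commit;   -- the step whose cost (milliseconds vs. seconds) is at issue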

Greg Williamson
DBA
GlobeXplorer LLC



-Original Message-
From:   [EMAIL PROTECTED] on behalf of Zeugswetter Andreas DCP SD
Sent:   Thu 5/11/2006 12:55 AM
To: Jim C. Nasby; PFC
Cc: Greg Stark; Tom Lane; pgsql-performance@postgresql.org; 
pgsql-hackers@postgresql.org
Subject:Re: [PERFORM] [HACKERS] Big IN() clauses etc : feature proposal


 Something else worth considering is not using the normal 
 catalog methods
 for storing information about temp tables, but hacking that together
 would probably be a rather large task.

But the timings suggest, that it cannot be the catalogs in the worst
case
he showed.

 0.101 ms BEGIN
 1.451 ms CREATE TEMPORARY TABLE tmp ( a INTEGER NOT NULL, b INTEGER
NOT  
 NULL, c TIMESTAMP NOT NULL, d INTEGER NOT NULL ) ON COMMIT DROP

1.4 seconds is not great for create table, is that what we expect ?

 0.450 ms INSERT INTO tmp SELECT * FROM bookmarks ORDER BY annonce_id
DESC  
 LIMIT 20
 0.443 ms ANALYZE tmp
 0.365 ms SELECT * FROM tmp
 0.310 ms DROP TABLE tmp
 32.918 ms COMMIT
 
   CREATING the table is OK, but what happens on COMMIT ? I hear
the disk  
 seeking frantically.

The 32 seconds for commit can hardly be catalog related. It seems the
file is 
fsynced before it is dropped.

Andreas

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq






---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Benchmarck PostgreSQL 8.1.4 MySQL 5.0.20 and Oracle 10g2

2006-05-18 Thread Gregory S. Williamson

That "fsync off" would make me very unhappy in a production environment ... not 
that turning it on would help postgres, but ... one advantage of postgres is 
its reliability under a "pull the plug" scenario, and this setting defeats that.

FWIW, Xeon has gotten quite negative reviews in these quarters (Opteron seems 
to do way better), IIRC, and I know we've had issues with Dell's disk i/o, 
admittedly on a different box.

Quite interesting results, even if a bit disappointing to a (newly minted) fan 
of postgres. I'll be quite interested to hear more. Thanks for the work, 
although it seems like some of it won't be able to be released, unless Oracle 
has given some new blessing to releasing benchmark results.

Greg Williamson
DBA
GlobeXplorer LLC
-Original Message-
From:   [EMAIL PROTECTED] on behalf of Olivier Andreotti
Sent:   Thu 5/18/2006 2:57 AM
To: pgsql-performance@postgresql.org
Cc: 
Subject:[PERFORM] Benchmarck PostgreSQL 8.1.4 MySQL 5.0.20 and Oracle 
10g2

Hello,

I'm running a benchmark with theses 3 databases, and the first results
are not very good for PostgreSQL.

PostgreSQL shows 20% less performance than MySQL (InnoDB tables)

My benchmark uses the same server for theses 3 databases :
Dell Power edge - Xeon 2.8 Ghz - 2 Go Ram - 3 SCSI disks - Debian
Sarge - Linux 2.6

The transactions are a random mix of request in read (select) and
write (insert, delete, update) on many tables about 100 000 to 15 000
000 rows.

Transactions are executed from 500 connections.

For the tunning of PostgreSQL i use official documentation and theses
web sites :

http://www.revsys.com/writings/postgresql-performance.html
http://www.varlena.com/GeneralBits/Tidbits/annotated_conf_e.html


Some important points of my postgresql.conf file :

max_connections = 510
shared_buffer = 16384
max_prepared_transactions = 510
work_mem = 1024
maintenance_work_mem = 1024
fsync = off
wal_buffers = 32
commit_delay = 500
checkpoint_segments = 10
checkpoint_timeout = 300
checkpoint_warning = 0
effective_cache_size = 165 000
autovaccuum = on
default_transaction_isolation = 'read_committed'

What do you think of my tunning ?

Best regards.

O.A

---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings






---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] [postgis-users] Is my query planner failing me, or vice versa?

2005-12-14 Thread Gregory S. Williamson

Forgive the cross-posting, but I found myself wondering if there might not be some 
future way of telling the planner that a given table (column?) has a high 
likelihood of being TOASTed. Similar to random_page_cost in spirit. We've 
got a lot of indexed data that is spatial and have some tables where no data is 
toasted (road segments) and others where everything is.

An idle suggestion from one who knows that he is meddling with ;-}

Greg Williamson
DBA
GlobeXplorer LLC

 -Original Message-
 From: [EMAIL PROTECTED]
 [mailto:[EMAIL PROTECTED] Behalf Of
 Jessica M Salmon
 Sent: Wednesday, December 14, 2005 9:09 AM
 To: PostGIS Users Discussion
 Subject: Re: [postgis-users] Is my query planner failing me, or vice versa?
 
 Thanks, Marcus, for explaining.
 
 And thanks, Robert, for asking that question about adjusting page size.
 
 My tuples are definitely toasted (some of my geometries are 30X too big for
 a single page!), so I'm glad I'm aware of the TOAST tables now. I suppose
 there's not much to be done about it, but it's good to know.
 
 Thanks everyone for such an array of insightful help.
 
 -Meghan

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Question

2005-07-11 Thread Gregory S. Williamson
As a sometimes Informix and PostgreSQL DBA, I disagree with the contentions 
below. We have many tables with 10s of millions of rows in Postgres. We have 
had (alas) power issues with our lab on more than one occasion and the 
afflicted servers have recovered like a champ, every time.

This person may not like postgres (or very much likes Informix), but he 
shouldn't conjure up spurious reasons to support his/her prejudice.

Informix is an excellent product, but it can be costly for web related 
applications. PostgreSQL is also an excellent database. Each has differences 
which may make the decision between the two of them clear. But facts are 
necessary to have a real discussion.

Greg WIlliamson
DBA
GlobeXplorer LLC

-Original Message-
From: [EMAIL PROTECTED]
[mailto:[EMAIL PROTECTED] Behalf Of Alejandro
Lemus
Sent: Monday, July 11, 2005 6:00 AM
To: pgsql-performance@postgresql.org
Subject: [PERFORM] Question


In the past week, one guy of Unix Group in Colombia
say: Postgrest in production is bat, if the power off
in any time the datas is lost why this datas is in
plain files. Postgrest no ssupport data bases with
more 1 millon of records. 
Wath tell me in this respect?, is more best Informix
as say 

Ing. Alejandro Lemus G.
Radio Taxi Aeropuerto S.A.
Avenida de las Américas # 51 - 39 Bogotá - Colombia
Tel: 571-4470694 / 571-4202600 Ext. 260 Fax: 571-2624070
email: [EMAIL PROTECTED]


---(end of broadcast)---
TIP 6: explain analyze is your friend



---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Low Performance for big hospital server ..

2005-01-03 Thread Gregory S. Williamson
Amrit --

-Original Message-
From:  [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED]
Sent:  Mon 1/3/2005 12:18 AM
To:Mark Kirkwood
Cc:PGsql-performance
Subject:   Re: [PERFORM] Low Performance for big hospital server ..
 shared_buffers = 12000 will use 12000*8192 bytes (i.e about 96Mb). It is
 shared, so no matter how many connections you have it will only use 96M.

Now I use the figure of 27853

 
 Will the increasing in effective cache size to arround 20 make a little
 bit
 improvement ? Do you think so?
 
Decrease the sort mem too much [8196] make the performance much slower so I 
use
sort_mem = 16384
and leave effective cache to the same value , the result is quite better but I
should wait for tomorrow morning [official hour]  to see the end result.

 
 I would leave it at the figure you proposed (128897), and monitor your
 performance.
 (you can always increase it later and see what the effect is).
Yes , I use this figure.

If the result still poor , putting more ram 6-8Gb [also putting more money
too] will solve the problem ?

Adding RAM will almost always help, at least for a while. Our small runtime 
servers have 2 gigs of RAM; the larger ones have 4 gigs; I do anticipate the 
need to add RAM as we add users.

If you have evaluated the queries that are running and verified that they are 
using indexes properly, etc., and tuned the other parameters for your system 
and its disks, adding memory helps because it increases the chance that data is 
already in memory, thus saving the time to fetch it from disk. Studying 
performance under load with top, vmstat, etc. and detailed analysis of queries 
can often trade some human time for the money that extra hardware would cost. 
Sometimes easier to do than getting downtime for a critical server, as well.

If you don't have a reliable way of reproducing real loads on a test system, it 
is best to change things cautiously and observe the system under load; if you 
change too many things (ideally only 1 at a time, but often that is not 
possible) you may actually defeat a good change with a bad one; at the least, 
you may not know which change was the most important one if you make several at 
once.

Best of luck,

Greg Williamson
DBA
GlobeXplorer LLC
Thanks ,
Amrit
Thailand


---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly




---(end of broadcast)---
TIP 2: you can get off all lists at once with the unregister command
(send unregister YourEmailAddressHere to [EMAIL PROTECTED])


Re: [PERFORM] Improve BULK insertion

2004-12-05 Thread Gregory S. Williamson
Rodrigo --

You should definitely drop the indexes and any other FK constraints before 
loading and then rebuild them. Check your logs and see if there are warnings 
about checkpoint intervals -- only 3 logs seems like it might be small; if you 
have the disk space I would definitely consider raising the number. If you 
haven't already posted your config settings you might do so -- this seems very 
slow. I regularly use COPY to load or unload data sets in the 200k-900k range 
and they don't take 90 minutes, even on slower hardware (and usually only a few 
minutes on our production servers; rebuilding the indexes usually takes longer).
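
As a rough sketch of that pattern (constraint, index, and column names here are 
placeholders, not your actual schema):

-- drop FK constraints and secondary indexes before the restore/COPY
alter table si2010 drop constraint si2010_some_fk;
drop index si2010_some_ndx;

-- ... run the pg_restore / COPY here ...

-- then rebuild
create index si2010_some_ndx on si2010 (some_column);
alter table si2010 add constraint si2010_some_fk
    foreign key (some_column) references some_parent (some_column);

-- and in postgresql.conf, consider raising checkpoint_segments above 3 for the load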
 

This is unloading a 300k+ row data set on a Dell linux box with not very good 
disks and 1 gig of RAM:

Starting copy of parcel staging table parcels_12031 at Thu Dec  2 01:13:52 2004
Done with staging table copy at Thu Dec  2 01:15:16 2004
...
Starting compression of parcel file at Thu Dec  2 01:15:22 2004
gzip: /tmp/parcels_12031.unl.gz already exists; do you wish to overwrite (y or n
)? y
Done with compression of parcel file at Thu Dec  2 01:17:23 2004
...

And loading them on a rather faster server:

Starting unzip of parcels at Thu Dec  2 01:29:15 2004
Finished with unzip at Thu Dec  2 01:29:22 2004
...
Target db detail table updated at Thu Dec  2 01:29:29 2004
Dropping indexes
Dropping fk constraint on tracking id
Dropping indexes
Done dropping indexes on target parcels table at Thu Dec  2 01:29:30 2004
NOTICE:  drop cascades to table f12031.parcel_pins
NOTICE:  drop cascades to table f12031.parcel_addresses
NOTICE:  drop cascades to table f12031.parcel_owner_fti
NOTICE:  drop cascades to table f12031.parcel_owners
Removing old parcels entries starting at Thu Dec  2 01:29:30 2004
Done deleting schema and parcels for track_id 10163541 at Thu Dec  2 01:33:04 
2004
Starting load of parcels at Thu Dec  2 01:33:04 2004
Done copying data into parcels at Thu Dec  2 01:35:18 2004
Deleting old v_detail reference for track_id 10163541
Done with delete of old v_detail reference
Starting creation of foreign key constraint at Thu Dec  2 01:39:43 2004
Done with creation of foreign key constraint at Thu Dec  2 01:42:14 2004
Starting spatial index create at Thu Dec  2 01:42:14 2004
Done creating spatial index at Thu Dec  2 01:55:04 2004
Starting stats on geometry column now
Done doing stats for spatial index at Thu Dec  2 02:03:47 2004
Starting index on PIN now
Done creating pin index at Thu Dec  2 02:09:36 2004
Starting index on tracking id now
Done creating trid index at Thu Dec  2 02:12:35 2004
Starting centroid index now
Done creating centroid index at Thu Dec  2 02:24:11 2004
Starting stats on centroid column
Done doing stats for spatial index at Thu Dec  2 02:29:55 2004
Doing City/Street Index on parcels table ...Done creating city/street index at 
Thu Dec  2 02:42:41 2004 with result -1
Committing changes

So this took about 70 minutes to delete 20+ rows from a table with about 5 
million rows, load a new set and reindex them (and do some statistics for 
spatial geometry). If the table had only this data the indexing would have been 
*much* faster. These are moderate size columns -- about 2 dozen columns and 
some spatial data (polygon and point). Both servers have rather more log files 
than your setup, but I am not familiar enough with postgres to know how much of 
an impact that alone will have. The comment about it slowing down part way 
through a load makes me suspect indexing issues, somehow (not from postgres 
experience but it rings a bell with other DBs); if you explicitly drop the 
indexes first and then load does it show the same performance behavior ?

If you are doing the data read, the database write, and the WAL logging all 
on a single disk drive, then I would guess that that is your bottleneck. If you 
use vmstat and/or top or the like, is your I/O pegged?

HTH

Greg WIlliamson
DBA
GlobeXplorer LLC

-Original Message-
From:   Rodrigo Carvalhaes [mailto:[EMAIL PROTECTED]
Sent:   Sun 12/5/2004 11:52 AM
To: Christopher Browne
Cc: [EMAIL PROTECTED]
Subject:Re: [PERFORM] Improve BULK insertion
Hi!

1. I am doing the inserts using pg_restore. The dump was created using 
pg_dump and the standard format (copy statements)
2. See below the table schema. There are only 7 indexes.  
3. My transaction log configuration are : checkpoint_segments = 3  and 
checkpoint_timeout = 300 and my transaction logs are on the same disk .

I know that I can increase the performance separating the transaction 
logs and making a RAID 5 array BUT I am really curious about WHY this 
performance is so poor and HOW can I try to improve on this actual 
machine because actualy this inserts are taking around 90 minutes!!!

Cheers!

Rodrigo

dadosadv=# \d si2010
Table public.si2010
   Column   |   Type   |  Modifiers

Re: [PERFORM] Performance vs Schemas

2004-10-14 Thread Gregory S. Williamson
Igor,

I'm not sure if it is proper to state that schemas are themselves speeding things up.

As an example, we have data that is usually accessed by county; when we put all of the 
data into one big table and select from it using a code for a county of interest, the 
process is fairly slow as there are several hundred thousand candidate rows from that 
county in a table with many millions of rows. When we broke out certain aspects of the 
data into schemas (one per county) the searches became very fast indeed because we can 
skip the search for a specific county code within the relevant tables and there is 
less (unneeded) data in the table being searched.  
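
As a minimal sketch of that layout (schema, table, and column names here are invented 
for illustration, not our actual DDL):

-- one schema per county, each holding that county's slice of the data
create schema county_12031;
create table county_12031.parcels (
    parcel_id   integer primary key,
    owner_name  text,
    county_fips char(5) not null
);

-- queries then touch only the (much smaller) per-county table, either by
-- qualifying the name or by setting the search path:
set search_path to county_12031, public;
select count(*) from parcels;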

As always, EXPLAIN ANALYZE ... is your friend in understanding what the planner is 
doing with a given query.

See http://www.varlena.com/varlena/GeneralBits/Tidbits/ for some useful information, 
especially under the performance tips section.

HTH,

Greg Williamson
DBA
GlobeXplorer LLC

-Original Message-
From:   Igor Maciel Macaubas [mailto:[EMAIL PROTECTED]
Sent:   Thu 10/14/2004 11:38 AM
To: [EMAIL PROTECTED]
Cc: 
Subject:[PERFORM] Performance vs Schemas
Hi all,

I recently migrated my database from schema 'public' to multiple schema.
I have around 100 tables, and divided them in 14 different schemas, and then adapted 
my application to use schemas as well.
I could perceive that the query / insert / update times got quite a bit faster than 
when I was using the old single public schema, and I'd just like to confirm with you whether 
using schemas speeds things up. Is that true?

What else can I do to speed up query processing -- best practices, recommendations 
... ? What about indexed views, does postgresql support them?

Regards,
Igor
--
[EMAIL PROTECTED]




---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Cleaning up indexes

2004-09-23 Thread Gregory S. Williamson

If you have set up the postgres instance to write stats, the tables 
pg_stat_user_indexes, pg_statio_all_indexes and so on are useful (use the \dS option at the psql 
prompt to see these system tables); also check the pg_stat_user_tables table and 
similar beasts for information on total accesses, etc. Between these you can get a good 
idea of what indexes are not being used, and from the sequential scan info on tables 
perhaps some idea of what may need some indexes.
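
A rough pair of queries along those lines (this assumes the stats collector is running 
with row- and block-level stats on, and a zero idx_scan only means something after the 
server has seen a representative workload):

-- indexes never used since the counters were last reset
select schemaname, relname, indexrelname, idx_scan
  from pg_stat_user_indexes
 where idx_scan = 0
 order by schemaname, relname, indexrelname;

-- tables that are mostly sequentially scanned and may want an index
select relname, seq_scan, seq_tup_read, idx_scan
  from pg_stat_user_tables
 order by seq_scan desc;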

HTH,

Greg Williamson
DBA
GlobeXplorer LLC 

-Original Message-
From:   Martin Foster [mailto:[EMAIL PROTECTED]
Sent:   Thu 9/23/2004 3:16 PM
To: [EMAIL PROTECTED]
Cc: 
Subject:[PERFORM] Cleaning up indexes
My database was converted from MySQL a while back and has maintained all 
of the indexes which were previously used.   At the time however, there 
were limitations on the way PostgreSQL handled the indexes compared to 
MySQL.

Meaning that under MySQL, it would make use of a multi-column index even 
if the rows within did not match.When the conversion was made more 
indexes were created overall to correct this and proceed with the 
conversion.

Now the time has come to clean up the used indexes.   Essentially, I 
want to know if there is a way in which to determine which indexes are 
being used and which are not.   This will allow me to drop off the 
unneeded ones and reduce database load as a result.

And have things changed as to allow for mismatched multi-column indexes 
in version 7.4.x or even the upcoming 8.0.x?

Martin Foster
[EMAIL PROTECTED]

---(end of broadcast)---
TIP 9: the planner will ignore your desire to choose an index scan if your
  joining column's datatypes do not match




---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Equivalent praxis to CLUSTERED INDEX?

2004-08-26 Thread Gregory S. Williamson
FWIW,

Informix does allow the fragmentation of data over named dbspaces by round-robin and 
expression; this is auto-supporting as long as the dba keeps enough space available. 
You may also fragment the index, although there are some variations depending on the type 
of Informix (XPS, etc.); this is available in at least 9.3 ... I have never used the 
index fragmentation as its own beast, but the fragmenting of data works like a charm 
for spreading load over more disks.

Greg Williamson
DBA
GlobeXplorer LLC

-Original Message-
From: Gaetano Mendola [mailto:[EMAIL PROTECTED]
Sent: Thursday, August 26, 2004 2:10 PM
To: Bruce Momjian; [EMAIL PROTECTED]
Subject: Re: [PERFORM] Equivalent praxis to CLUSTERED INDEX?


Bruce Momjian wrote:
 How do vendors actually implement auto-clustering?  I assume they move
 rows around during quiet periods or have lots of empty space in each
 value bucket.
 
 ---

IIRC informix doesn't have it, and you have to recluster periodically
the table. After having clustered the table with an index in order to
recluster the table with another index you have to release the previous
one ( ALTER index TO NOT CLUSTER ), the CLUSTER is an index attribute and
each table can have only one index with that attribute ON.


Regards
Gaetano Mendola



---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] postgresql performance with multimedia

2004-08-25 Thread Gregory S. Williamson

Not sure about the overall performance, etc. but I think that in order to collect 
statistics you need to set some values in the postgresql.conf config file, to wit:
#---
# RUNTIME STATISTICS
#---

# - Statistics Monitoring -

#log_parser_stats = false
#log_planner_stats = false
#log_executor_stats = false
#log_statement_stats = false

# - Query/Index Statistics Collector -

stats_start_collector = true
stats_command_string = true
stats_block_level = true
stats_row_level = true
#stats_reset_on_server_start = true

If the appropriate values aren't set this could account for why no entries are found 
in the pg_stat tables. The manual has details on these; you'll need to reload postgres 
to make any changes effective.
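
Once those settings are active (and postgres restarted/reloaded), a quick sanity check 
like this against pg_statio_all_tables -- the view mentioned below -- should start 
showing non-zero counters:

select schemaname, relname, heap_blks_read, heap_blks_hit
  from pg_statio_all_tables
 where heap_blks_read + heap_blks_hit > 0
 order by heap_blks_read desc
 limit 10;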

Greg


-Original Message-
From:   my ho [mailto:[EMAIL PROTECTED]
Sent:   Tue 8/24/2004 11:54 PM
To: [EMAIL PROTECTED]
Cc: Jan Wieck
Subject:Re: [PERFORM] postgresql performance with multimedia
 Tom Lane answered to that question. The code in
 question does resolve 
 localhost with getaddrinfo() and then tries to
 create and bind a UDP 
 socket to all returned addresses. For some reason
 localhost on your 
 system resolves to an address that is not available
 for bind(2).

I tried to put my_ip instead of localhost in
bufmng.c and it seems to work (no more complaining).
However i check the pg_statio_all_tables and dont see
any recorded statistic at all. (all the columns are
'0')
some time postmaster shut down with this err msg: 
LOG:  statistics collector process (process_id)
exited with exit code 1
i starts postmaster with this command:
postmaster -i -p $PORT -D $PGDATA -k $PGDATA -N 32 -B
64 -o -s

  btw, what i want to ask here is does postgreSQL
 have
  any kind of read-ahead buffer implemented? 'cos it
  would be useful in multimedia case when we always
 scan
  the large table for continous data.
 
 Since there is no mechanism to control that data is
 stored contiguously 
 in the tables, what would that be good for?

i thought that rows in the table will be stored
contiguously? in that case, if the user is requesting
1 row, we make sure that the continue rows are ready
in the buffer pool so that when they next requested,
they wont be asked to read from disk. For multimedia
data, this is important 'cos data needs to be
presented continuously without any waiting.

thanks again for your help
MT Ho



__
Do You Yahoo!?
Tired of spam?  Yahoo! Mail has the best spam protection around 
http://mail.yahoo.com 

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster




---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] Bulk Insert and Index use

2004-08-10 Thread Gregory S. Williamson

Usually any bulk load is faster with indexes dropped and then rebuilt ... failing that 
(like you really need the indexes while loading, say into a "hot" table) be sure to 
wrap all the SQL into one transaction (BEGIN; ... COMMIT;) ... if any data fails it all 
fails, which is usually easier to deal with than partial data loads, and it is *much* 
faster than having each insert be its own transaction.
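
A quick sketch of both approaches, using the table and column names from the question 
below (the file path and index name are made up):

-- option 1: drop, bulk load, rebuild
drop index property_street_suburb_ndx;
copy property from '/tmp/property_dump.txt';
create index property_street_suburb_ndx on property (street, suburb);

-- option 2: keep the index, but batch the whole load into one transaction
begin;
insert into property (street, suburb) values ('1 Main St', 'Sometown');
-- ... the rest of the 300,000-odd rows ...
commit;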

HTH,

Greg Williamson
DBA
GlobeXplorer LLC

-Original Message-
From:   Rudi Starcevic [mailto:[EMAIL PROTECTED]
Sent:   Tue 8/10/2004 4:04 PM
To: [EMAIL PROTECTED]
Cc: 
Subject:[PERFORM] Bulk Insert and Index use
Hi,

I have a question on bulk checking, inserting into a table and
how best to use an index for performance.

The data I have to work with is a monthly CD Rom csv data dump of
300,000 property owners from one area/shire.

So every CD has 300,000 odd lines, each line of data which fills the 
'property' table.

Beginning with the first CD each line should require one SELECT and
one INSERT as it will be the first property with this address.

The SELECT uses fields like 'street' and 'suburb', to check for an 
existing property,
so I have built an index on those fields.

My question is does each INSERT rebuild the index on the 'street' and 
'suburb' fields?
I believe it does but I'm asking to be sure.

If this is the case I guess performance will suffer when I have, say, 
200,000
rows in the table.

Would it be like:

a) Use index to search on 'street' and 'suburb'
b) No result? Insert new record
c) Rebuild index on 'street' and 'suburb'

for each row?
Would this mean that after 200,000 rows each INSERT will require
the index of 000's of rows to be re-indexed?

So far I believe my only options are to use either and index
or sequential scan and see which is faster.

A minute for your thoughts and/or suggestions would be great.

Thanks.
Regards,
Rudi.


---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster




---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] Bulk Insert and Index use

2004-08-10 Thread Gregory S. Williamson
If it has to read a majority (or even a good percentage) of the rows in question a 
sequential scan is probably faster ... and as Jim pointed out, a temp table can often 
be a useful medium for getting speed in a load and then allowing you to clean/alter 
data for a final (easy) push.
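
If it is useful, one way to compare the two access paths directly on the loaded data 
(names again from the thread; enable_seqscan is a session-level planner toggle, so 
nothing is changed permanently):

explain analyze
select * from property where street = '1 Main St' and suburb = 'Sometown';

set enable_seqscan = off;   -- push the planner toward the index for comparison
explain analyze
select * from property where street = '1 Main St' and suburb = 'Sometown';
reset enable_seqscan;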

G
-Original Message-
From:   Rudi Starcevic [mailto:[EMAIL PROTECTED]
Sent:   Tue 8/10/2004 8:33 PM
To: [EMAIL PROTECTED]
Cc: 
Subject:Re: [PERFORM] Bulk Insert and Index use
Hi Jim,

Thanks for your time.

  If the bulk load has the possibility of duplicating data

Yes, each row will require either:

a) One SELECT + One INSERT
or
b) One SELECT + One UPDATE

I did think of using more than one table, ie. temp table.
As each month worth of data is added I expect to see
a change from lots of INSERTS to lots of UPDATES.

Perhaps when the UPDATES become more dominant it would
be best to start using Indexes.

While INSERTS are more prevelant perhaps a seq. scan is better.

I guess of all the options available it boils down to which
is quicker for my data: index or sequential scan.

Many thanks.

Jim J wrote:

 If the bulk load has the possibility of duplicating data, then you need 
 to change methods.  Try bulk loading into a temp table,  index it like 
 the original, eliminate the dups and merge the tables.
 
 It is also possible to do an insert from the temp table into the final 
 table like:
 insert into original (x,x,x)  (select temp.1, temp.2, etc from temp left 
 join original on temp.street=original.street where original.street is null)
 
 Good Luck
 Jim
 
 Rudi Starcevic wrote:
 
 Hi,

 I have a question on bulk checking, inserting into a table and
 how best to use an index for performance.

 The data I have to work with is a monthly CD Rom csv data dump of
 300,000 property owners from one area/shire.

 So every CD has 300,000 odd lines, each line of data which fills the 
 'property' table.

 Beginning with the first CD each line should require one SELECT and
 one INSERT as it will be the first property with this address.

 The SELECT uses fields like 'street' and 'suburb', to check for an 
 existing property,
 so I have built an index on those fields.

 My question is does each INSERT rebuild the index on the 'street' and 
 'suburb' fields?
 I believe it does but I'm asking to be sure.

 If this is the case I guess performance will suffer when I have, say, 
 200,000
 rows in the table.

 Would it be like:

 a) Use index to search on 'street' and 'suburb'
 b) No result? Insert new record
 c) Rebuild index on 'street' and 'suburb'

 for each row?
 Would this mean that after 200,000 rows each INSERT will require
 the index of 000's of rows to be re-indexed?

 So far I believe my only options are to use either and index
 or sequential scan and see which is faster.

 A minute for your thoughts and/or suggestions would be great.

 Thanks.
 Regards,
 Rudi.


 ---(end of broadcast)---
 TIP 4: Don't 'kill -9' the postmaster


 
 
 ---(end of broadcast)---
 TIP 5: Have you checked our extensive FAQ?
 
   http://www.postgresql.org/docs/faqs/FAQ.html
 
 


-- 


Regards,
Rudi.

Internet Media Productions

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org




---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] General performance questions about postgres on Apple

2004-06-06 Thread Gregory S. Williamson

Why is my name on a mail from Tom Lane? Really, he knows a *lot* more than I do and 
should get due credit.

Seriously, is this the performance remailer mangling something?

Greg Williamson
(the real one)

-Original Message-
From:   Gregory S. Williamson
Sent:   Sun 6/6/2004 10:46 PM
To: Sean Shanny
Cc: [EMAIL PROTECTED]
Subject:Re: [PERFORM] General performance questions about postgres on Apple
In-reply-to: [EMAIL PROTECTED] 
References: [EMAIL PROTECTED]
[EMAIL PROTECTED] [EMAIL PROTECTED]
[EMAIL PROTECTED]
Comments: In-reply-to Sean Shanny [EMAIL PROTECTED]message
dated Sun, 22 Feb 2004 21:48:54 -0500
Date: Sun, 22 Feb 2004 22:24:29 -0500
Message-ID: [EMAIL PROTECTED]
From: Tom Lane [EMAIL PROTECTED]
X-Virus-Scanned: by amavisd-new at postgresql.org
X-Mailing-List: pgsql-performance
Precedence: bulk
Sender: [EMAIL PROTECTED]
X-imss-version: 2.5
X-imss-result: Passed
X-imss-scores: Clean:99.9 C:21 M:2 S:5 R:5
X-imss-settings: Baseline:2 C:2 M:2 S:2 R:2 (0.1500 0.3000)
Return-Path: [EMAIL PROTECTED]
X-OriginalArrivalTime: 07 Jun 2004 05:27:21.0994 (UTC) FILETIME=[1BC0EEA0:01C44C50]

Sean Shanny [EMAIL PROTECTED] writes:
 We have the following setting for random page cost:
 random_page_cost = 1# units are one sequential page fetch cost
 Any suggestions on what to bump it up to?

Well, the default setting is 4 ... what measurements prompted you to
reduce it to 1?  The particular example you showed suggested that the
true value on your setup might be 10 or more.

Now I would definitely not suggest that you settle on any particular
value based on only one test case.  You need to try to determine an
appropriate average value, bearing in mind that there's likely to be
lots of noise in any particular measurement.

But in general, setting random_page_cost to 1 is only reasonable when
you are dealing with a fully-cached-in-RAM database, which yours isn't.

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend





---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


Re: [PERFORM] Hardware recommendations to scale to silly load

2003-09-10 Thread Gregory S. Williamson
Nitpicking --

Perhaps the 4th data line is meant to be:
Inserts in separate transactions 2500 inserts/second
   ^^^
??


Greg Williamson

-Original Message-
From:   Bruce Momjian [mailto:[EMAIL PROTECTED]
Sent:   Tue 9/9/2003 8:25 PM
To: Matt Clark
Cc: Ron Johnson; PgSQL Performance ML
Subject:Re: [PERFORM] Hardware recommendations to scale to silly load

Matt Clark wrote:
  Just a data point, but on my Dual Xeon 2.4Gig machine with a 10k SCSI
  drive I can do 4k inserts/second if I turn fsync off.  If you have a
  battery-backed controller, you should be able to do the same.  (You will
  not need to turn fsync off --- fsync will just be fast because of the
  disk drive RAM).
 
  Am I missing something?
 
 I think Ron asked this, but I will too, is that 4k inserts in
 one transaction or 4k transactions each with one insert?
 
 fsync is very much faster (as are all random writes) with the
 write-back cache, but I'd hazard a guess that it's still not
 nearly as fast as turning fsync off altogether.  I'll do a test
 perhaps...

Sorry to be replying late.  Here is what I found.

fsync on
Inserts all in one transaction 3700 inserts/second
Inserts in separate transactions     870 inserts/second

fsync off
Inserts all in one transaction 3700 inserts/second
Inserts all in one transaction 2500 inserts/second

ECPG test program attached.

--

  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073




---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings