Re: [PERFORM] COPY vs INSERT

2005-05-05 Thread Kris Jurka


On Wed, 4 May 2005, Mischa Sandberg wrote:

 Quoting Kris Jurka [EMAIL PROTECTED]: 
  
  Not true.  A client may send any number of Bind/Execute messages on 
  a prepared statement before a Sync message.

 Hunh. Interesting optimization in the JDBC driver. I gather it is 
 sending a string of (;)-separated inserts.

No, it uses the V3 protocol and a prepared statement and uses 
Bind/Execute, as I mentioned.

 Sounds like efficient-but-risky stuff we did for ODBC drivers at Simba
 ... gets interesting when one of the insert statements in the middle
 fails.

When running inside a transaction (as you really want to do anyway when
bulk loading) the behavior is well defined; it is a little odd in
autocommit mode, though.  In autocommit mode the transaction boundary is
at the Sync message, not at the individual Execute messages, so an error
will roll back some of the inserts that preceded it.  The JDBC spec is
poorly defined in this area, so we can get away with this.

 Good to know. Hope that the batch size is parametric, given that
 you can have inserts with rather large strings bound to 'text' columns
 in PG --- harder to identify BLOBs when talking to PG, than when talking
 to MSSQL/Oracle/Sybase.

The batch size is not a parameter and I don't think it needs to be.  The 
issue of filling both sides of network buffers and deadlocking only needs 
to be avoided on one side.  The response to an insert request is small and 
not dependent on the size of the data sent, so we can send as much as we 
want as long as the server doesn't send much back to us.
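The batching described above can be sketched in miniature: the client queues many Bind/Execute messages locally, and only a Sync forces a network round trip (and, in autocommit mode, marks the transaction boundary). This is a toy model for illustration only; the class and method names are invented and do not reflect the real wire protocol or the JDBC driver's API.

```python
# Toy model of V3 extended-query batching: Bind/Execute messages are
# queued client-side and a single Sync flushes them all to the server.
# All names here are invented for illustration.

class FakeV3Connection:
    def __init__(self):
        self.round_trips = 0   # one per Sync message
        self.committed = []    # rows visible after the last Sync
        self._pending = []     # queued Bind/Execute messages

    def bind_execute(self, row):
        # Queue the message; nothing crosses the network yet.
        self._pending.append(row)

    def sync(self):
        # A single flush carries every queued message to the server;
        # in autocommit mode this is also where the commit happens.
        self.round_trips += 1
        self.committed.extend(self._pending)
        self._pending = []

conn = FakeV3Connection()
for i in range(1000):
    conn.bind_execute(("row", i))
conn.sync()  # 1000 inserts, one round trip
```

The point of the model: the number of round trips (and, in autocommit mode, the rollback granularity on error) is set by where the Sync lands, not by how many Execute messages were sent.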

Kris Jurka

---(end of broadcast)---
TIP 4: Don't 'kill -9' the postmaster


[PERFORM] Bad choice of query plan from PG 7.3.6 to PG 7.3.9 part 1

2005-05-05 Thread Jona
Hi
I'm currently experiencing problems with long query execution times.
What I believe makes these problems particularly interesting is the 
difference in execution plans between our test server running PostgreSQL 
7.3.6 and our production server running PostgreSQL 7.3.9.
The test server is an upgraded home machine, a Pentium 4 with 1GB of 
memory and IDE disk.
The production server is a dual CPU XEON Pentium 4 with 2GB memory and 
SCSI disks.
One should expect the production server to be faster, but apparently 
not, as the query plans outlined below show.

My questions can be summed up as:
1) How come the query plans on the 2 servers are different?
2) How come the production server in general estimates the cost of the 
query plans so horribly wrong? (i.e. it chooses a bad query plan whereas 
the test server chooses a good plan)
3) In Query 2, how come the production server refuses to use its 
indexes (subcat_uq and aff_price_uq, both unique indexes) whereas the 
test server determines that the indexes are the way to go?
4) In Query 3, how come the test server refuses to use its index 
(sct2lang_uq) and the production server uses it? And why is the test 
server still faster even though it makes a sequential scan of a table 
with 8.5k records?

Please note, a VACUUM ANALYSE is run on the production server once a day 
(used to be once an hour but it seemed to make no difference), however 
there are generally no writes to the tables used in the queries.

If anyone could shed some light on these issues I would truly appreciate 
it.

Cheers
Jona
PS. Please refer to part 2 for the other queries and query plans
 

Query 1:
EXPLAIN ANALYZE
SELECT DISTINCT StatConTrans_Tbl.id, Code_Tbl.sysnm AS code, 
PriceCat_Tbl.amount AS price, Country_Tbl.currency,
 CreditsCat_Tbl.amount AS credits, Info_Tbl.title, Info_Tbl.description
FROM (SCT2SubCatType_Tbl
INNER JOIN SCT2Lang_Tbl ON SCT2SubCatType_Tbl.sctid = SCT2Lang_Tbl.sctid
INNER JOIN Language_Tbl ON SCT2Lang_Tbl.langid = Language_Tbl.id AND 
Language_Tbl.sysnm = UPPER('us') AND Language_Tbl.enabled = true
INNER JOIN Info_Tbl ON SCT2SubCatType_Tbl.sctid = Info_Tbl.sctid AND 
Language_Tbl.id = Info_Tbl.langid
INNER JOIN SubCatType_Tbl ON SCT2SubCatType_Tbl.subcattpid = 
SubCatType_Tbl.id AND SubCatType_Tbl.enabled = true
INNER JOIN CatType_Tbl ON SubCatType_Tbl.cattpid = CatType_Tbl.id AND 
CatType_Tbl.enabled = true
INNER JOIN SuperCatType_Tbl ON CatType_Tbl.spcattpid = 
SuperCatType_Tbl.id AND SuperCatType_Tbl.enabled = true
INNER JOIN StatConTrans_Tbl ON SCT2SubCatType_Tbl.sctid = 
StatConTrans_Tbl.id AND StatConTrans_Tbl.enabled = true
INNER JOIN Price_Tbl ON StatConTrans_Tbl.id = Price_Tbl.sctid AND 
Price_Tbl.affid = 8
INNER JOIN PriceCat_Tbl ON Price_Tbl.prccatid = PriceCat_Tbl.id AND 
PriceCat_Tbl.enabled = true
INNER JOIN Country_Tbl ON PriceCat_Tbl.cntid = Country_Tbl.id AND 
Country_Tbl.enabled = true
INNER JOIN CreditsCat_Tbl ON Price_Tbl.crdcatid = CreditsCat_Tbl.id AND 
CreditsCat_Tbl.enabled = true
INNER JOIN StatCon_Tbl ON StatConTrans_Tbl.id = StatCon_Tbl.sctid AND 
StatCon_Tbl.ctpid = 1
INNER JOIN Code_Tbl ON SuperCatType_Tbl.id = Code_Tbl.spcattpid AND 
Code_Tbl.affid = 8 AND Code_Tbl.cdtpid = 1)
WHERE SCT2SubCatType_Tbl.subcattpid = 79
ORDER BY StatConTrans_Tbl.id DESC
LIMIT 8 OFFSET 0

Plan on PostgreSQL 7.3.6 on Red Hat Linux 3.2.3-39
Limit  (cost=178.59..178.61 rows=1 width=330) (actual time=22.77..28.51 rows=4 loops=1)
  ->  Unique  (cost=178.59..178.61 rows=1 width=330) (actual time=22.77..28.50 rows=4 loops=1)
        ->  Sort  (cost=178.59..178.60 rows=1 width=330) (actual time=22.76..22.85 rows=156 loops=1)
              Sort Key: statcontrans_tbl.id, code_tbl.sysnm, pricecat_tbl.amount, country_tbl.currency, creditscat_tbl.amount, info_tbl.title, info_tbl.description
              ->  Hash Join  (cost=171.19..178.58 rows=1 width=330) (actual time=3.39..6.55 rows=156 loops=1)
                    Hash Cond: ("outer".cntid = "inner".id)
                    ->  Nested Loop  (cost=170.13..177.51 rows=1 width=312) (actual time=3.27..5.75 rows=156 loops=1)
                          Join Filter: ("inner".sctid = "outer".sctid)
                          ->  Hash Join  (cost=170.13..171.48 rows=1 width=308) (actual time=3.12..3.26 rows=4 loops=1)
                                Hash Cond: ("outer".crdcatid = "inner".id)
                                ->  Hash Join  (cost=169.03..170.38 rows=1 width=300) (actual time=3.00..3.11 rows=4 loops=1)
                                      Hash Cond: ("outer".spcattpid = "inner".spcattpid)
                                      ->  Hash Join  (cost=167.22..168.56 rows=1 width=253) (actual time=2.88..2.97 rows=4 loops=1)
                                            Hash Cond: ("outer".id = "inner".prccatid)
                                            ->  Seq Scan on pricecat_tbl  (cost=0.00..1.29 rows=12 width=12) 
Re: [PERFORM] Bad choice of query plan from PG 7.3.6 to PG 7.3.9 part 1

2005-05-05 Thread Tom Lane
Jona [EMAIL PROTECTED] writes:
 I'm currently experiencing problems with long query execution times.
 What I believe makes these problems particularly interesting is the 
 difference in execution plans between our test server running PostgreSQL 
 7.3.6 and our production server running PostgreSQL 7.3.9.
 The test server is an upgraded home machine, a Pentium 4 with 1GB of 
 memory and IDE disk.
 The production server is a dual CPU XEON Pentium 4 with 2GB memory and 
 SCSI disks.
 One should expect the production server to be faster, but apparently 
 not, as the query plans outlined below show.

I think the plans are fine; it looks to me like the production server
has serious table-bloat or index-bloat problems, probably because of
inadequate vacuuming.  For instance compare these entries:

->  Index Scan using ctp_statcon on statcon_tbl  (cost=0.00..6.01 rows=1 width=4) (actual time=0.05..0.31 rows=39 loops=4)
      Index Cond: ((statcon_tbl.sctid = "outer".sctid) AND (statcon_tbl.ctpid = 1))

->  Index Scan using ctp_statcon on statcon_tbl  (cost=0.00..20.40 rows=5 width=4) (actual time=27.97..171.84 rows=39 loops=4)
      Index Cond: ((statcon_tbl.sctid = "outer".sctid) AND (statcon_tbl.ctpid = 1))

Appears to be exactly the same task ... but the test server spent
1.24 msec total while the production server spent 687.36 msec total.
That's more than half of your problem right there.  Some of the other
scans seem a lot slower on the production machine too.
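The totals follow from how EXPLAIN ANALYZE reports timing: "actual time" is per loop, so a node's total contribution is its upper time bound multiplied by its loop count. A quick check of the arithmetic, using the numbers straight from the two plan excerpts:

```python
# EXPLAIN ANALYZE's "actual time" is per loop; total = per-loop time x loops
test_total = 0.31 * 4      # test server:       0.31 ms/loop x 4 loops
prod_total = 171.84 * 4    # production server: 171.84 ms/loop x 4 loops
print(test_total, prod_total)  # 1.24 ms vs 687.36 ms
```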

 1) How come the query plans between the 2 servers are different?

The production server's rowcount estimates are pretty good, the test
server's are not.  How long since you vacuumed/analyzed the test server?

It'd be interesting to see the output of vacuum verbose statcon_tbl
on both servers ...

regards, tom lane

PS: if you post any more query plans, please try to use software that
doesn't mangle the formatting so horribly ...



Re: [PERFORM] Table stats

2005-05-05 Thread David Roussel
 Should there not be at least one Index Scan showing in the stats?
Not if there was a table scan.
---(end of broadcast)---
TIP 8: explain analyze is your friend