Re: [PERFORM] CHECK vs REFERENCES

2005-09-21 Thread Vivek Khera


On Sep 9, 2005, at 11:23 PM, Marc G. Fournier wrote:

The case is where I just want to check that a value being inserted  
is one of a few possible values, with that list of values rarely  
(if ever) changing, so having a 'flexible list' REFERENCES table  
seems relatively overkill ...




That's what I thought until the first time that list needed to be  
altered.  At this point, it becomes a royal pain.


point to take: do it right the first time, or you have to do it over,  
and over, and over...
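
To make the tradeoff concrete, here is a hedged sketch (table and value names are invented for illustration): changing the allowed list under a CHECK constraint requires DDL, while under REFERENCES it is a plain INSERT.

```sql
-- CHECK version: the list is baked into the table definition.
CREATE TABLE ticket (
    id      serial PRIMARY KEY,
    status  varchar(16) NOT NULL
        CHECK (status IN ('open', 'closed', 'pending'))
);
-- Adding a new allowed value later means DDL:
ALTER TABLE ticket DROP CONSTRAINT ticket_status_check;
ALTER TABLE ticket ADD CONSTRAINT ticket_status_check
    CHECK (status IN ('open', 'closed', 'pending', 'archived'));

-- REFERENCES version: the list lives in data.
CREATE TABLE ticket_status (status varchar(16) PRIMARY KEY);
INSERT INTO ticket_status VALUES ('open');
INSERT INTO ticket_status VALUES ('closed');
INSERT INTO ticket_status VALUES ('pending');
CREATE TABLE ticket2 (
    id      serial PRIMARY KEY,
    status  varchar(16) NOT NULL REFERENCES ticket_status
);
-- Adding a value later is just:
INSERT INTO ticket_status VALUES ('archived');
```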


Vivek Khera, Ph.D.
+1-301-869-4449 x806



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
  subscribe-nomail command to [EMAIL PROTECTED] so that your
  message can get through to the mailing list cleanly


Re: [PERFORM] Performance considerations for very heavy INSERT traffic

2005-09-21 Thread Vivek Khera
On Sep 12, 2005, at 6:02 PM, Brandon Black wrote:

- using COPY instead of INSERT? (should be easy to do from the aggregators)

Possibly, although it would kill the current design of returning the  
database transaction status for a single client packet back to the  
client on transaction success/failure.  The aggregator could put  
several clients' data into a series of delayed multi-row copy statements.

buffer through the file system on your aggregator.  once you "commit"  
to a local disk file, return back to your client that you got the data.  
then insert into the actual postgres DB in large batches of inserts  
inside a single Postgres transaction.

we have our web server log certain tracking requests to a local file.  
with file locks and append mode, it is extremely quick and has little  
contention delays.  then every so often, we lock the file, rename it,  
release the lock, then process it at our leisure to do the inserts to  
Pg in one big transaction.

Vivek Khera, Ph.D.
+1-301-869-4449 x806
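
A minimal sketch of the batch step described above (table name, columns, and file path are invented for illustration): replay the renamed log file inside one transaction, or load it directly with COPY.

```sql
-- One big transaction of individual inserts...
BEGIN;
INSERT INTO tracking (client_id, hit_time, url) VALUES (42, '2005-09-12 18:02', '/a');
INSERT INTO tracking (client_id, hit_time, url) VALUES (43, '2005-09-12 18:02', '/b');
-- ... thousands more rows from the renamed log file ...
COMMIT;

-- ...or, usually faster, a single COPY from the renamed log file
-- (server-side path; use \copy in psql for a client-side file):
COPY tracking (client_id, hit_time, url)
    FROM '/var/log/app/tracking.batch' WITH DELIMITER '\t';
```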

Re: [PERFORM] Performance considerations for very heavy INSERT traffic

2005-09-21 Thread Vivek Khera
On Sep 12, 2005, at 6:02 PM, Brandon Black wrote:

- splitting the xlog and the data on distinct physical drives or arrays

That would almost definitely help; I haven't tried it yet.  Speaking of  
the xlog, anyone know anything specific about the WAL tuning parameters  
for heavy concurrent write traffic?  What little I could dig up on WAL  
tuning was contradictory, and testing some random changes to the  
parameters hasn't been very conclusive yet.  I would imagine the WAL  
buffers stuff could potentially have a large effect for us.

you will want to make your pg_xlog RAID volume BIG, and then tell  
postgres to use that space: bump up checkpoint_segments (and suitably  
the checkpoint timeouts).  I run with 256 segments and a timeout of 5  
minutes.  The timeout reflects your expected crash recovery time, so  
adjust it wisely.

Also, you should consider how you split your drives across your RAID  
data channels on your test machine: I put each pair of the RAID10  
mirrors on opposite channels, so both channels of my RAID controller  
are pretty evenly loaded during write.

Vivek Khera, Ph.D.
+1-301-869-4449 x806
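
The settings described above would look roughly like this in postgresql.conf (a sketch, not universal recommendations; the wal_buffers value is an assumption for experimentation, and each checkpoint segment is a 16 MB WAL file, so 256 segments can consume several GB of pg_xlog space):

```
# postgresql.conf -- WAL tuning sketch for heavy write traffic
checkpoint_segments = 256    # 16 MB each; size the pg_xlog volume accordingly
checkpoint_timeout  = 300    # seconds; 5 minutes, per the advice above
wal_buffers         = 64     # worth experimenting with under concurrent writes
```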

[PERFORM] Query slower on 8.0.3 (Windows) vs 7.3 (cygwin)

2005-09-21 Thread Gurpreet Aulakh
I currently have a Postgres 7.3 database running under WIN2K using cygwin
and want to move to Postgres 8.0.3 (native windows version).
I am finding most simple queries are significantly faster on the native
windows version compared to 7.3 (under cygwin).
However, for a complex query that involves multiple JOINs, the 7.3 version
is actually faster (about 2X).

The query that I am running was optimized to run under 7.3. It was
specifically modified to control the planner with explicit JOINs.
When I run the same query on the 8.0.3 version with the join_collapse_limit
set to 1 the query is slower.

Can someone tell me why setting the join_collapse_limit to 1 in the 8.0
version does not produce similar results to the 7.3 version?
Does anyone have any suggestions on what I can do? Do I have to rewrite the
query?
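
For reference, the setting in question can be toggled per session; a hedged sketch (the EXPLAIN target stands in for the poster's own query):

```sql
-- Force 8.0 to honor the explicit JOIN order, as 7.3 did implicitly:
SET join_collapse_limit = 1;
EXPLAIN ANALYZE SELECT ...;   -- the problem query

-- Compare against the planner's own join ordering:
SET join_collapse_limit = 8;  -- the default
EXPLAIN ANALYZE SELECT ...;
```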


Here are the results of an explain analyze on the query.

Explain analyze Postgres 7.3 running on WIN2K using cygwin.

Hash Join  (cost=21808.27..1946264.80 rows=2982 width=1598) (actual time=2186.00..2320.00 rows=50 loops=1)
   Hash Cond: ("outer".doc_internalparentomxref = "inner".doc_documentid)
   ->  Hash Join  (cost=20948.78..1945323.29 rows=2982 width=1534) (actual time=2110.00..2227.00 rows=50 loops=1)
         Hash Cond: ("outer".doc_internalrootxref = "inner".doc_documentid)
         ->  Hash Join  (cost=20089.29..1944381.79 rows=2982 width=1484) (actual time=2067.00..2179.00 rows=50 loops=1)
               Hash Cond: ("outer".doc_documentid = "inner".doc_documentid)
               Join Filter: ("inner".dc_doccontacttype = 'FROM'::character varying)
               ->  Hash Join  (cost=7455.14..1928613.59 rows=2982 width=1138) (actual time=1216.00..1539.00 rows=50 loops=1)
                     Hash Cond: ("outer".doc_documentid = "inner".doc_documentid)
                     Join Filter: ("inner".dc_doccontacttype = 'TO'::character varying)
                     ->  Hash Join  (cost=183.49..1918519.06 rows=2860 width=792) (actual time=64.00..301.00 rows=50 loops=1)
                           Hash Cond: ("outer".doc_documentid = "inner".doc_documentid)
                           ->  Seq Scan on document finaldoc  (cost=0.00..1918256.94 rows=2860 width=717) (actual time=13.00..254.00 rows=50 loops=1)
                                 Filter: (subplan)
                                 SubPlan
                                   ->  Materialize  (cost=335.27..335.27 rows=50 width=160) (actual time=0.00..0.01 rows=50 loops=5719)
                                         ->  Limit  (cost=0.00..335.27 rows=50 width=160) (actual time=3.00..8.00 rows=50 loops=1)
                                               ->  Nested Loop  (cost=0.00..38347.95 rows=5719 width=160) (actual time=3.00..8.00 rows=51 loops=1)
                                                     ->  Merge Join  (cost=0.00..3910.14 rows=5719 width=120) (actual time=3.00..3.00 rows=51 loops=1)
                                                           Merge Cond: ("outer".doc_documentid = "inner".doc_documentid)
                                                           ->  Index Scan using pk_document on document doc  (cost=0.00..3256.48 rows=5719 width=80) (actual time=1.00..1.00 rows=51 loops=1)
                                                           ->  Index Scan using pk_folder_document on folder_document  (cost=0.00..553.91 rows=5719 width=40) (actual time=2.00..2.00 rows=51 loops=1)
                                                     ->  Index Scan using pk_document on document root  (cost=0.00..6.01 rows=1 width=40) (actual time=0.10..0.10 rows=1 loops=51)
                                                           Index Cond: ("outer".doc_internalrootxref = root.doc_documentid)
                           ->  Hash  (cost=169.19..169.19 rows=5719 width=75) (actual time=31.00..31.00 rows=0 loops=1)
                                 ->  Seq Scan on folder_document  (cost=0.00..169.19 rows=5719 width=75) (actual time=0.00..11.00 rows=5719 loops=1)
                     ->  Hash  (cost=1328.80..1328.80 rows=34280 width=346) (actual time=846.00..846.00 rows=0 loops=1)
                           ->  Seq Scan on doccontact dcto  (cost=0.00..1328.80 rows=34280 width=346) (actual time=0.00..175.00 rows=34280 loops=1)
               ->  Hash  (cost=1328.80..1328.80 rows=34280 width=346) (actual time=445.00..445.00 rows=0 loops=1)
                     ->  Seq Scan on doccontact dcfrom  (cost=0.00..1328.80 rows=34280 width=346) (actual time=0.00..223.00 rows=34280 loops=1)
         ->  Hash  (cost=845.19..845.19 rows=5719 width=50) (actual time=42.00..42.00 rows=0 loops=1)
               ->  Seq Scan on document root  (cost=0.00..845.19 rows=5719 width=50) (actual time=0.00..2.00 rows=5719 loops=1)
   ->  Hash  (cost=845.19..845.19 rows=5719 width=64) (actual time=73.00..73.00 rows=0 loops=1)
         ->  Seq Scan on document parentom  (cost=0.00..845.19 rows=5719 width=64) (actual time=0.00..30.00 rows=5719 loops=1)
   SubPlan
     ->  Limit  (cost=0.00..5.56 rows=1 width=40) (actual 

Re: [PERFORM] Query slower on 8.0.3 (Windows) vs 7.3 (cygwin)

2005-09-21 Thread Gurpreet Aulakh
I have started to break my query down and analyze each piece.
What I have discovered is very interesting.

First here is a small piece of my query.

EXPLAIN ANALYZE SELECT doc.doc_documentid FROM document AS doc
LEFT JOIN document as root ON doc.doc_internalRootXref =
root.doc_documentId
LEFT JOIN folder_document ON doc.doc_documentid =
folder_document.doc_documentId LIMIT 500 OFFSET 0

When I run this on Postgres 8.0.3 running under windows this is the result

QUERY PLAN
Limit  (cost=183.49..753.41 rows=500 width=40) (actual time=47.000..79.000 rows=500 loops=1)
  ->  Hash Left Join  (cost=183.49..6702.23 rows=5719 width=40) (actual time=47.000..79.000 rows=500 loops=1)
        Hash Cond: (("outer".doc_documentid)::text = ("inner".doc_documentid)::text)
        ->  Merge Left Join  (cost=0.00..6432.96 rows=5719 width=40) (actual time=0.000..16.000 rows=500 loops=1)
              Merge Cond: (("outer".doc_internalrootxref)::text = ("inner".doc_documentid)::text)
              ->  Index Scan using doc_internalrootxref_index on document doc  (cost=0.00..3172.64 rows=5719 width=80) (actual time=0.000..0.000 rows=500 loops=1)
              ->  Index Scan using pk_document on document root  (cost=0.00..3174.53 rows=5719 width=40) (actual time=0.000..0.000 rows=863 loops=1)
        ->  Hash  (cost=169.19..169.19 rows=5719 width=40) (actual time=47.000..47.000 rows=0 loops=1)
              ->  Seq Scan on folder_document  (cost=0.00..169.19 rows=5719 width=40) (actual time=0.000..16.000 rows=5719 loops=1)
Total runtime: 79.000 ms

Here is the result of running the same query on the Postgres 7.3 running
under Cygwin

QUERY PLAN
Limit  (cost=183.49..775.31 rows=500 width=160) (actual time=13.00..44.00 rows=500 loops=1)
  ->  Hash Join  (cost=183.49..6952.79 rows=5719 width=160) (actual time=13.00..44.00 rows=501 loops=1)
        Hash Cond: ("outer".doc_documentid = "inner".doc_documentid)
        ->  Merge Join  (cost=0.00..6612.03 rows=5719 width=120) (actual time=0.00..29.00 rows=775 loops=1)
              Merge Cond: ("outer".doc_internalrootxref = "inner".doc_documentid)
              ->  Index Scan using doc_internalrootxref_index on document doc  (cost=0.00..3254.39 rows=5719 width=80) (actual time=0.00..7.00 rows=775 loops=1)
              ->  Index Scan using pk_document on document root  (cost=0.00..3257.88 rows=5719 width=40) (actual time=0.00..15.00 rows=1265 loops=1)
        ->  Hash  (cost=169.19..169.19 rows=5719 width=40) (actual time=12.00..12.00 rows=0 loops=1)
              ->  Seq Scan on folder_document  (cost=0.00..169.19 rows=5719 width=40) (actual time=0.00..9.00 rows=5719 loops=1)
Total runtime: 45.00 msec

What is really interesting is the time it takes for the Hash to occur. For
the first hash (on folder_document), 7.3 takes only 12 ms while 8.0 takes
47 ms.
Now the databases are created from the same data and I have run
vacuumdb -f -z on the databases.

Now, I have read something in the archives suggesting that perhaps the data
is in the filesystem (not database) cache. Would this be the case? If so,
how would I improve the performance under WIN2K?

Anyone have any ideas?



Re: [PERFORM] Query slower on 8.0.3 (Windows) vs 7.3 (cygwin)

2005-09-21 Thread Tom Lane
Gurpreet Aulakh [EMAIL PROTECTED] writes:
 What is really interesting is the time it takes for the Hash to occur. For
 the first hash, on the 7.3 it takes only 12ms while on the 8.0 it takes
 47ms.

You haven't told us a thing about the column datatypes involved (much
less what the query actually is) ... but I wonder if this is a textual
datatype and the 8.0 installation is using a non-C locale where the 7.3
installation is using C locale.  That could account for a considerable
slowdown in text comparison speeds.
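
To verify that hypothesis, one could compare the locale settings of the two installations; a sketch:

```sql
-- On each server: C locale makes text comparison a plain byte compare.
SHOW lc_collate;
SHOW lc_ctype;
-- In 8.0 the cluster locale is fixed at initdb time, e.g.:
--   initdb --locale=C -D /path/to/data
```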

regards, tom lane



Re: [PERFORM] SELECT LIMIT 1 VIEW Performance Issue

2005-09-21 Thread K C Lau

Hi All,

Investigating further on this problem I brought up in June, the following 
query with pg 8.0.3 on Windows scans all 1743 data records for a player:


esdt=> explain analyze select PlayerID,AtDate from Player a
 where PlayerID='0' and AtDate = (select b.AtDate from Player b
 where b.PlayerID = a.PlayerID order by b.PlayerID desc, b.AtDate desc LIMIT 1);

 Index Scan using pk_player on player a  (cost=0.00..2789.07 rows=9 width=23) (actual time=51.046..51.049 rows=1 loops=1)
   Index Cond: ((playerid)::text = '0'::text)
   Filter: ((atdate)::text = ((subplan))::text)
   SubPlan
     ->  Limit  (cost=0.00..0.83 rows=1 width=23) (actual time=0.016..0.017 rows=1 loops=1743)
           ->  Index Scan Backward using pk_player on player b  (cost=0.00..970.53 rows=1166 width=23) (actual time=0.011..0.011 rows=1 loops=1743)
                 Index Cond: ((playerid)::text = ($0)::text)
 Total runtime: 51.133 ms

Using a static value in the subquery produces the desired result below, but 
since we use views for our queries (see last part of this email), we cannot 
push the static value into the subquery:


esdt=> explain analyze select PlayerID,AtDate from Player a
 where PlayerID='0' and AtDate = (select b.AtDate from Player b
 where b.PlayerID = '0' order by b.PlayerID desc, b.AtDate desc LIMIT 1);

 Index Scan using pk_player on player a  (cost=0.75..4.26 rows=1 width=23) (actual time=0.054..0.058 rows=1 loops=1)
   Index Cond: (((playerid)::text = '0'::text) AND ((atdate)::text = ($0)::text))
   InitPlan
     ->  Limit  (cost=0.00..0.75 rows=1 width=23) (actual time=0.028..0.029 rows=1 loops=1)
           ->  Index Scan Backward using pk_player on player b  (cost=0.00..1323.05 rows=1756 width=23) (actual time=0.023..0.023 rows=1 loops=1)
                 Index Cond: ((playerid)::text = '0'::text)
 Total runtime: 0.149 ms

The Player table has a primary key on (PlayerID, AtDate). Is there a way to 
stop the inner-most index scan from looping over all 1743 data records for 
that player?  Is that a bug or a known issue?


BTW, I tried using 8.1 beta2 on Windows and its performance is similar. I 
have also tried other variants such as MAX and DISTINCT, but with no success.
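
One workaround that sometimes helps this latest-row-per-key pattern (a hedged sketch, untested against this schema) is PostgreSQL's DISTINCT ON, which can often pick the newest row per player without re-running a correlated subplan:

```sql
-- DISTINCT ON keeps the first row of each PlayerID group; the ORDER BY
-- makes that first row the one with the highest AtDate.
CREATE OR REPLACE VIEW VCurPlayer AS
    SELECT DISTINCT ON (PlayerID) *
    FROM Player
    ORDER BY PlayerID DESC, AtDate DESC;

SELECT PlayerID, AtDate FROM VCurPlayer WHERE PlayerID = '0';
```

Whether the planner pushes the WHERE clause into the view efficiently would need to be confirmed with EXPLAIN ANALYZE on the actual data.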


Any help is most appreciated.

Best regards,
KC.


At 10:46 05/06/15, K C Lau wrote:

Hi All,

I previously posted the following as a sequel to my SELECT DISTINCT 
Performance Issue question. We would most appreciate any clue or 
suggestions on how to overcome this show-stopping issue. We are using 
8.0.3 on Windows.


Is it a known limitation when using a view with SELECT ... LIMIT 1?

Would the forthcoming performance enhancement with MAX help when used 
within a view, as in:


create or replace view VCurPlayer as select * from Player a
where a.AtDate = (select Max(b.AtDate) from Player b where a.PlayerID = 
b.PlayerID);


select PlayerID,AtDate from VCurPlayer where PlayerID='0';

Thanks and regards,
KC.

-

Actually, the problem with the LIMIT 1 query arises when we use it inside 
views. The direct SQL is OK:


esdt=> explain analyze select PlayerID,AtDate from Player where PlayerID='0'
 order by PlayerID desc, AtDate desc LIMIT 1;

 Limit  (cost=0.00..1.37 rows=1 width=23) (actual time=0.000..0.000 rows=1 loops=1)
   ->  Index Scan Backward using pk_player on player  (cost=0.00..16074.23 rows=11770 width=23) (actual time=0.000..0.000 rows=1 loops=1)
         Index Cond: ((playerid)::text = '0'::text)
 Total runtime: 0.000 ms

esdt=> create or replace view VCurPlayer3 as select * from Player a
 where AtDate = (select b.AtDate from Player b where a.PlayerID = b.PlayerID
 order by b.PlayerID desc, b.AtDate desc LIMIT 1);

esdt=> explain analyze select PlayerID,AtDate,version from VCurPlayer3 where PlayerID='0';

 Index Scan using pk_player on player a  (cost=0.00..33072.78 rows=59 width=27) (actual time=235.000..235.000 rows=1 loops=1)
   Index Cond: ((playerid)::text = '0'::text)
   Filter: ((atdate)::text = ((subplan))::text)
   SubPlan
     ->  Limit  (cost=0.00..1.44 rows=1 width=23) (actual time=0.117..0.117 rows=1 loops=1743)
           ->  Index Scan Backward using pk_player on player b  (cost=0.00..14023.67 rows=9727 width=23) (actual time=0.108..0.108 rows=1 loops=1743)
                 Index Cond: (($0)::text = (playerid)::text)
 Total runtime: 235.000 ms

