Re: [PERFORM] Bottleneck?

2009-08-10 Thread Ip Wing Kin John
Hi Scott,

Thanks for your suggestion. I followed it and disabled nestloop, which
gave a substantial improvement; the query takes 51s now. I have
attached the new query plan in another file.

What I want to ask is: is there any other way to hint the planner to
choose a merge join rather than a nested loop by modifying my SQL?
I did try sorting my second inner join by the join condition, but the
planner still prefers a nested loop.

I am afraid changing the system-wide configuration would have side
effects on my other queries.

Here is my SQL.

select * from dummymediastatus rec INNER JOIN ( SELECT volumeGUID ,
MAX(startDatetime) AS msdt FROM dummymediastatus INNER JOIN ( select *
from getcurrentguids(1249281281666,'hardware.volume',null,null) ) AS
cfg ON ( cfg.getcurrentguids = volumeGUID) WHERE startDatetime =
1249281281666 GROUP BY volumeGUID ) AS rec2 ON (  rec.volumeGUID =
rec2.volumeGUID AND  rec.startDatetime = rec2.msdt ) where  (  ( 10
and 10 )  and  rec.acsGUID in ( SELECT * FROM resolve('acs0') ) )
order by rec.startDatetime DESC,rec.id DESC;

thanks




On Thu, Aug 6, 2009 at 5:03 PM, Scott Marlowe <scott.marl...@gmail.com> wrote:
 OK, two things.  First the row estimate starts going way off around
 the time it gets to the hash aggregate / nested loop which seems to be
 making the planner use a bad plan for this many rows.  You can try
 issuing

 set enable_nestloop = off;

 before running the query and see if that makes it any faster.

 Secondly, the first time you run this query you are reading the 1.8G
 table sequentially, and at about 55MB/s, which isn't gonna get faster
 without more / faster drives under your machine.

 On Thu, Aug 6, 2009 at 12:50 AM, Ip Wing Kin John <wkipj...@gmail.com> wrote:
 Here u go. Both in the same file.

 On Thu, Aug 6, 2009 at 4:48 PM, Scott Marlowe <scott.marl...@gmail.com> wrote:
 Much better... Looks like I got the second one...

 Can I get the first one too?  Thx.

 On Thu, Aug 6, 2009 at 12:46 AM, Ip Wing Kin John <wkipj...@gmail.com> wrote:
 Hope you can get it this time.

 John

 On Thu, Aug 6, 2009 at 4:34 PM, Scott Marlowe <scott.marl...@gmail.com> wrote:
 Sorry man, it's not coming through.  Try it this time addressed just to 
 me.

 On Thu, Aug 6, 2009 at 12:23 AM, Ip Wing Kin John <wkipj...@gmail.com> wrote:
 Hi scott

 I attached the query plan with this email. The top one is the first
 run after I restarted my machine. And the bottom one is the second
 run.

 I am using PostgreSQL 8.3 on Solaris 10.

 cheers

 On Thu, Aug 6, 2009 at 4:15 PM, Scott Marlowe <scott.marl...@gmail.com> wrote:
 On Wed, Aug 5, 2009 at 11:21 PM, wkipj...@gmail.com wrote:
 Sorry post again.

 Nope, still mangled.  Can you attach it?




 --
 John




 --
 When fascism comes to America, it will be intolerance sold as diversity.








-- 
John


 QUERY PLAN

 Sort  (cost=710118.74..710118.75 rows=3 width=567) (actual time=51563.029..51580.020 rows=80963 loops=1)
   Sort Key: rec.startdatetime, rec.id
   Sort Method:  quicksort  Memory: 43163kB
   ->  Hash Join  (cost=360922.21..710118.72 rows=3 width=567) (actual time=43014.876..51132.786 rows=80963 loops=1)
         Hash Cond: ((rec.acsguid)::text = (resolve.resolve)::text)
         ->  Merge Join  (cost=360655.21..709851.67 rows=3 width=567) (actual time=42458.324..50434.884 rows=80963 loops=1)
               Merge Cond: (rec.startdatetime = (max(dummymediastatus.startdatetime)))
               Join Filter: ((rec.volumeguid)::text = (dummymediastatus.volumeguid)::text)
               ->  Index Scan using index_dummymediastatus_startdatetime on dummymediastatus rec  (cost=0.00..339020.12 rows=4000362 width=414) (actual time=41.617..6324.895 rows=352 loops=1)
               ->  Sort  (cost=360655.21..360664.23 rows=3608 width=153) (actual time=42416.687..42453.669 rows=81934 loops=1)
                     Sort Key: (max(dummymediastatus.startdatetime))
                     Sort Method:  quicksort  Memory: 5174kB
                     ->  HashAggregate  (cost=360360.86..360405.96 rows=3608 width=16) (actual time=42257.696..42309.261 rows=8 loops=1)
                           ->  Hash Join  (cost=335135.05..354817.67 rows=1108637 width=16) (actual time=37252.925..39518.267 rows=400 loops=1)
                                 Hash Cond: ((getcurrentguids.getcurrentguids)::text = (dummymediastatus.volumeguid)::text)
                                 ->  Function Scan on getcurrentguids  (cost=0.00..260.00 rows=1000 width=32) (actual

Re: [PERFORM] Bottleneck?

2009-08-10 Thread Scott Marlowe
On Mon, Aug 10, 2009 at 12:22 AM, Ip Wing Kin John <wkipj...@gmail.com> wrote:
 Hi Scott,

 Thanks for your suggestion. I followed it and disabled nestloop, which
 gave a substantial improvement; the query takes 51s now. I have
 attached the new query plan in another file.

 What I want to ask is: is there any other way to hint the planner to
 choose a merge join rather than a nested loop by modifying my SQL?
 I did try sorting my second inner join by the join condition, but the
 planner still prefers a nested loop.

 I am afraid changing the system-wide configuration would have side
 effects on my other queries.

Yeah, that's more of a troubleshooting procedure than something you'd
want to institute system wide.  If you must set it for this query, you
can do so just before you run it in your connection, then turn it back
on for the rest of your queries.  I.e.:

set enable_nestloop = off;
select ... ;
set enable_nestloop = on;

I've had one or two big queries in the past that no amount of tuning
and setting stats target higher and analyzing could force to choose
the right plan.

If you haven't already, try setting the default statistics target
higher and re-analyzing to see if that helps.  After that you can play
around a bit with the cost parameters to see what helps.  Note that,
just like enable_nestloop, you can set these for the current
connection only rather than globally, especially while just testing.
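As a sketch of that per-session approach (the statistics target value here is purely illustrative, and SET LOCAL needs a surrounding transaction):

```sql
-- Nothing here touches postgresql.conf.  A plain SET lasts for the
-- connection; SET LOCAL inside a transaction reverts automatically
-- at COMMIT or ROLLBACK, so other queries are unaffected.
BEGIN;
SET LOCAL enable_nestloop = off;   -- applies only until this transaction ends
-- ... run the problem query here ...
COMMIT;                            -- enable_nestloop reverts here

-- Raising the statistics target before re-analyzing (100 is an
-- arbitrary example value; note ANALYZE writes shared statistics,
-- so the improved stats do persist for other sessions).
SET default_statistics_target = 100;
ANALYZE dummymediastatus;
```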

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Bottleneck?

2009-08-10 Thread Robert Haas
On Mon, Aug 10, 2009 at 2:22 AM, Ip Wing Kin John <wkipj...@gmail.com> wrote:
 Hi Scott,

 Thanks for your suggestion. I followed it and disabled nestloop, which
 gave a substantial improvement; the query takes 51s now. I have
 attached the new query plan in another file.

 What I want to ask is: is there any other way to hint the planner to
 choose a merge join rather than a nested loop by modifying my SQL?
 I did try sorting my second inner join by the join condition, but the
 planner still prefers a nested loop.

 I am afraid changing the system-wide configuration would have side
 effects on my other queries.

 Here is my SQL.

 select * from dummymediastatus rec INNER JOIN ( SELECT volumeGUID ,
 MAX(startDatetime) AS msdt FROM dummymediastatus INNER JOIN ( select *
 from getcurrentguids(1249281281666,'hardware.volume',null,null) ) AS
 cfg ON ( cfg.getcurrentguids = volumeGUID) WHERE startDatetime =
 1249281281666 GROUP BY volumeGUID ) AS rec2 ON (  rec.volumeGUID =
 rec2.volumeGUID AND  rec.startDatetime = rec2.msdt ) where  (  ( 10
 and 10 )  and  rec.acsGUID in ( SELECT * FROM resolve('acs0') ) )
 order by rec.startDatetime DESC,rec.id DESC;

It looks to me like a big chunk of your problem is here:

->  Function Scan on getcurrentguids  (cost=0.00..260.00 rows=1000
width=32) (actual time=977.013..997.404 rows=8 loops=1)

The planner's estimate of the number of rows is off by a factor of 80
here.  You should probably think about inlining the SQL contained
inside that function, if possible.  You might also want to look at the
rows setting of CREATE OR REPLACE FUNCTION.

As tempting as it is to encapsulate some of your logic into a
set-returning function of some sort, as you've done here, I've found
that it tends to suck.  Even if you fix the row estimate, the planner
will still estimate join selectivity etc. poorly for those rows
because, of course, there are no statistics.
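If inlining isn't an option, the ROWS clause at least repairs the estimate. A hypothetical sketch only: the real signature and body of getcurrentguids() never appear in this thread, so everything below except the ROWS clause itself is a guess:

```sql
-- Hypothetical reconstruction; only the ROWS clause is the point.
-- Declaring ROWS tells the planner how many rows to expect per call,
-- instead of the default estimate of 1000 for set-returning functions
-- (available since PostgreSQL 8.3, alongside COST).
CREATE OR REPLACE FUNCTION getcurrentguids(
    ts bigint, kind text, p1 text, p2 text)
RETURNS SETOF varchar AS $$
    -- assumed body: return the volume GUIDs current at the given time
    SELECT DISTINCT volumeGUID FROM dummymediastatus
    WHERE startDatetime <= $1
$$ LANGUAGE sql STABLE
ROWS 8;   -- planner's per-call row estimate
```

Even with this, as noted above, join selectivity estimates for the function's output remain poor because there are no statistics on it.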

...Robert

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Bottleneck?

2009-08-06 Thread Scott Marlowe
On Wed, Aug 5, 2009 at 11:21 PM, wkipj...@gmail.com wrote:
 Sorry post again.

Nope, still mangled.  Can you attach it?

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Bottleneck?

2009-08-06 Thread Ip Wing Kin John
Hi scott

I attached the query plan with this email. The top one is the first
run after I restarted my machine. And the bottom one is the second
run.

I am using PostgreSQL 8.3 on Solaris 10.

cheers

On Thu, Aug 6, 2009 at 4:15 PM, Scott Marlowe <scott.marl...@gmail.com> wrote:
 On Wed, Aug 5, 2009 at 11:21 PM, wkipj...@gmail.com wrote:
 Sorry post again.

 Nope, still mangled.  Can you attach it?




-- 
John

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Bottleneck?

2009-08-06 Thread Scott Marlowe
OK, two things.  First the row estimate starts going way off around
the time it gets to the hash aggregate / nested loop which seems to be
making the planner use a bad plan for this many rows.  You can try
issuing

set enable_nestloop = off;

before running the query and see if that makes it any faster.

Secondly, the first time you run this query you are reading the 1.8G
table sequentially, and at about 55MB/s, which isn't gonna get faster
without more / faster drives under your machine.

On Thu, Aug 6, 2009 at 12:50 AM, Ip Wing Kin John <wkipj...@gmail.com> wrote:
 Here u go. Both in the same file.

 On Thu, Aug 6, 2009 at 4:48 PM, Scott Marlowe <scott.marl...@gmail.com> wrote:
 Much better... Looks like I got the second one...

 Can I get the first one too?  Thx.

 On Thu, Aug 6, 2009 at 12:46 AM, Ip Wing Kin John <wkipj...@gmail.com> wrote:
 Hope you can get it this time.

 John

 On Thu, Aug 6, 2009 at 4:34 PM, Scott Marlowe <scott.marl...@gmail.com> wrote:
 Sorry man, it's not coming through.  Try it this time addressed just to me.

 On Thu, Aug 6, 2009 at 12:23 AM, Ip Wing Kin John <wkipj...@gmail.com> wrote:
 Hi scott

 I attached the query plan with this email. The top one is the first
 run after I restarted my machine. And the bottom one is the second
 run.

 I am using PostgreSQL 8.3 on Solaris 10.

 cheers

 On Thu, Aug 6, 2009 at 4:15 PM, Scott Marlowe <scott.marl...@gmail.com> wrote:
 On Wed, Aug 5, 2009 at 11:21 PM, wkipj...@gmail.com wrote:
 Sorry post again.

 Nope, still mangled.  Can you attach it?




 --
 John




 --
 When fascism comes to America, it will be intolerance sold as diversity.








-- 
When fascism comes to America, it will be intolerance sold as diversity.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Bottleneck?

2009-08-06 Thread Ray Stell
On Thu, Aug 06, 2009 at 12:50:51PM +1000, Ip Wing Kin John wrote:
 (running DTrace tool kit iofile.d script to show I/O wait time by
 filename and process)

Is the dtrace toolkit a viable product for a Linux environment or
is it strictly Sun/Oracle?

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Bottleneck?

2009-08-06 Thread Tom Lane
Ray Stell <ste...@cns.vt.edu> writes:
 On Thu, Aug 06, 2009 at 12:50:51PM +1000, Ip Wing Kin John wrote:
 (running DTrace tool kit iofile.d script to show I/O wait time by
 filename and process)

 Is the dtrace toolkit a viable product for a linux environment or
 is it strictly Sun/Oracle?

dtrace is available on Solaris and Mac OS X and probably a couple
other platforms, but not Linux.  For Linux there is SystemTap,
which does largely the same kinds of things but has a different
scripting syntax ...

regards, tom lane

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Bottleneck?

2009-08-06 Thread Ray Stell
On Thu, Aug 06, 2009 at 11:01:52AM -0400, Tom Lane wrote:
 
 dtrace is available on Solaris and Mac OS X and probably a couple
 other platforms, but not Linux.  

I wondered if anyone had given this a go:

 http://amitksaha.blogspot.com/2009/03/dtrace-on-linux.html

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Bottleneck?

2009-08-06 Thread Kenneth Cox
I wasn't able to compile dtrace on either CentOS 5.3 or Fedora 11.  But  
the author is responsive and the problem doesn't look hard to fix.  It  
sits in my inbox awaiting some hacking time...


Ken

On Thu, 06 Aug 2009 11:43:05 -0400, Ray Stell <ste...@cns.vt.edu> wrote:
 On Thu, Aug 06, 2009 at 11:01:52AM -0400, Tom Lane wrote:
  dtrace is available on Solaris and Mac OS X and probably a couple
  other platforms, but not Linux.

 I wondered if anyone had given this a go:

  http://amitksaha.blogspot.com/2009/03/dtrace-on-linux.html

--
Using Opera's revolutionary e-mail client: http://www.opera.com/mail/

--
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Bottleneck?

2009-08-06 Thread Joshua D. Drake
On Thu, 2009-08-06 at 11:57 -0400, Kenneth Cox wrote:
 I wasn't able to compile dtrace on either CentOS 5.3 or Fedora 11.  But  
 the author is responsive and the problem doesn't look hard to fix.  It  
 sits in my inbox awaiting some hacking time...

Why aren't you using systemtap again? As I recall it uses the same
interface as dtrace. The front end is just different.

Joshua D. Drake

-- 
PostgreSQL - XMPP: jdr...@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Bottleneck?

2009-08-06 Thread Ray Stell
On Thu, Aug 06, 2009 at 09:12:22AM -0700, Joshua D. Drake wrote:
 Why aren't you using systemtap again? 

1. significant Solaris responsibilities
2. significant Linux responsibilities
3. tool consolidation delusions

Can you drive the dtrace toolkit via systemtap?

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Bottleneck?

2009-08-06 Thread Joshua D. Drake
On Thu, 2009-08-06 at 12:38 -0400, Ray Stell wrote:
 On Thu, Aug 06, 2009 at 09:12:22AM -0700, Joshua D. Drake wrote:
  Why aren't you using systemtap again? 
 
  1. significant Solaris responsibilities

There is your problem right there ;)

 2. significant linux responsibilities
 3. tool consolidation delusions

Hah! I know this one.

 
  Can you drive the dtrace toolkit via systemtap?
 

I don't know. Tom?

Joshua D. Drake

-- 
PostgreSQL - XMPP: jdr...@jabber.postgresql.org
   Consulting, Development, Support, Training
   503-667-4564 - http://www.commandprompt.com/
   The PostgreSQL Company, serving since 1997


-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


[PERFORM] Bottleneck?

2009-08-05 Thread Ip Wing Kin John
I have a database (699221). It contains 1.8GB of data (707710). I am
running a complex query which requires loading an 80MB index (732287).

I restarted PostgreSQL so the cache is empty and it has to read the
table and index from disk, which I understand is an expensive process.
But what I don't understand is that even though I split the index into
a different tablespace located on a completely separate disk (mounted
on /hdd2), there is still a very long I/O wait time. That index is the
only thing on that disk. Any idea why? Or any way I can find out what
it is waiting for? Thanks.
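For reference, the split described above corresponds to commands along these lines; the tablespace name is assumed, and only the /hdd2/indexes path actually appears in the DTrace output:

```sql
-- Assumed reconstruction of the setup described above, for context.
-- The tablespace name hdd2_idx is illustrative.
CREATE TABLESPACE hdd2_idx LOCATION '/hdd2/indexes';
ALTER INDEX index_dummymediastatus_startdatetime SET TABLESPACE hdd2_idx;
```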

(running DTrace tool kit iofile.d script to show I/O wait time by
filename and process)

bash-3.00# ./iofile.d
Tracing... Hit Ctrl-C to end.
^C
   PID CMD           TIME FILE
  2379 postgres     23273 /export/space/postgres8.3/lib/plpgsql.so
  2224 metacity     24403 /lib/libm.so.2
  2379 postgres     32345 /export/space/pg_data/pg_data/data/base/699221/2619
  2379 postgres     40992 /export/space/pg_data/pg_data/data/base/699221/2691
     0 sched        82205 none
  2379 postgres    273205 /export/space/postgres8.3/bin/postgres
  2379 postgres   1092140 none
  2379 postgres  59461649 /hdd2/indexes/699221/732287

(running DTrace tool kit iofildb.d script to show I/O bytes by
filename and process)

bash-3.00# ./iofileb.d
Tracing... Hit Ctrl-C to end.
^C
   PID CMD            KB FILE
  2379 postgres     8256 /export/space/pg_data/pg_data/data/base/699221/699473
  2379 postgres    87760 /hdd2/indexes/699221/732287
  2379 postgres   832472 /export/space/pg_data/pg_data/data/base/699221/707710.1
  2379 postgres  1048576 /export/space/pg_data/pg_data/data/base/699221/707710





-- 
John

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Bottleneck?

2009-08-05 Thread Scott Marlowe
On Wed, Aug 5, 2009 at 8:50 PM, Ip Wing Kin John <wkipj...@gmail.com> wrote:
 I have a database (699221). It contains 1.8GB of data (707710). I am
 running a complex query which requires loading an 80MB index (732287).

 I restarted PostgreSQL so the cache is empty and it has to read the
 table and index from disk, which I understand is an expensive process.
 But what I don't understand is that even though I split the index into
 a different tablespace located on a completely separate disk (mounted
 on /hdd2), there is still a very long I/O wait time. That index is the
 only thing on that disk. Any idea why? Or any way I can find out what
 it is waiting for? Thanks.

OK before DTrace, did you run explain analyze on the query?  I think
the output of that would be interesting.

Looking at the DTrace output it looks to me like you're reading at
least one table over 1GB, since you're accessing a file with a .1 on it.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Bottleneck?

2009-08-05 Thread wkipjohn

Hi Scott,

Yes, I did that. And you are right: because I restarted my machine, the
postgres cache is empty, and I think PostgreSQL reads all 1.8GB of
data back into the cache when it does a seq scan on the status table.


QUERY PLAN
---------------------------------------------------------------------------
 Sort  (cost=390162.53..390162.54 rows=3 width=567) (actual time=106045.453..106078.238 rows=80963 loops=1)
   Sort Key: rec.startdatetime, rec.id
   Sort Method:  quicksort  Memory: 43163kB
   ->  Nested Loop IN Join  (cost=360360.86..390162.51 rows=3 width=567) (actual time=41205.683..105523.303 rows=80963 loops=1)
         Join Filter: ((rec.parentguid)::text = (resolve.resolve)::text)
         ->  Nested Loop  (cost=360360.86..38.01 rows=3 width=567) (actual time=41127.859..105256.069 rows=80963 loops=1)
               Join Filter: ((rec.guid)::text = (status.guid)::text)
               ->  HashAggregate  (cost=360360.86..360405.96 rows=3608 width=16) (actual time=41089.852..41177.137 rows=8 loops=1)
                     ->  Hash Join  (cost=335135.05..354817.67 rows=1108637 width=16) (actual time=36401.247..38505.042 rows=400 loops=1)
                           Hash Cond: ((getcurrentguids.getcurrentguids)::text = (status.guid)::text)
                           ->  Function Scan on getcurrentguids  (cost=0.00..260.00 rows=1000 width=32) (actual time=1009.161..1029.849 rows=8 loops=1)
                           ->  Hash  (cost=285135.53..285135.53 rows=362 width=16) (actual time=35391.697..35391.697 rows=400 loops=1)
                                 ->  Seq Scan on status  (cost=0.00..285135.53 rows=362 width=16) (actual time=5.095..32820.746 rows=400 loops=1)
                                       Filter: (startdatetime = 1249281281666::bigint)
               ->  Index Scan using index_status_startdatetime on status rec  (cost=0.00..8.15 rows=3 width=414) (actual time=0.796..0.797 rows=1 loops=8)
                     Index Cond: (rec.startdatetime = (max(status.startdatetime)))
         ->  Function Scan on resolve  (cost=0.00..260.00 rows=1000 width=32) (actual time=0.001..0.001 rows=1 loops=80963)
 Total runtime: 106227.356 ms
(18 rows)



On Aug 6, 2009 2:19pm, Scott Marlowe <scott.marl...@gmail.com> wrote:
 On Wed, Aug 5, 2009 at 8:50 PM, Ip Wing Kin John <wkipj...@gmail.com> wrote:
  I have a database (699221). It contains 1.8GB of data (707710). I am
  running a complex query which requires loading an 80MB index (732287).

  I restarted PostgreSQL so the cache is empty and it has to read the
  table and index from disk, which I understand is an expensive process.
  But what I don't understand is that even though I split the index into
  a different tablespace located on a completely separate disk (mounted
  on /hdd2), there is still a very long I/O wait time. That index is the
  only thing on that disk. Any idea why? Or any way I can find out what
  it is waiting for? Thanks.

 OK before DTrace, did you run explain analyze on the query? I think
 the output of that would be interesting.

 Looking at the DTrace output it looks to me like you're reading at
 least one table over 1GB, since you're accessing a file with a .1 on it.




Re: [PERFORM] Bottleneck?

2009-08-05 Thread Scott Marlowe
Could you possibly attach that in plain text format?  Your email
client seems to have eaten any text formatting / indentation.

-- 
Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-performance


Re: [PERFORM] Bottleneck?

2009-08-05 Thread wkipjohn

Is this alright?


QUERY PLAN
---------------------------------------------------------------------------
 Sort  (cost=390162.53..390162.54 rows=3 width=567) (actual time=106045.453..106078.238 rows=80963 loops=1)
   Sort Key: rec.startdatetime, rec.id
   Sort Method:  quicksort  Memory: 43163kB
   ->  Nested Loop IN Join  (cost=360360.86..390162.51 rows=3 width=567) (actual time=41205.683..105523.303 rows=80963 loops=1)
         Join Filter: ((rec.parentguid)::text = (resolve.resolve)::text)
         ->  Nested Loop  (cost=360360.86..38.01 rows=3 width=567) (actual time=41127.859..105256.069 rows=80963 loops=1)
               Join Filter: ((rec.guid)::text = (status.guid)::text)
               ->  HashAggregate  (cost=360360.86..360405.96 rows=3608 width=16) (actual time=41089.852..41177.137 rows=8 loops=1)
                     ->  Hash Join  (cost=335135.05..354817.67 rows=1108637 width=16) (actual time=36401.247..38505.042 rows=400 loops=1)
                           Hash Cond: ((getcurrentguids.getcurrentguids)::text = (status.guid)::text)
                           ->  Function Scan on getcurrentguids  (cost=0.00..260.00 rows=1000 width=32) (actual time=1009.161..1029.849 rows=8 loops=1)
                           ->  Hash  (cost=285135.53..285135.53 rows=362 width=16) (actual time=35391.697..35391.697 rows=400 loops=1)
                                 ->  Seq Scan on status  (cost=0.00..285135.53 rows=362 width=16) (actual time=5.095..32820.746 rows=400 loops=1)
                                       Filter: (startdatetime = 1249281281666::bigint)
               ->  Index Scan using index_status_startdatetime on status rec  (cost=0.00..8.15 rows=3 width=414) (actual time=0.796..0.797 rows=1 loops=8)
                     Index Cond: (rec.startdatetime = (max(status.startdatetime)))
         ->  Function Scan on resolve  (cost=0.00..260.00 rows=1000 width=32) (actual time=0.001..0.001 rows=1 loops=80963)
 Total runtime: 106227.356 ms
(18 rows)



On Aug 6, 2009 2:30pm, wkipj...@gmail.com wrote:
 Hi Scott,

 Yes, I did that. And you are right: because I restarted my machine, the
 postgres cache is empty, and I think PostgreSQL reads all 1.8GB of data
 back into the cache when it does a seq scan on the status table.

 [...]



Re: [PERFORM] Bottleneck?

2009-08-05 Thread wkipjohn

Sorry post again.

QUERY PLAN
---------------------------------------------------------------------------
 Sort  (cost=390162.53..390162.54 rows=3 width=567) (actual time=105726.803..105756.743 rows=80963 loops=1)
   Sort Key: rec.startdatetime, rec.id
   Sort Method:  quicksort  Memory: 43163kB
   ->  Nested Loop IN Join  (cost=360360.86..390162.51 rows=3 width=567) (actual time=41332.430..105220.859 rows=80963 loops=1)
         Join Filter: ((rec.acsguid)::text = (resolve.resolve)::text)
         ->  Nested Loop  (cost=360360.86..38.01 rows=3 width=567) (actual time=41252.145..104952.438 rows=80963 loops=1)
               Join Filter: ((rec.volumeguid)::text = (dummymediastatus.volumeguid)::text)
               ->  HashAggregate  (cost=360360.86..360405.96 rows=3608 width=16) (actual time=41212.903..41299.709 rows=8 loops=1)
                     ->  Hash Join  (cost=335135.05..354817.67 rows=1108637 width=16) (actual time=36360.938..38540.426 rows=400 loops=1)
                           Hash Cond: ((getcurrentguids.getcurrentguids)::text = (dummymediastatus.volumeguid)::text)
                           ->  Function Scan on getcurrentguids  (cost=0.00..260.00 rows=1000 width=32) (actual time=977.013..997.404 rows=8 loops=1)
                           ->  Hash  (cost=285135.53..285135.53 rows=362 width=16) (actual time=35383.529..35383.529 rows=400 loops=1)
                                 ->  Seq Scan on dummymediastatus  (cost=0.00..285135.53 rows=362 width=16) (actual time=5.081..32821.253 rows=400 loops=1)
                                       Filter: (startdatetime = 1249281281666::bigint)
               ->  Index Scan using index_dummymediastatus_startdatetime on dummymediastatus rec  (cost=0.00..8.15 rows=3 width=414) (actual time=0.791..0.792 rows=1 loops=8)
                     Index Cond: (rec.startdatetime = (max(dummymediastatus.startdatetime)))
         ->  Function Scan on resolve  (cost=0.00..260.00 rows=1000 width=32) (actual time=0.001..0.001 rows=1 loops=80963)
 Total runtime: 105906.467 ms
(18 rows)
(18 rows)