Re: [PERFORM] Linux mis-reporting memory

2007-09-21 Thread Gregory Stark
"Tom Lane" <[EMAIL PROTECTED]> writes:

> Decibel! <[EMAIL PROTECTED]> writes:
>> I'm finding this rather interesting report from top on a Debian box...
>
>> Mem:  32945280k total, 32871832k used,    73448k free,   247432k buffers
>> Swap:  1951888k total,    42308k used,  1909580k free, 30294300k cached
>
>> So how is it that linux thinks that 30G is cached?
>
> Why would you think that a number reported by the operating system has
> something to do with Postgres' shared memory?

I think his question is how the kernel can be using 30G for its page cache if
it only has 32G total and 8G of that is taken up by Postgres's shared buffers.

It seems to imply Linux is paging out sysV shared memory. In fact some of
Heikki's tests here showed that Linux would do precisely that.

If your working set really is smaller than shared buffers then that's not so
bad; those buffers really would be completely idle anyway.

But if your working set is larger than shared buffers and you're just not
thrashing it hard enough to keep it in RAM, then it's really bad. The buffers
Linux chooses to page out are precisely those that Postgres will likely
select shortly as victim buffers, forcing Linux to page them back in just so
Postgres can overwrite them.


-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [PERFORM] Linux mis-reporting memory

2007-09-21 Thread Csaba Nagy
On Fri, 2007-09-21 at 09:03 +0100, Gregory Stark wrote:
> >> Mem:  32945280k total, 32871832k used,    73448k free,   247432k buffers
> >> Swap:  1951888k total,    42308k used,  1909580k free, 30294300k cached
> >
> It seems to imply Linux is paging out sysV shared memory. In fact some of
> Heikki's tests here showed that Linux would do precisely that.

But then why is it not reported in the "Swap: used" column? It
only reports 42308k of used swap.

I have a box where I just executed a select count(*) three times against a
table which is ~5.5 GB on disk, and the count ran in <4 seconds, which I
take to mean it is all cached (shared memory is set to 12GB - I use the box
for testing for now, otherwise I would set it far lower, because I have had
bad experiences with setting it to more than 1/3 of the available memory).
Top reported at the end of the process:

Mem:  16510724k total, 16425252k used,    85472k free,    10144k buffers
Swap:  7815580k total,   157804k used,  7657776k free, 15980664k cached

I also watched it during the selects, but it was not significantly
different. So my only conclusion is that the reported "cached" value either
includes the shared memory or is simply wrong... or I just don't
understand how Linux handles memory.

Cheers,
Csaba.





Re: [PERFORM] Linux mis-reporting memory

2007-09-21 Thread Dimitri Fontaine
Hi,

Le Friday 21 September 2007 01:04:01 Decibel!, vous avez écrit :
> I'm finding this rather interesting report from top on a Debian box...

I've read from people in other free software development groups that the
memory usage reported by top/ps is not particularly useful or trustworthy.
A more usable (more precise and trustworthy) tool seems to be exmap:
  http://www.berthels.co.uk/exmap/

Hope this helps,
-- 
dim



Re: [PERFORM] Linux mis-reporting memory

2007-09-21 Thread Gregory Stark
"Csaba Nagy" <[EMAIL PROTECTED]> writes:

> On Fri, 2007-09-21 at 09:03 +0100, Gregory Stark wrote:
>> >> Mem:  32945280k total, 32871832k used,    73448k free,   247432k buffers
>> >> Swap:  1951888k total,    42308k used,  1909580k free, 30294300k cached
>> >
>> It seems to imply Linux is paging out sysV shared memory. In fact some of
>> Heikki's tests here showed that Linux would do precisely that.
>
> But then why is it not reporting that in the "Swap: used" section ? It
> only reports 42308k used swap. 

Hm, good point.

The other possibility is that Postgres just hasn't even touched a large part
of its shared buffers. 
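
One quick way to check that is the contrib/pg_buffercache module (a sketch,
assuming the module is installed in the database):

SELECT count(relfilenode) AS buffers_in_use,  -- NULL relfilenode = buffer never touched
       count(*)           AS total_buffers
FROM pg_buffercache;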

-- 
  Gregory Stark
  EnterpriseDB  http://www.enterprisedb.com



Re: [PERFORM] Linux mis-reporting memory

2007-09-21 Thread Csaba Nagy
On Fri, 2007-09-21 at 10:43 +0100, Gregory Stark wrote:
> The other possibility is that Postgres just hasn't even touched a large part
> of its shared buffers. 
> 

But then how do you explain the example I gave, with a 5.5GB table
seq-scanned 3 times, shared buffers set to 12 GB, and top still showing
almost 100% of memory as cached and no swap "used"? In this case you can't
say Postgres didn't touch its shared buffers - or does a sequential scan
not use the shared buffers?

Cheers,
Csaba.





[PERFORM] Searching for the cause of a bad plan

2007-09-21 Thread Csaba Nagy
Hi all,

Postgres version: 8.2.4

Tables:

table_a(a bigint, b bigint, primary key(a, b) );

table_b1(b bigint primary key, more columns...);

table_b2(b bigint primary key references table_b1(b), more columns...);

table_b1: 
 ~ 27M rows;
 ~25 more columns;
 width=309 (as reported by explain select *);

table_a:
 ~400M rows;
 - column "b" should reference table_b1, but it does not for performance
reasons (it is an insert only table);
 - column "a" distinct values: 1148
 - has (a, b) as primary key;
 - has no additional columns;

table_b2:
 ~40K rows;
 ~70 more columns;
 width=1788 (as reported by explain select *);

Statistics for the involved columns of each table are attached in files
(to preserve the spacing). They were taken after analyzing the relevant
table (except for table_b2, where I added the "fiddled" statistics first
and only then remembered to analyze fresh, resulting in the "non_fiddled"
version, which gives the same result as the fiddled one).

The problem query is:

prepare test_001(bigint) as
SELECT tb.*
FROM table_a ta 
JOIN table_b2 tb ON ta.b=tb.b
WHERE ta.a = $1  
ORDER BY ta.a, ta.b
limit 10;

Explain gives Plan 1 (see attached plans.txt)

If I set enable_hashjoin=off and enable_mergejoin=off, I get Plan 2
(again, see plans.txt).
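
That is, roughly (a sketch of the session settings; the parameter value is
one used later in the thread):

SET enable_hashjoin = off;
SET enable_mergejoin = off;
EXPLAIN EXECUTE test_001(31855344);  -- yields Plan 2
RESET enable_hashjoin;
RESET enable_mergejoin;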

The difference is a 30x improvement in the second case...
(I actually forgot to account for cache effects, but I later reran the
queries multiple times and the timings are proportional.)

Additionally, if I replace table_b2 with table_b1 in the query, I get
Plan 3 (with reasonable execution time) with both enable_hashjoin and
enable_mergejoin on. So there is something which makes table_b2
different from table_b1 for planning purposes, but I could not identify
what that is... they have differences in statistics, but fiddling with
the stats gave me no difference in the plan.

Looking at Plan 2, it looks like the "limit" step is estimating its cost
wrongly. I guessed that it does that because it thinks the "b" values
selected from table_a for a given "a" span a larger range than the "b"
values in table_b2, since the "b" values in table_b2 are a (relatively
small) subset of the "b" values in table_a. But this is not the case: the
query only gets "a" values for which all the "b" values from table_a
will be found in table_b2. Of course the planner has no way to know this,
but I don't think that is the explanation either, as I tried copying the
histogram statistics in pg_statistic for column "b" from the entry
for table_b1 (which covers the whole span of "b" values) to the entry
for table_b2, with no change in the plan.

Just for the record, this query is just one part of a more complex one
which joins in bigger tables, resulting in even worse performance, but I
tracked the problem down to the planner's refusal to use the nested loop.

Is there anything I could do to convince the planner to use the
nested loop plan here?

Thanks,
Csaba.

attname = a
null_frac = 0
avg_width = 8
n_distinct = 1148
most_common_vals = {31826743,31855101,31855343,31854918,31856328,31861573,31855122,31855130,31855189,31856426}
most_common_freqs = {0.005,0.0047,0.0043,0.004,0.004,0.004,0.0037,0.0037,0.0037,0.0037}
histogram_bounds = {31734956,31854364,31854732,31855162,31855660,31857144,31858109,31858965,31859762,31860576,31861566}
correlation = 0.999608

attname = b
null_frac = 0
avg_width = 8
n_distinct = -1
histogram_bounds = {63977,36878147,42247866,42548692,42812320,46992026,51444368,55977972,56607708,59496742,68530614}
correlation = 0.602959

(second attachment, truncated; header only)
attname | null_frac | avg_width | n_distinct | most_common_vals | most_common_freqs | histogram_bounds | correlation

Re: [PERFORM] Linux mis-reporting memory

2007-09-21 Thread Heikki Linnakangas
Csaba Nagy wrote:
> On Fri, 2007-09-21 at 10:43 +0100, Gregory Stark wrote:
>> The other possibility is that Postgres just hasn't even touched a large part
>> of its shared buffers. 
> 
> But then how do you explain the example I gave, with a 5.5GB table
> seq-scanned 3 times, shared buffers set to 12 GB, and top still showing
> almost 100% of memory as cached and no swap "used"? In this case you can't
> say Postgres didn't touch its shared buffers - or does a sequential scan
> not use the shared buffers?

Which version of Postgres is this? In 8.3, a scan like that really won't
suck it all into the shared buffer cache. For seq scans on tables larger
than shared_buffers/4, it switches to the bulk read strategy, using only
a few buffers, and choosing the starting point with the scan
synchronization facility.
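
To see whether a given table would fall under that strategy, compare its
size against shared_buffers (a sketch; "some_table" is a placeholder):

SHOW shared_buffers;
SELECT pg_size_pretty(pg_relation_size('some_table'));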

-- 
  Heikki Linnakangas
  EnterpriseDB   http://www.enterprisedb.com



Re: [PERFORM] Searching for the cause of a bad plan

2007-09-21 Thread Csaba Nagy
On Fri, 2007-09-21 at 11:59 +0100, Simon Riggs wrote:
> Please re-run everything on clean tables without frigging the stats. We
> need to be able to trust what is happening is normal.

I did: the plan fiddling happened after getting the plans from a fresh
analyze, and I ran the query again after a fresh analyze just before
sending the mail, and the plan was the same. In fact I spent almost 2
days playing with the query which is triggering this behavior, until I
tracked it down to this join. The thing is that we have many queries which
rely on this join, so it is fairly important that we understand what
happens there.

> Plan2 sees that b1 is wider, which will require more heap blocks to be
> retrieved. It also sees b1 is less correlated than b2, so again will
> require more database blocks to retrieve. Try increasing
> effective_cache_size.

effective_cache_size is set to ~2.7G; the box has 4G of memory. I increased
it now to 3.5G but it makes no difference. I increased it further to 4G;
no difference again.

> Can you post plans with/without LIMIT and with/without cursor, for both b1
> and b2?

The limit is unfortunately an absolutely necessary part of the query; it
makes no sense to try it without. If it were acceptable to do it without
the limit, then it is entirely possible that the plan I get now would
indeed be better... but it is not acceptable.

Thanks,
Csaba.





Re: [PERFORM] Searching for the cause of a bad plan

2007-09-21 Thread Simon Riggs
On Fri, 2007-09-21 at 12:03 +0200, Csaba Nagy wrote:
> prepare test_001(bigint) as
> SELECT tb.*
> FROM table_a ta 
> JOIN table_b2 tb ON ta.b=tb.b
> WHERE ta.a = $1  
> ORDER BY ta.a, ta.b
> limit 10; 

Please re-run everything on clean tables without frigging the stats. We
need to be able to trust what is happening is normal.

Plan2 sees that b1 is wider, which will require more heap blocks to be
retrieved. It also sees b1 is less correlated than b2, so again will
require more database blocks to retrieve. Try increasing
effective_cache_size.
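
E.g. (a session-level sketch; 8.2 accepts memory units, while 8.1 takes the
value in 8kB pages):

SET effective_cache_size = '3500MB';
EXPLAIN EXECUTE test_001(31855344);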

Can you post plans with/without LIMIT and with/without cursor, for both b1
and b2?

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com




Re: [PERFORM] Linux mis-reporting memory

2007-09-21 Thread Simon Riggs
On Fri, 2007-09-21 at 12:08 +0200, Csaba Nagy wrote:
> On Fri, 2007-09-21 at 10:43 +0100, Gregory Stark wrote:
> > The other possibility is that Postgres just hasn't even touched a large part
> > of its shared buffers. 
> > 
> 
> But then how do you explain the example I gave, with a 5.5GB table
> seq-scanned 3 times, shared buffers set to 12 GB, and top still showing
> almost 100% of memory as cached and no swap "used"? In this case you can't
> say Postgres didn't touch its shared buffers - or does a sequential scan
> not use the shared buffers?

Well, 6.5GB of shared_buffers could be swapped out and need not be
swapped back in to perform those 3 queries.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com




Re: [PERFORM] Searching for the cause of a bad plan

2007-09-21 Thread Simon Riggs
On Fri, 2007-09-21 at 13:29 +0200, Csaba Nagy wrote:

> > Can you post plans with/without LIMIT and with/without cursor, for both b1
> > and b2?
> 
> The limit is unfortunately an absolutely necessary part of the query

Understood, but that's not why I asked...

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com




Re: [PERFORM] Linux mis-reporting memory

2007-09-21 Thread Csaba Nagy
On Fri, 2007-09-21 at 11:34 +0100, Heikki Linnakangas wrote:
> Which version of Postgres is this? In 8.3, a scan like that really won't
> suck it all into the shared buffer cache. For seq scans on tables larger
> than shared_buffers/4, it switches to the bulk read strategy, using only
>  a few buffers, and choosing the starting point with the scan
> synchronization facility.
> 
This was on 8.1.9, installed via apt-get on Debian 4.1.1-21. In any case
I'm pretty sure Linux swaps shared buffers, as I always got worse
performance with shared buffers set to more than about 1/3 of the memory.
But in that case the output of top is misleading.

Cheers,
Csaba.





Re: [PERFORM] Searching for the cause of a bad plan

2007-09-21 Thread Csaba Nagy
On Fri, 2007-09-21 at 12:34 +0100, Simon Riggs wrote:
> On Fri, 2007-09-21 at 13:29 +0200, Csaba Nagy wrote:
> 
> > > Can you post plans with/without LIMIT and with/without cursor, for both b1
> > > and b2?
> > 
> > The limit is unfortunately an absolutely necessary part of the query
> 
> Understood, but that's not why I asked...
> 
Well, the same query without limit goes:

dbdop=# explain execute test_001(31855344);

                                   QUERY PLAN
---------------------------------------------------------------------------------
 Sort  (cost=322831.85..322831.94 rows=36 width=1804)
   Sort Key: ta.a, ta.b
   ->  Hash Join  (cost=3365.60..322830.92 rows=36 width=1804)
         Hash Cond: (ta.b = tb.b)
         ->  Index Scan using pk_table_a on table_a ta  (cost=0.00..314541.78 rows=389648 width=16)
               Index Cond: (a = $1)
         ->  Hash  (cost=524.71..524.71 rows=41671 width=1788)
               ->  Seq Scan on table_b2 tb  (cost=0.00..524.71 rows=41671 width=1788)


I'm not sure what you mean by "without cursor" - maybe not using prepare?
Well, we set up the JDBC driver to always prepare the queries, as this
gives us much better worst-case plans than letting Postgres see the
parameter values, especially in queries with LIMIT. So I simulate that
when explaining the behavior we see. All our LIMIT queries are for
interactive display, so the worst case is of much higher importance for
us than the mean execution time... unfortunately Postgres has a tendency
to take the best mean-performance path rather than avoid the worst case,
and it is not easy to convince it otherwise.

Cheers,
Csaba.







Re: [PERFORM] Upgraded from 7.4 to 8.1.4 QUERIES NOW SLOW!!!

2007-09-21 Thread smiley2211

Dennis,

Thanks for your reply.

No, the OLD servers are no longer available (decommissioned) - the new
servers are definitely better hardware.

I do not have any queries to EXPLAIN ANALYZE as they are built by the
application and I am not allowed to enable logging for that server - so
where do I go from here???

I am pretty much trying to make changes in the postgresql.conf file but
don't have a CLUE as to what starting numbers I should be looking at to
change(???)

Here is the EXPLAIN ANALYZE for the ONE (1) query I do have... it takes 4-5
hours to run the SELECT with EXPLAIN ANALYZE:


 
                                                        QUERY PLAN
--------------------------------------------------------------------------------------------------------------------
 Limit  (cost=100013612.76..299939413.70 rows=1 width=8) (actual time=10084289.859..10084289.861 rows=1 loops=1)
   ->  Subquery Scan people_consent  (cost=100013612.76..624068438343.99 rows=3121 width=8) (actual time=10084289.853..10084289.853 rows=1 loops=1)
         ->  Append  (cost=100013612.76..624068438312.78 rows=3121 width=815) (actual time=10084289.849..10084289.849 rows=1 loops=1)
               ->  Nested Loop  (cost=100013612.76..100013621.50 rows=2 width=815) (actual time=10084289.846..10084289.846 rows=1 loops=1)
                     ->  Unique  (cost=100013612.76..100013612.77 rows=2 width=8) (actual time=10084289.817..10084289.817 rows=1 loops=1)
                           ->  Sort  (cost=100013612.76..100013612.77 rows=2 width=8) (actual time=10084289.814..10084289.814 rows=1 loops=1)
                                 Sort Key: temp_consent.id
                                 ->  Unique  (cost=100013612.71..100013612.73 rows=2 width=36) (actual time=10084245.195..10084277.468 rows=7292 loops=1)
                                       ->  Sort  (cost=100013612.71..100013612.72 rows=2 width=36) (actual time=10084245.191..10084254.425 rows=7292 loops=1)
                                             Sort Key: id, daterecorded, answer
                                             ->  Append  (cost=100013515.80..100013612.70 rows=2 width=36) (actual time=10083991.226..10084228.613 rows=7292 loops=1)
                                                   ->  HashAggregate  (cost=100013515.80..100013515.82 rows=1 width=36) (actual time=10083991.223..10083998.046 rows=3666 loops=1)
                                                         ->  Nested Loop  (cost=10060.61..100013515.80 rows=1 width=36) (actual time=388.263..10083961.330 rows=3702 loops=1)
                                                               ->  Nested Loop  (cost=10060.61..100013511.43 rows=1 width=36) (actual time=388.237..10083897.268 rows=3702 loops=1)
                                                                     ->  Nested Loop  (cost=10060.61..100013507.59 rows=1 width=24) (actual time=388.209..10083833.870 rows=3702 loops=1)
                                                                           ->  Nested Loop  (cost=10060.61..100013504.56 rows=1 width=24) (actual time=388.173..10083731.122 rows=3702 loops=1)
                                                                                 Join Filter: ("inner".question_answer_id = "outer".id)
                                                                                 ->  Nested Loop  (cost=60.61..86.33 rows=1 width=28) (actual time=13.978..114.768 rows=7430 loops=1)
                                                                                       ->  Index Scan using answers_answer_un on answers a  (cost=0.00..5.01 rows=1 width=28) (actual time=0.084..0.088 rows=1 loops=1)
                                                                                             Index Cond: ((answer)::text = 'Yes'::text)
                                                                                       ->  Bitmap Heap Scan on questions_answers qa  (cost=60.61..81.23 rows=7 width=16) (actual time=13.881..87.112 rows=7430 loops=1)
                                                                                             Recheck Cond: ((qa.answer_id = "outer".id) AND (((qa.question_tag)::text = 'consentTransfer'::text) OR ((qa.question_tag)::text = 'shareWithEval'::text)))
                                                                                             ->  BitmapAnd  (cost=60.61..60.61 rows=7 width=0) (actual time=13.198..13.198 rows=0 loops=1)
                                                                                                   ->  Bitmap Index Scan on qs_as_answer_id  (cost=0.00..5.27 rows=649 width=0) (actual time=9.689..9.689 rows=57804 loops=1)
                                                                                                         Index Cond: (qa.answer_id = "outer".id)
                                                                                                   ->  BitmapOr  (cost=55.08..55.08 rows=6596 width=0) (actual time=2.563..2.563 rows=0 loops=1)
                                                                                                         ->  Bitmap Index Scan on qs_as_qtag  (cost=0.00..27.54 rows=3298 width=0) (actual time=1.923..1.923 r

Re: [PERFORM] Upgraded from 7.4 to 8.1.4 QUERIES NOW SLOW!!!

2007-09-21 Thread Bill Moran
In response to smiley2211 <[EMAIL PROTECTED]>:
> 
> Dennis,
> 
> Thanks for your reply.
> 
> No, the OLD servers are no longer available (decommissioned) - the new
> servers are definitely better hardware.

Says who?  I've heard that one before, and I've seen it be false.
Some wonk replaced a 1GHz system with 1G of RAM and a high-end SCSI
RAID 10 with a new 3GHz server with 4G of RAM and a cheapo SATA-based
RAID 5, but doesn't know he was better off with the older system?

That may not apply to you, or it might.  We don't know because you
didn't give us details.

> I do not have any queries to EXPLAIN ANALYZE as they are built by the
> application and I am not allowed to enable logging for that server - so
> where do I go from here???

Update your resume.  If you're expected to performance-tune this system,
but you're not allowed to enable logging and you can't get a look at
the queries, you're going to be looking for new employment soon, because
you've been asked to do the impossible.

> I am pretty much trying to make changes in the postgresql.conf file but
> don't have a CLUE as to what starting numbers I should be looking at to
> change(???)
> 
> Here is the EXPLAIN ANALYZE for the ONE (1) query I do have... it takes 4-5
> hours to run the SELECT with EXPLAIN ANALYZE:

It's very difficult (if not impossible) to make sense of this output
without the query itself.  It would also be nice if your mail program
didn't mangle the output, as it would save folks having to reconstruct
it.

> [snip]

Re: [PERFORM] Upgraded from 7.4 to 8.1.4 QUERIES NOW SLOW!!!

2007-09-21 Thread Jeff Harris
I suffered the same fate when upgrading some time back.  The single biggest
issue for me was that the default 8.x setup changed what had been fast query
plans using indexes into slow plans using sequential scans.  Changing
random_page_cost in postgresql.conf from 4.0 to 2.0 (which indicates to
Postgres that reading index pages isn't such a big deal, encouraging index
use) solved most of these issues for me.
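
For example, it can be tried per session before editing postgresql.conf
(a sketch):

SET random_page_cost = 2.0;  -- default is 4.0; lower values encourage index scans
SHOW random_page_cost;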

Jeff

-Original Message-
From: smiley2211 [mailto:[EMAIL PROTECTED] 
Sent: Friday, September 21, 2007 8:14 AM
To: pgsql-performance@postgresql.org
Subject: Re: [PERFORM] Upgraded from 7.4 to 8.1.4 QUERIES NOW SLOW!!!



[snip]

Re: [PERFORM] Searching for the cause of a bad plan

2007-09-21 Thread Simon Riggs
On Fri, 2007-09-21 at 14:12 +0200, Csaba Nagy wrote:
> On Fri, 2007-09-21 at 12:34 +0100, Simon Riggs wrote:
> > On Fri, 2007-09-21 at 13:29 +0200, Csaba Nagy wrote:
> > 
> > > > Can you post plans with/without LIMIT and with/without cursor, for both b1
> > > > and b2?
> > > 
> > > The limit is unfortunately an absolutely necessary part of the query
> > 
> > Understood, but that's not why I asked...
> > 
> Well, the same query without limit goes:

OK, thanks.

> I'm not sure what you mean by "without cursor" - maybe not using prepare?

Sorry, misread that.

===

I think I understand now: The cost of the LIMIT is being applied, but in
slightly the wrong way. The cost of the Nested Loop node is reduced by
the fraction of LIMIT/(number of expected rows), which is only an
approximation of what we're doing. In Plan 2 this leads to the wildly
wrong estimate that each row costs 49,851 cost units to retrieve, which
is about x50 wrong. In Plan 3 that approximation leads to a more
reasonable cost, so this works in Plan 3, but doesn't in Plan 2. 

What we should do is push down the effect of the LIMIT so that the cost
of the Index Scan on ta reflects the fact that it returns only 10 rows.
It correctly expects 388638 rows that match the value requested, but it
is not retrieving all of them. The executor handles the query
efficiently but the cost model doesn't reflect what the executor
actually does, and so we pick the wrong plan. Pushing down the LIMIT
would only be possible when LIMIT has a constant value at plan time, but
that seems to be the case most of the time, to my eyes.

The plan estimates should look like this for Plan 2 (marked **)

 Limit  (cost=0.00.. rows=10 width=1804)
   ->  Nested Loop  (cost=0.00..X rows=10 width=1804)
 ->  Index Scan using pk_table_a on table_a ta
(cost=0.00..**11.96** rows=**10** width=16)
   Index Cond: (a = $1)
 ->  Index Scan using pk_table_b2 on table_b2 tb
(cost=0.00..3.77 rows=1 width=1788)
   Index Cond: (ta.b = tb.b)

Incidentally, the way out of this is to improve the stats by setting
stats target = 1000 on column a of ta. That will allow the optimizer to
have a better estimate of the tail of the distribution of a, which
should then be more sensibly reflected in the cost of the Index Scan.
That doesn't solve the actual problem, but should help in your case.
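
In SQL that would be something like (a minimal sketch):

ALTER TABLE table_a ALTER COLUMN a SET STATISTICS 1000;
ANALYZE table_a;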

Plans copied below for better clarity:


Plan 2:

db> explain analyze execute test_001(31855344);

                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..498511.80 rows=10 width=1804) (actual time=17.729..21.672 rows=2 loops=1)
   ->  Nested Loop  (cost=0.00..1794642.48 rows=36 width=1804) (actual time=17.729..21.671 rows=2 loops=1)
         ->  Index Scan using pk_table_a on table_a ta  (cost=0.00..324880.88 rows=388638 width=16) (actual time=0.146..0.198 rows=2 loops=1)
               Index Cond: (a = $1)
         ->  Index Scan using pk_table_b2 on table_b2 tb  (cost=0.00..3.77 rows=1 width=1788) (actual time=10.729..10.731 rows=1 loops=2)
               Index Cond: (ta.b = tb.b)
 Total runtime: 21.876 ms




Plan 3:

db> explain analyze execute test_001(31855344);

                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..853.14 rows=10 width=325) (actual time=20.117..28.104 rows=2 loops=1)
   ->  Nested Loop  (cost=0.00..2024323.48 rows=23728 width=325) (actual time=20.116..28.101 rows=2 loops=1)
         ->  Index Scan using pk_table_a on table_a ta  (cost=0.00..327561.01 rows=388684 width=16) (actual time=0.023..0.027 rows=2 loops=1)
               Index Cond: (a = $1)
         ->  Index Scan using pk_table_b1 on table_b1 tb  (cost=0.00..4.35 rows=1 width=309) (actual time=14.032..14.034 rows=1 loops=2)
               Index Cond: (ta.b = tb.b)
 Total runtime: 28.200 ms


-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com




Re: [PERFORM] Searching for the cause of a bad plan

2007-09-21 Thread Csaba Nagy
[snip]

OK, I was not able to follow your explanation; it goes too deep for me into
what the planner does...

> Incidentally, the way out of this is to improve the stats by setting
> stats target = 1000 on column a of ta. That will allow the optimizer to
> have a better estimate of the tail of the distribution of a, which
> should then be more sensibly reflected in the cost of the Index Scan.
> That doesn't solve the actual problem, but should help in your case.

OK, I can confirm that. I set the statistics target for column "a" on
table_a to 1000, analyzed, and got the plan below. The only downside is
that analyze became quite expensive on table_a, it took 15 minutes and
touched half of the pages... I will experiment with lower settings,
maybe it will work with less than 1000 too.

db> explain analyze execute test_001(31855344);

                                                   QUERY PLAN
----------------------------------------------------------------------------------------------------------------
 Limit  (cost=0.00..4499.10 rows=10 width=1804) (actual time=103.566..120.363 rows=2 loops=1)
   ->  Nested Loop  (cost=0.00..344630.97 rows=766 width=1804) (actual time=103.563..120.359 rows=2 loops=1)
         ->  Index Scan using pk_table_a on table_a ta  (cost=0.00..67097.97 rows=78772 width=16) (actual time=71.965..77.284 rows=2 loops=1)
               Index Cond: (a = $1)
         ->  Index Scan using pk_table_b2 on table_b2 tb  (cost=0.00..3.51 rows=1 width=1788) (actual time=21.526..21.528 rows=1 loops=2)
               Index Cond: (ta.b = tb.b)
 Total runtime: 120.584 ms

Thanks,
Csaba.





Re: [PERFORM] Upgraded from 7.4 to 8.1.4 QUERIES NOW SLOW!!!

2007-09-21 Thread Dave Dutcher

>From: smiley2211
>Subject: Re: [PERFORM] Upgraded from 7.4 to 8.1.4 QUERIES NOW SLOW!!!
>
> ->  Seq Scan on encounters_questions_answers eqa  (cost=1.00..17608.66
> rows=464766 width=8) (actual time=0.003..735.934 rows=464766 loops=7430)
  
It looks like enable_seqscan is set to false.  For some reason that might
have worked on 7.4, but I would try turning it back on for 8.1.
Sequential scans aren't always bad; sometimes they are faster than index
scans.  I would first try running the system with all the enable_* settings
on.
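
For example (a session-level sketch; also check postgresql.conf for
enable_* overrides):

SET enable_seqscan = on;
SHOW enable_seqscan;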

If you can't turn on logging it's going to be very hard to track down the
problem.  The easiest way to track down a problem normally is to set
log_min_duration_statement to something like 2000ms.  Then Postgres will
log all slow queries.  Then you can run EXPLAIN ANALYZE on the slow queries
to find the problem.
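
For example (a sketch; the value is in milliseconds):

-- in postgresql.conf: log_min_duration_statement = 2000
-- or per session, as a superuser:
SET log_min_duration_statement = 2000;  -- log statements slower than 2 seconds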

I think Carlos had a good idea when he asked about the encoding on the new
server vs. the old.  Does your application use the LIKE keyword to compare
text fields?  If so, you might need to create indexes which use the
text_pattern_ops operator classes.  With Unicode, Postgres cannot use an
index scan for a query like SELECT * FROM foo WHERE name LIKE 'Bob%' unless
there is an index like CREATE INDEX name_index ON foo (name
text_pattern_ops).  However, if you are not using LIKE queries, then this is
not your problem.
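
A minimal sketch (the table "foo" is hypothetical, as in the example above):

CREATE TABLE foo (name text);
CREATE INDEX name_index ON foo (name text_pattern_ops);
-- with a non-C locale, this query can now use name_index:
SELECT * FROM foo WHERE name LIKE 'Bob%';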

More on operator classes:
http://www.postgresql.org/docs/8.1/interactive/indexes-opclass.html

Dave




[PERFORM] Query planner unaware of possibly best plan

2007-09-21 Thread Denes Daniel
Hi,
I think the query planner is unaware of what may be the best plan in some
situations. See the test case below:

-- --- --

CREATE TABLE tparent (
  id integer NOT NULL,
  ord integer NOT NULL,
  CONSTRAINT par_pkey_id PRIMARY KEY (id),
  CONSTRAINT par_uniq_ord UNIQUE (ord)
);

CREATE TABLE tchild (
  par_id integer NOT NULL,
  ord integer NOT NULL,
  CONSTRAINT chi_pkey_parid_ord PRIMARY KEY (par_id, ord),
  CONSTRAINT chi_fkey FOREIGN KEY (par_id) REFERENCES tparent(id)
);

INSERT INTO tparent VALUES (1, 3);
INSERT INTO tparent VALUES (2, 1);
INSERT INTO tparent VALUES (3, 4);
INSERT INTO tparent VALUES (4, 5);
INSERT INTO tparent VALUES (5, 2);

INSERT INTO tchild VALUES (1, 2);
INSERT INTO tchild VALUES (1, 1);
INSERT INTO tchild VALUES (2, 1);
INSERT INTO tchild VALUES (2, 3);
INSERT INTO tchild VALUES (2, 2);
INSERT INTO tchild VALUES (3, 1);
INSERT INTO tchild VALUES (3, 2);
INSERT INTO tchild VALUES (4, 1);
INSERT INTO tchild VALUES (5, 2);
INSERT INTO tchild VALUES (5, 1);

ANALYZE tparent;
ANALYZE tchild;

SET enable_seqscan TO false;
SET enable_bitmapscan TO false;
SET enable_hashjoin TO false;
SET enable_mergejoin TO false;
SET enable_sort TO false;

EXPLAIN ANALYZE
SELECT *
FROM tparent JOIN tchild ON tchild.par_id = tparent.id
WHERE tparent.ord BETWEEN 1 AND 4
ORDER BY tparent.ord, tchild.ord;

-- --- --

Sort
(cost=10132.10..10140.10 rows=8 width=16)
(actual time=0.440..0.456 rows=9 loops=1)
Sort Key: tparent.ord, tchild.ord

-> Nested Loop
   (cost=0.00..84.10 rows=8 width=16)
   (actual time=0.179..0.270 rows=9 loops=1)

   -> Index Scan using par_uniq_ord on tparent
  (cost=0.00..20.40 rows=4 width=8)
  (actual time=0.089..0.098 rows=4 loops=1)
  Index Cond: ((ord >= 1) AND (ord <= 4))

   -> Index Scan using chi_pkey_parid_ord on tchild
  (cost=0.00..9.93 rows=2 width=8)
  (actual time=0.023..0.028 rows=2 loops=4)
  Index Cond: (tchild.par_id = "outer".id)

-- --- --

Even though I forced the nested loop plan using both indexes (which
returns the rows in the correct order), there is a needless sort step at
the top, consuming half of the time even on such small tables.
Now it's clear why the planner did not choose this plan and why I had to
force it: because it isn't the best if the sort is still there.

The first time I posted this
( http://archives.postgresql.org/pgsql-general/2007-05/msg01306.php )
and read Tom's answer I was convinced that this is rarely a problem, 
but now I don't think so, since I ran into it for the third time.

Can that sort step somehow be eliminated if the NestLoop's outer 
table is being scanned via a unique index? If not, how can I rewrite my 
indexes/query in such a way that it's still safe (the rows always come in 
the order I want), but I don't have to wait for that needless sort?

I'm using PostgreSQL 8.1.8.


Thanks in advance,
Denes Daniel









Re: [PERFORM] Searching for the cause of a bad plan

2007-09-21 Thread Csaba Nagy
> OK, I can confirm that. I set the statistics target for column "a" on
> table_a to 1000, analyzed, and got the plan below. The only downside is
> that analyze became quite expensive on table_a, it took 15 minutes and
> touched half of the pages... I will experiment with lower settings,
> maybe it will work with less than 1000 too.

So, just to finish this up: setting statistics to 100 worked too, and it
has an acceptable impact on analyze. My original (more complicated)
query is working fine now, with visible effects on server load...

Thanks Simon for your help !

Cheers,
Csaba.





Re: [PERFORM] Searching for the cause of a bad plan

2007-09-21 Thread Tom Lane
Csaba Nagy <[EMAIL PROTECTED]> writes:
> Looking at Plan 2, it looks like the "limit" step is estimating its cost
> wrongly.

The reason you get a bad plan is that this rowcount estimate is so far
off:

> ->  Index Scan using pk_table_a on table_a ta  (cost=0.00..324786.18 rows=388532
>     width=16) (actual time=454.389..460.138 rows=2 loops=1)
>       Index Cond: (a = $1)

Raising the stats target helped no doubt because it didn't overestimate
the number of rows so much...

regards, tom lane



Re: [PERFORM] query io stats and finding a slow query

2007-09-21 Thread Kevin Grittner
>>> On Thu, Sep 20, 2007 at  4:36 PM, in message
<[EMAIL PROTECTED]>, "Kamen Stanev"
<[EMAIL PROTECTED]> wrote: 
> 
> Is there a way to find which query is doing large io operations and/or which
> is using cached data and which is reading from disk.
 
A big part of your cache is normally in the OS, which makes that tough.
 
> please share your experience on how do you decide which
> queries to optimize and how to reorganize your database?
 
We base this on two things -- query metrics from our application framework
and user complaints about performance.

> Is there any tools that you use to profile your database.
  
Many people set log_min_duration_statement to get a look at long-running
queries.
 
When you identify a problem query, running it with EXPLAIN ANALYZE in front
will show you the plan with estimated versus actual counts, costs, and time.
This does actually execute the query (unlike EXPLAIN without ANALYZE).
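
For example (hypothetical table and filter):

EXPLAIN ANALYZE SELECT * FROM orders WHERE customer_id = 42;
-- compare the planner's "rows=" estimates with the "actual ... rows=" numbers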
 
-Kevin
 





Re: [PERFORM] Searching for the cause of a bad plan

2007-09-21 Thread Simon Riggs
On Fri, 2007-09-21 at 16:26 +0200, Csaba Nagy wrote:
> [snip]
> 
> Ok, I was not able to follow your explanation, it's too deep for me into
> what the planner does...

I'm thinking that this case is too narrow to do too much with, when I
think about how we might do what I proposed. OTOH this isn't the first
bad plan we've had because we used the index for ordering. There might
be some common link that we can improve upon.

> > Incidentally, the way out of this is to improve the stats by setting
> > stats target = 1000 on column a of ta. That will allow the optimizer to
> > have a better estimate of the tail of the distribution of a, which
> > should then be more sensibly reflected in the cost of the Index Scan.
> > That doesn't solve the actual problem, but should help in your case.
> 
> OK, I can confirm that. I set the statistics target for column "a" on
> table_a to 1000, analyzed, and got the plan below. The only downside is
> that analyze became quite expensive on table_a, it took 15 minutes and
> touched half of the pages... I will experiment with lower settings,
> maybe it will work with less than 1000 too.

Well, we know there are ways of optimizing ANALYZE.

ISTM we should be able to auto-select the stats target based upon the shape
of the frequency distribution of the column values. We'd need to make
some calculations about the index cost model, but it's probably worth it
for the future.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com




Re: [PERFORM] Low CPU Usage

2007-09-21 Thread brauagustin-susc
I'm doing several tests.
Right now I did a VACUUM FULL ANALYZE on both servers.
On the old one vacuum ran for about 354 seconds and on the new one 59 seconds.

Then I ran
EXPLAIN ANALYZE
SELECT *
FROM fact_ven_renta fvr, dim_producto_std_producto dpp
WHERE
  fvr.producto_std_producto_sk = dpp.producto_sk

I have found that the plans aren't exactly the same.
This is the plan for the old server:
Hash Join  (cost=449.55..8879.24 rows=136316 width=904) (actual time=50.734..1632.491 rows=136316 loops=1)
  Hash Cond: (fvr.producto_std_producto_sk = dpp.producto_sk)
  ->  Seq Scan on fact_ven_renta fvr  (cost=0.00..6044.16 rows=136316 width=228) (actual time=0.029..452.716 rows=136316 loops=1)
  ->  Hash  (cost=403.69..403.69 rows=3669 width=676) (actual time=50.582..50.582 rows=3669 loops=1)
        ->  Seq Scan on dim_producto_std_producto dpp  (cost=0.00..403.69 rows=3669 width=676) (actual time=0.023..19.776 rows=3669 loops=1)
Total runtime: 2022.293 ms

And this is the plan for the new server:
Hash Join  (cost=412.86..9524.13 rows=136316 width=905) (actual time=9.421..506.376 rows=136316 loops=1)
  Hash Cond: ("outer".producto_std_producto_sk = "inner".producto_sk)
  ->  Seq Scan on fact_ven_renta fvr  (cost=0.00..6044.16 rows=136316 width=228) (actual time=0.006..107.318 rows=136316 loops=1)
  ->  Hash  (cost=403.69..403.69 rows=3669 width=677) (actual time=9.385..9.385 rows=3669 loops=1)
        ->  Seq Scan on dim_producto_std_producto dpp  (cost=0.00..403.69 rows=3669 width=677) (actual time=0.003..3.157 rows=3669 loops=1)
Total runtime: 553.619 ms


I see an "outer" join in the plan for the new server. This is weird!!! There 
are the same databases in both servers.
The old one runs this query for about 37 seconds and for the new one for about 
301 seconds.
Why are plans different? May the backup recovery process have had an error in 
the new server when restoring?

I appreciate some help.
Regards Agustin

- Original message 
From: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
To: pgsql-performance@postgresql.org
Sent: Wednesday, September 19, 2007, 14:38:13
Subject: [PERFORM] Low CPU Usage

Hi all.
Recently I have installed a brand new server with a Pentium IV 3.2 GHz, a
SATA disk, and 2GB of RAM, running Debian 4.0r1 with PostgreSQL 8.2.4
(previously 8.1.9).
I have another similar server with an IDE disk, Red Hat EL 4 and PostgreSQL 8.2.3.

I have almost the same postgresql.conf on both servers, but on the new one
(which has more work_mem than the other) things go really slow.  I began to
monitor disk i/o and it's really ok; I have tested the disk with hdparm and
it's 5 times faster than the IDE one.
Running the same queries on both servers, the new one takes almost 4
minutes instead of 18 seconds on the old one.
Both databases are the same; I have vacuumed them and I don't know how to
manage this issue.
The only weird thing is that running the query on the older server uses 30%
of CPU instead of the 3 or 5% on the new one!!!
What is happening with this server? I upgraded from 8.1.9 to 8.2.4 trying to
solve this issue but I can't find a solution.

Any ideas?
Regards
Agustin




  






Re: [PERFORM] Searching for the cause of a bad plan

2007-09-21 Thread Simon Riggs
On Fri, 2007-09-21 at 12:08 -0400, Tom Lane wrote:
> Csaba Nagy <[EMAIL PROTECTED]> writes:
> > Looking at Plan 2, it looks like the "limit" step is estimating its cost
> > wrongly.
> 
> The reason you get a bad plan is that this rowcount estimate is so far
> off:

That's true, but it's not relevant, since the query would still be fast
even if that estimate was exactly right. With LIMIT 10, it wouldn't
matter how many rows were there as long as there were more than 10. The
true execution cost is limited, the cost model is not.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com




Re: [PERFORM] Searching for the cause of a bad plan

2007-09-21 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> On Fri, 2007-09-21 at 12:08 -0400, Tom Lane wrote:
>> The reason you get a bad plan is that this rowcount estimate is so far
>> off:

> That's true, but it's not relevant,

Yes it is --- the reason it wants to use a hashjoin instead of a
nestloop is exactly that it thinks the loop would iterate too many
times.  (Ten is already too many in this case --- if it had estimated
five rows out of the join, it'd have gone with the nestloop, since
the cost estimate difference at the top level is less than 2x.)

regards, tom lane



Re: [PERFORM] Query planner unaware of possibly best plan

2007-09-21 Thread Simon Riggs
On Fri, 2007-09-21 at 17:36 +0200, Denes Daniel wrote:

> Even though I forced the nested loop plan using both indexes (that 
> returns the rows in the correct order), there is a needless sort step on 
> the top, consuming half of the time even on such small tables.
> Now it's clear why the planner did not choose this plan, why I had to 
> force it: because it isn't the best if the sort is still there.

Ordering by parent, child is fairly common but the variation you've got
here isn't that common. You'd need to make a case considering all the
alternatives; nobody will agree without a balanced case that includes
what is best for everyone.

Your EXPLAIN looks edited. Have you also edited the sort costs? They
look slightly higher than we might expect. Please provide the full
normal EXPLAIN output.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com




Re: [PERFORM] Low CPU Usage

2007-09-21 Thread Kevin Grittner
>>> On Fri, Sep 21, 2007 at 12:30 PM, in message
<[EMAIL PROTECTED]>, <[EMAIL PROTECTED]>
wrote: 
 
> This is the plan for the old server:
> Hash Join  (cost=449.55..8879.24 rows=136316 width=904) (actual 
> time=50.734..1632.491 rows=136316 loops=1)
. . .
> Total runtime: 2022.293 ms
 
> And this is the plan for the new server:
> Hash Join  (cost=412.86..9524.13 rows=136316 width=905) (actual 
> time=9.421..506.376 rows=136316 loops=1)
. . .
> Total runtime: 553.619 ms
 
> I see an "outer" join in the plan for the new server. This is weird!!! There 
> are the same databases in both servers.
 
That's just a matter of labeling the tables with role rather than alias.
The plans look the same to me.
 
> The old one runs this query for about 37 seconds and for the new one for 
> about 301 seconds.
 
That's not what it looks like based on the EXPLAIN ANALYZE output.
It looks like run time dropped from two seconds to half a second.
 
It seems as though you either have a network delay delivering the results,
or your application is slow to read them.

Exactly how are you arriving at those timings you're reporting to us?
 
-Kevin
 





Re: [PERFORM] Low CPU Usage

2007-09-21 Thread brauagustin-susc
I forgot to say that the previous plan was taken with Postgres 8.1.9 on the
new server; with Postgres 8.2.4 on the new server the plan is the same as
with the old one (the little difference in rows retrieved is because the
database is yesterday's snapshot).
This is the plan for the new server with postgres 8.2.4:
Hash Join  (cost=449.55..8846.67 rows=135786 width=904) (actual time=10.823..467.746 rows=135786 loops=1)
  Hash Cond: (fvr.producto_std_producto_sk = dpp.producto_sk)
  ->  Seq Scan on fact_ven_renta fvr  (cost=0.00..6020.86 rows=135786 width=228) (actual time=0.007..81.268 rows=135786 loops=1)
  ->  Hash  (cost=403.69..403.69 rows=3669 width=676) (actual time=10.733..10.733 rows=3669 loops=1)
        ->  Seq Scan on dim_producto_std_producto dpp  (cost=0.00..403.69 rows=3669 width=676) (actual time=0.004..2.995 rows=3669 loops=1)
Total runtime: 513.747 ms

This query runs for about 200 seconds; watching dstat I don't see anything
weird (regarding the low CPU usage, 2% or 3%) and normal i/o. On the old
server I have 30% CPU usage and high i/o, and it runs faster!!!
This is really weird.


- Original message 
From: "[EMAIL PROTECTED]" <[EMAIL PROTECTED]>
To: pgsql-performance@postgresql.org
Sent: Friday, September 21, 2007, 14:30:45
Subject: Re: [PERFORM] Low CPU Usage

[snip]

Re: [PERFORM] Low CPU Usage

2007-09-21 Thread brauagustin-susc
> That's not what it looks like based on the EXPLAIN ANALYZE output.
> It looks like run time dropped from two seconds to half a second.
 
> It seems as though you either have a network delay delivering the results,
> or your application is slow to read them.

> Exactly how are you arriving at those timings you're reporting to us?
 
I have noticed this in a daily process I run which normally takes 45
minutes but on the new server takes 1:40.

Some days ago I began to do some tests with no success; then I opened
PgAdmin with this simple query to read 2 big tables and then compare disk
access.
SELECT *
FROM fact_ven_renta fvr, dim_producto_std_producto dpp
WHERE
  fvr.producto_std_producto_sk = dpp.producto_sk
 
fact_ven_renta has 136316 rows
dim_producto_std_producto has 3669 rows



I have tried all possible combinations in pgAdmin (running each query on
the same server, on the old one, on the new one), without any difference,
and I only retrieve the first 100 records (I didn't count the network time
in any case).
But the weird thing is that running the query on the new server there is a
lot of disk access and CPU usage. And with other applications on the same
server there is a lot of disk access too.







Re: [PERFORM] Low CPU Usage

2007-09-21 Thread Bill Moran
In response to [EMAIL PROTECTED]:

> > That's not what it looks like based on the EXPLAIN ANALYZE output.
> > It looks like run time dropped from two seconds to half a second.
>  
> > It seems as though you either have a network delay delivering the results,
> > or your application is slow to read them.
> 
> > Exactly how are you arriving at those timings you're reporting to us?
>  
> I have noticed this in a daily process I run which normally takes 45
> minutes but on the new server takes 1:40.
> 
> Some days ago I began to do some tests with no success; then I opened
> PgAdmin with this simple query to read 2 big tables and then compare disk
> access.
> SELECT *
> FROM fact_ven_renta fvr, dim_producto_std_producto dpp
> WHERE
>   fvr.producto_std_producto_sk = dpp.producto_sk
>  
> fact_ven_renta has 136316 rows
> dim_producto_std_producto has 3669 rows

Run the tests from psql on the same server.  As Kevin pointed out, the
_server_ is faster, but it appears as if the connection between PGadmin
and this new server is slower somehow.

Are you sure of your speed/duplex settings on the network side?  That's
the most common cause of this kind of thing in my experience.  Try doing
a raw FTP transfer between the client and server and see if you get the
speed you should.
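Something along these lines, as a rough sketch (assuming the interface is eth0):

    ethtool eth0 | grep -E 'Speed|Duplex'    # negotiated link settings
    scp bigfile user@newserver:/tmp/         # watch the reported KB/s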

> 
> 
> 
> I have made all possible combinations pgadmin (running in the same server 
> each query, in the old one, in the new one), without difference  and I only 
> retrieve the first 100 records (I didn't count the network time in any case).
> But the weird thing is running the query in the new server the are many disk 
> access and cpu usage. And with other applications in the same server are a 
> lot of disks access.



-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

[EMAIL PROTECTED]
Phone: 412-422-3463x4023

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Searching for the cause of a bad plan

2007-09-21 Thread Simon Riggs
On Fri, 2007-09-21 at 13:53 -0400, Tom Lane wrote:
> Simon Riggs <[EMAIL PROTECTED]> writes:
> > On Fri, 2007-09-21 at 12:08 -0400, Tom Lane wrote:
> >> The reason you get a bad plan is that this rowcount estimate is so far
> >> off:
> 
> > That's true, but its not relevant,
> 
> Yes it is --- the reason it wants to use a hashjoin instead of a
> nestloop is exactly that it thinks the loop would iterate too many
> times.  (Ten is already too many in this case --- if it had estimated
> five rows out of the join, it'd have gone with the nestloop, since
> the cost estimate difference at the top level is less than 2x.)

That's not my perspective. If the LIMIT had been applied accurately to
the cost then the hashjoin would never even have been close to the
nested join in the first place. It's just chance that the frequency
distribution is favourable to us and thus amenable to using the hint of
improving stats_target. The improved knowledge of the distribution just
hides the fact that the cost model is still wrong: a cost of 45000 per
row shows this.

-- 
  Simon Riggs
  2ndQuadrant  http://www.2ndQuadrant.com


---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Low CPU Usage

2007-09-21 Thread brauagustin-susc
>> > That's not what it looks like based on the EXPLAIN ANALYZE output.
>> > It looks like run time dropped from two seconds to half a second.
>>  
>> > It seems as though you either have a network delay delivering the results,
>> > or your application is slow to read them.
>> 
>> > Exactly how are you arriving at those timings you're reporting to us?
>>  
>> I have noticed this in a daily process I run which normally takes 45 
>> minutes but takes 1:40 with the new server.
>> 
>> Some days ago I began to do some tests with no success, then I opened 
>> PgAdmin with this simple query to read 2 big tables and compare disk 
>> access.
>> SELECT *
>> FROM fact_ven_renta fvr, dim_producto_std_producto dpp
>> WHERE
>>   fvr.producto_std_producto_sk = dpp.producto_sk
>>  
>> fact_ven_renta has 136316 rows
>> dim_producto_std_producto has 3669 rows

>Run the tests from psql on the same server.  As Kevin pointed out, the 
>_server_ is faster, but it appears as if the connection between PGadmin and 
>this new server is slower somehow.

It runs quickly!!! But I don't know how to compare, because it looks like it 
retrieves rows on demand; when I press Ctrl+End (go to the last record) it uses 
a lot of CPU and disk, but still runs quickly.
Correct me if I'm wrong, but running PgAdmin on the same server there are no 
network delays!
And while the server is processing the query there is no network traffic, 
because it is processing the result.

> Are you sure of your speed/duplex settings on the network side?  That's
> the most common cause of this kind of thing in my experience.  Try doing
> a raw FTP transfer between the client and server and see if you get the
> speed you should.
This isn't a dedicated database server; the client application and the server 
are running on the same machine!!!
I have stopped the client application too, with the same results.

Anyway, I will do some network tests to find a solution.









Re: [PERFORM] Low CPU Usage

2007-09-21 Thread Bill Moran
In response to [EMAIL PROTECTED]:

> >> > That's not what it looks like based on the EXPLAIN ANALYZE output.
> >> > It looks like run time dropped from two seconds to half a second.
> >>  
> >> > It seems as though you either have a network delay delivering the 
> >> > results,
> >> > or your application is slow to read them.
> >> 
> >> > Exactly how are you arriving at those timings you're reporting to us?
> >>  
> >> I have noticed this in a daily process I run which normally takes 45 
> >> minutes but takes 1:40 with the new server.
> >> 
> >> Some days ago I began to do some tests with no success, then I opened 
> >> PgAdmin with this simple query to read 2 big tables and compare disk 
> >> access.
> >> SELECT *
> >> FROM fact_ven_renta fvr, dim_producto_std_producto dpp
> >> WHERE
> >>   fvr.producto_std_producto_sk = dpp.producto_sk
> >>  
> >> fact_ven_renta has 136316 rows
> >> dim_producto_std_producto has 3669 rows
> 
> >Run the tests from psql on the same server.  As Kevin pointed out, the 
> >_server_ is faster, but it appears as if the connection between PGadmin and 
> >this new server is slower somehow.
> 
> It runs quickly!!! But I don't know how to compare, because it looks like it 
> retrieves rows on demand; when I press Ctrl+End (go to the last record) it 
> uses a lot of CPU and disk, but still runs quickly.

That's pretty odd.  If you use \timing in psql, you can get execution
time for each query, if it helps you track things down.
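For instance, a minimal session sketch (the database name and the timing 
figure are illustrative; the query is the one from earlier in the thread):

    $ psql mydb
    mydb=# \timing
    Timing is on.
    mydb=# SELECT * FROM fact_ven_renta fvr, dim_producto_std_producto dpp
    mydb-#  WHERE fvr.producto_std_producto_sk = dpp.producto_sk;
    ... rows ...
    Time: 5600.000 ms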

> Correct me if I'm wrong, but running PgAdmin on the same server there 
> are no network delays!

Not network, no.  But the results of your explains seem to show that the
query is executing much faster on the new system than the old, so the
problem still becomes, "what is happening after the query completes that
is so slow?"  It's just that networking is ruled out.


-- 
Bill Moran
Collaborative Fusion Inc.
http://people.collaborativefusion.com/~wmoran/

[EMAIL PROTECTED]
Phone: 412-422-3463x4023

---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Linux mis-reporting memory

2007-09-21 Thread Greg Smith

On Thu, 20 Sep 2007, Decibel! wrote:

I'm finding this rather interesting report from top on a Debian box... 
how is it that linux thinks that 30G is cached?


top on Linux gives weird results when faced with situations where there's 
shared memory involved.  I look at /proc/meminfo and run ipcs when I want 
a better idea what's going on.  As good of an article on this topic as 
I've found is http://gentoo-wiki.com/FAQ_Linux_Memory_Management which 
recommends using free to clarify how big the disk cache really is.
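Roughly, as a sketch:

    free -m        # the -/+ buffers/cache line shows memory actually available
    ipcs -m        # SysV shared memory segments (shared_buffers lives here)
    grep -E 'MemFree|Buffers|Cached' /proc/meminfo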


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Query planner unaware of possibly best plan

2007-09-21 Thread Denes Daniel
Simon Riggs <[EMAIL PROTECTED]> wrote:

> Ordering by parent, child is fairly common but the variation you've got
> here isn't that common. You'd need to make a case considering all the
> alternatives; nobody will agree without a balanced case that includes
> what is best for everyone.
> 
> Your EXPLAIN looks edited. Have you also edited the sort costs? They
> look slightly higher than we might expect. Please provide the full
> normal EXPLAIN output.
> 
> -- 
>   Simon Riggs
>   2ndQuadrant  http://www.2ndQuadrant.com



I've just inserted some newlines, so it's easier to read than when my 
email client wraps the lines automatically. I did not touch the information 
itself. But here is the normal output of EXPLAIN ANALYZE:

EXPLAIN ANALYZE SELECT * FROM tparent JOIN tchild ON tchild.par_id = 
tparent.id WHERE tparent.ord BETWEEN 1 AND 4 ORDER BY tparent.ord, 
tchild.ord;

  QUERY PLAN
---

 Sort  (cost=10132.10..10140.10 rows=8 width=16) (actual 
time=0.302..0.319 rows=9 loops=1)
   Sort Key: tparent.ord, tchild.ord
   ->  Nested Loop  (cost=0.00..84.10 rows=8 width=16) (actual 
time=0.181..0.267 rows=9 loops=1)
 ->  Index Scan using par_uniq_ord on tparent  (cost=0.00..20.40 
rows=4 width=8) (actual time=0.100..0.109 rows=4 loops=1)
   Index Cond: ((ord >= 1) AND (ord <= 4))
 ->  Index Scan using chi_pkey_parid_ord on tchild  
(cost=0.00..9.93 rows=2 width=8) (actual time=0.020..0.026 rows=2 
loops=4)
   Index Cond: (tchild.par_id = "outer".id)
 Total runtime: 0.412 ms
(8 rows)

The costs may be different because I've tuned the query planner's 
parameters.

> Ordering by parent, child is fairly common but the variation you've got
> here isn't that common.
How else can you order by parent, child other than first ordering by a 
unique key of parent, then something in child? (Except for 
child.parent_id, child.something because this has all the information in 
child and can rely on a single multicolumn index.)
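To illustrate that exception with the test tables from this thread (a sketch; 
whether the planner actually skips the sort depends on the plan it picks):

    -- both ORDER BY columns come from tchild, so a single multicolumn index
    -- on (par_id, ord), like chi_pkey_parid_ord above, can in principle
    -- supply the order directly
    SELECT *
    FROM tparent JOIN tchild ON tchild.par_id = tparent.id
    ORDER BY tchild.par_id, tchild.ord;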


Denes Daniel








---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Low CPU Usage

2007-09-21 Thread brauagustin-susc
> > >> > That's not what it looks like based on the EXPLAIN ANALYZE output.
> > >> > It looks like run time dropped from two seconds to half a second.
> > >>  
> > >> > It seems as though you either have a network delay delivering the 
> > >> > results,
> > >> > or your application is slow to read them.
> > >> 
> > >> > Exactly how are you arriving at those timings you're reporting to us?
> > >>  
> > >> I have noticed this in a daily process I run which normally takes 45 
> > >> minutes but takes 1:40 with the new server.
> > >> 
> > >> Some days ago I began to do some tests with no success, then I opened 
> > >> PgAdmin with this simple query to read 2 big tables and compare 
> > >> disk access.
> > >> SELECT *
> > >> FROM fact_ven_renta fvr, dim_producto_std_producto dpp
> > >> WHERE
> > >>   fvr.producto_std_producto_sk = dpp.producto_sk
> > >>  
> > >> fact_ven_renta has 136316 rows
> > >> dim_producto_std_producto has 3669 rows
 >> 
> > >Run the tests from psql on the same server.  As Kevin pointed out, the 
> > >_server_ is faster, but it appears as if the connection between PGadmin 
> > >and this new server is slower somehow.
> > 
> > It runs quickly!!! But I don't know how to compare, because it looks like it 
> > retrieves rows on demand; when I press Ctrl+End (go to the last record) it 
> > uses a lot of CPU and disk, but still runs quickly.

> That's pretty odd.  If you use \timing in psql, you can get execution
> time for each query, if it helps you track things down.

Yes, on the new server running with \timing it takes 5.6 seconds, and on the 
old server it takes 25 seconds.

> > Correct me if I'm wrong, but running PgAdmin on the same server there 
> > are no network delays!

> Not network, no.  But the results of your explains seem to show that the
> query is executing much faster on the new system than the old, so the
> problem still becomes, "what is happening after the query completes that
> is so slow?"  It's just that networking is ruled out.

It is connected at full 100Mb and transfers things to clients quickly. It is 
running Apache and JBoss, the transfer rate is good, and I did scp to copy many 
files and it is as quick as the old server.

I have no idea how to continue researching this problem. Now I'm going to do 
some network tests.










Re: [PERFORM] Query planner unaware of possibly best plan

2007-09-21 Thread Alvaro Herrera
Denes Daniel wrote:

> I've just inserted some newlines, so it's easier to read than when my 
> email client wraps the lines automatically. I did not touch the information 
> itself. But here is the normal output of EXPLAIN ANALYZE:

The best thing to do is paste them in a text file and send it as an
attachment.

>   QUERY PLAN
> ---
> 
>  Sort  (cost=10132.10..10140.10 rows=8 width=16) (actual 
> time=0.302..0.319 rows=9 loops=1)
>Sort Key: tparent.ord, tchild.ord
>->  Nested Loop  (cost=0.00..84.10 rows=8 width=16) (actual 
> time=0.181..0.267 rows=9 loops=1)
>  ->  Index Scan using par_uniq_ord on tparent  (cost=0.00..20.40 
> rows=4 width=8) (actual time=0.100..0.109 rows=4 loops=1)
>Index Cond: ((ord >= 1) AND (ord <= 4))
>  ->  Index Scan using chi_pkey_parid_ord on tchild  
> (cost=0.00..9.93 rows=2 width=8) (actual time=0.020..0.026 rows=2 
> loops=4)
>Index Cond: (tchild.par_id = "outer".id)
>  Total runtime: 0.412 ms
> (8 rows)
> 
> The costs may be different because I've tuned the query planner's 
> parameters.

Why did you set enable_sort=off?  It's not like sorting 9 rows is going
to take any noticeable amount of time anyway.

-- 
Alvaro Herrera                      http://www.advogato.org/person/alvherre
"There is no man who does not aspire to plenitude, that is,
the sum of experiences of which a man is capable"

---(end of broadcast)---
TIP 4: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] Query planner unaware of possibly best plan

2007-09-21 Thread Denes Daniel
In reply to Alvaro Herrera:

> The best thing to do is paste them in a text file and send it as an
> attachment.

Okay, it's attached.

> Why did you set enable_sort=off?  It's not like sorting 9 rows is going
> to take any noticeable amount of time anyway.

Of course it's no problem for 9 rows, but this is only a test case. In 
production there will be many more. I just wanted to show that the 
planner doesn't even consider a plan without a sort step, using purely 
index scans.


Denes Daniel





EXPLAIN ANALYZE
SELECT *
FROM tparent JOIN tchild ON tchild.par_id = tparent.id
WHERE tparent.ord BETWEEN 1 AND 4
ORDER BY tparent.ord, tchild.ord;

  QUERY PLAN
---
 Sort  (cost=10132.10..10140.10 rows=8 width=16) (actual 
time=0.302..0.319 rows=9 loops=1)
   Sort Key: tparent.ord, tchild.ord
   ->  Nested Loop  (cost=0.00..84.10 rows=8 width=16) (actual 
time=0.181..0.267 rows=9 loops=1)
 ->  Index Scan using par_uniq_ord on tparent  (cost=0.00..20.40 rows=4 
width=8) (actual time=0.100..0.109 rows=4 loops=1)
   Index Cond: ((ord >= 1) AND (ord <= 4))
 ->  Index Scan using chi_pkey_parid_ord on tchild  (cost=0.00..9.93 
rows=2 width=8) (actual time=0.020..0.026 rows=2 loops=4)
   Index Cond: (tchild.par_id = "outer".id)
 Total runtime: 0.412 ms
(8 rows)
---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Low CPU Usage

2007-09-21 Thread Dave Dutcher

>From:  [EMAIL PROTECTED]
>Subject: Re: [PERFORM] Low CPU Usage
>
>I have no idea how to continue researching this problem. Now I'm going to
do some networks tests.


I would go back to the slow program and try to capture the slow queries in
the log file.  Once you have some queries which are running slow, you can
run EXPLAIN ANALYZE to see what the bottleneck is.
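A minimal sketch of that workflow (the one-second threshold is illustrative; 
the setting takes milliseconds):

    # postgresql.conf on the problem server
    log_min_duration_statement = 1000    # log statements taking over 1s

    -- then, for each query that shows up in the log:
    EXPLAIN ANALYZE <slow query here>;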

It seems like you've found pgAdmin is slow sending across the network, but
we don't know if that has anything to do with your original problems.

Just my 2 cents.

Dave


---(end of broadcast)---
TIP 5: don't forget to increase your free space map settings


Re: [PERFORM] Low CPU Usage

2007-09-21 Thread Luiz K. Matsumura



[EMAIL PROTECTED] wrote:


> That's pretty odd.  If you use \timing in psql, you can get execution
> time for each query, if it helps you track things down.

Yes, on the new server running with \timing it takes 5.6 seconds 
and on the old server it takes 25 seconds.


> > Correct me if I'm wrong, but running PgAdmin on the same server 
there are no network delays!


> Not network, no.  But the results of your explains seem to show that the
> query is executing much faster on the new system than the old, so the
> problem still becomes, "what is happening after the query completes that
> is so slow?"  It's just that networking is ruled out.

It is connected at full 100Mb and transfers things to clients quickly. 
It is running Apache and JBoss, the transfer rate is good, and I did scp 
to copy many files and it is as quick as the old server.


I have no idea how to continue researching this problem. Now I'm going 
to do some network tests.




See if this can be of some help to you:
I experienced some problems on networks with Win98 and WinXP stations; the 
application ran with good performance most of the time,
but suddenly the performance would slow down. We noticed that the problem 
was the time to connect to the server, which was very slow.

The problem also occurred when the internet link went down.
Well, I don't know why, but when we excluded the Win98 stations from the 
network, the problem disappeared.
I think it was some DNS problem (but I'm not sure), because once we 
cleared the nameserver clauses in /etc/resolv.conf the performance 
returned to normal.
But we also reinstalled the Win98 machines with WinXP, so I don't know 
what happened exactly.
The server OS was Mandriva Linux running Postgres (8.0, I guess) and 
Samba. Workstations connected via ODBC (the problem persisted whether we 
gave the server's IP or its name).



--
Luiz K. Matsumura
Plan IT Tecnologia Informática Ltda.


---(end of broadcast)---
TIP 4: Have you searched our list archives?

  http://archives.postgresql.org


Re: [PERFORM] Low CPU Usage

2007-09-21 Thread Craig James

Luiz K. Matsumura wrote:
It is connected at full 100Mb and transfers things to clients quickly. 
It is running Apache and JBoss, the transfer rate is good, and I did scp 
to copy many files and it is as quick as the old server.


I have no idea how to continue researching this problem. Now I'm going 
to do some network tests.


Any chance this is your desktop machine, and you're also using it for audio?  
Microsoft built in a feature (!) that reduces network speed by 90% when music 
is playing:

 http://it.slashdot.org/article.pl?sid=07/08/26/1628200&from=rss

Craig

---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
  choose an index scan if your joining column's datatypes do not
  match


Re: [PERFORM] Low CPU Usage

2007-09-21 Thread Kevin Grittner
>>> On Fri, Sep 21, 2007 at  3:40 PM, in message <[EMAIL PROTECTED]>,
"Luiz K. Matsumura" <[EMAIL PROTECTED]> wrote: 
 
> but suddenly the performance would slow down. We noticed that the problem 
> was the time to connect to the server, which was very slow.
 
> I think it was some DNS problem (but I'm not sure), because once we 
> cleared the nameserver clauses in /etc/resolv.conf the performance 
> returned to normal.
 
You may have it there.  In some versions of Java, on Windows, connection
times are horribly slow unless the machine's IP address has a reverse
DNS entry.  Perhaps the new machine lacks such an entry, or there's a
different version of Java in use?
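A quick way to verify, as a sketch (the address is illustrative):

    dig -x 192.168.1.50 +short    # should print a hostname promptly
    host 192.168.1.50             # equivalent check

If the lookup stalls or returns nothing, adding a PTR record (or a hosts-file 
entry on the client) may cure the connect delay.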
 
-Kevin
 



---(end of broadcast)---
TIP 1: if posting/reading through Usenet, please send an appropriate
   subscribe-nomail command to [EMAIL PROTECTED] so that your
   message can get through to the mailing list cleanly


Re: [PERFORM] Low CPU Usage

2007-09-21 Thread brauagustin-susc
It's a Debian 4.0r1 server without sound (ALSA is disabled), and I'm running the 
queries locally.
It happens running the query locally with PgAdmin, and via JDBC locally too.
Yes, I have Win98 and XP machines on my network; I will unplug the server from 
the net and test again. On Monday I'll give you my answer.
The last thing I did was disable IPv6, with the same results.

Thank you very much for your help.


----- Original Message -----
From: Craig James <[EMAIL PROTECTED]>
To: Luiz K. Matsumura <[EMAIL PROTECTED]>
CC: [EMAIL PROTECTED]; pgsql-performance@postgresql.org
Sent: Friday, September 21, 2007, 18:15:40
Subject: Re: [PERFORM] Low CPU Usage

Luiz K. Matsumura wrote:
>> It is connected at full 100Mb and transfers things to clients quickly. 
>> It is running Apache and JBoss, the transfer rate is good, and I did scp 
>> to copy many files and it is as quick as the old server.
>>
>> I have no idea how to continue researching this problem. Now I'm going 
>> to do some network tests.

Any chance this is your desktop machine, and you're also using it for audio?  
Microsoft built in a feature (!) that reduces network speed by 90% when music 
is playing:

  http://it.slashdot.org/article.pl?sid=07/08/26/1628200&from=rss

Craig








Re: [PERFORM] Searching for the cause of a bad plan

2007-09-21 Thread Tom Lane
Simon Riggs <[EMAIL PROTECTED]> writes:
> That's not my perspective. If the LIMIT had been applied accurately to
> the cost then the hashjoin would never even have been close to the
> nested join in the first place.

[ shrug... ]  Your perspective is mistaken.  There is nothing wrong with
the way the LIMIT estimation is being done.  The plan in question was

Limit  (cost=0.00..498511.80 rows=10 width=1804) (actual time=17.729..21.672 
rows=2 loops=1)
   ->  Nested Loop  (cost=0.00..1794642.48 rows=36 width=1804) (actual 
time=17.729..21.671 rows=2 loops=1)
 ->  Index Scan using pk_table_a on table_a ta  (cost=0.00..324880.88 
rows=388638 width=16) (actual time=0.146..0.198 rows=2 loops=1)
   Index Cond: (a = $1)
 ->  Index Scan using pk_table_b2 on table_b2 tb  (cost=0.00..3.77 
rows=1 width=1788) (actual time=10.729..10.731 rows=1 loops=2)
   Index Cond: (ta.b = tb.b)
 Total runtime: 21.876 ms

and there are two fairly serious estimation errors here, neither related
at all to the LIMIT:

* five-orders-of-magnitude overestimate of the number of table_a rows
that will match the condition on a;

* enormous underestimate of the number of join rows --- it's apparently
thinking only 0.0001 of the table_a rows will have a join partner,
whereas at least for this case they all do.

Had the latter estimate been right, the cost of pulling results this
way would indeed have been something like 50K units per joined row,
because of the large number of inner index probes per successful join.

It might be interesting to look into why those estimates are so far
off; the stats Csaba displayed don't seem to have any obvious oddity
that would justify such bizarre results.  But the LIMIT has got
nothing to do with this.
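For what it's worth, one way to poke at those estimates, as a sketch using the 
table and column names from the plan above:

    -- what the planner currently believes about column a
    SELECT n_distinct, null_frac, most_common_vals
    FROM pg_stats
    WHERE tablename = 'table_a' AND attname = 'a';

    -- gather finer-grained statistics, then re-check the estimates
    ALTER TABLE table_a ALTER COLUMN a SET STATISTICS 1000;
    ANALYZE table_a;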

regards, tom lane

---(end of broadcast)---
TIP 6: explain analyze is your friend


Re: [PERFORM] Query planner unaware of possibly best plan

2007-09-21 Thread Denes Daniel
Simon Riggs <[EMAIL PROTECTED]> wrote:

> On Fri, 2007-09-21 at 21:20 +0200, Dániel Dénes wrote:
> 
> > The costs may be different because I've tuned the query planner's 
> > parameters.
> 
> OK, understood.
> 
> > > Ordering by parent, child is fairly common but the variation you've
> > > got here isn't that common.
> > How else can you order by parent, child other than first ordering by
> > a unique key of parent, then something in child? (Except for 
> > child.parent_id, child.something because this has all the
> > information in child and can rely on a single multicolumn index.)
> 
> Why "except"? Whats wrong with ordering that way? 
> 
> Make the case. **I** want it is not sufficient...
> 
> -- 
>   Simon Riggs
>   2ndQuadrant  http://www.2ndQuadrant.com




In reply to Simon Riggs <[EMAIL PROTECTED]>:

> > How else can you order by parent, child other than first ordering by
> > a unique key of parent, then something in child? (Except for 
> > child.parent_id, child.something because this has all the
> > information in child and can rely on a single multicolumn index.)
> 
> Why "except"? Whats wrong with ordering that way?

Well, nothing, but what if I have to order by some other unique key? Of 
course I could do that by redundantly storing the parent's data in child 
and then creating a multicolumn index, but...

Just to be clear: when I found this, I was trying to write a slightly 
different query. It was like:

SELECT *
FROM tparent JOIN tchild ON tchild.par_id = tparent.id
WHERE tparent.uniqcol1 = 123
ORDER BY tparent.uniqcol2, tchild.ord;

where there was a unique index on (tparent.uniqcol1, tparent.uniqcol2) 
and the columns are marked NOT NULL.
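For concreteness, that index would have been declared along these lines (a 
hypothetical reconstruction; the real DDL wasn't posted):

    CREATE UNIQUE INDEX par_uniq_c1_c2 ON tparent (uniqcol1, uniqcol2);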
I expected a plan that does an index scan on tparent.uniqcol2 where 
uniqcol1 = 123, and (using a nestloop and the child's pkey) joins in the 
children in the correct order (without a sort). But I got something else, 
so I tried everything to get what I wanted -- just to see at what cost 
the planner chose something else. After some time I found out that 
there is no such plan, so no matter what I do it will sort...
So that's how I got here. But since the original problem isn't that clean 
& simple, I thought I'd make a test case that's easy to follow and 
illustrates the problem: the planner doesn't even consider my plan. 
If it did, I think that's the one that would get executed. But tell me if 
I'm wrong somewhere.



> Make the case. **I** want it is not sufficient...

Sorry, I can't understand that... I'm far from perfect in English. Please 
clarify so I can do what you're asking.


Denes Daniel
-






---(end of broadcast)---
TIP 9: In versions below 8.0, the planner will ignore your desire to
   choose an index scan if your joining column's datatypes do not
   match


Re: [PERFORM] Query planner unaware of possibly best plan

2007-09-21 Thread Tom Lane
Denes Daniel <[EMAIL PROTECTED]> writes:
> Simon Riggs <[EMAIL PROTECTED]> wrote:
>> Make the case. **I** want it is not sufficient...

> Sorry, I can't understand that... I'm far from perfect in english.

The point here is that you've repeated the same example N times without
actually making a case that it's interesting to support.  We have to
think about the intellectual complexity that would be added to the
planner to support this case, and the cycles that would be expended
on every query (and wasted, for most queries) on trying to detect
whether the case applies.  If it were simple and cheap to do, these
arguments wouldn't hold much weight, but it doesn't look to me like
either is the case.

Another problem is that it's not clear there's much to be gained.
Avoiding the sort step is only interesting if the query produces so many
rows that a sort would be expensive ... but if that's the case, it seems
unlikely that a nestloop indexscan plan would be the best choice anyway.

So basically this looks like a lot of work for a narrow and questionable
gain.  If you want it to happen you need to convince people that it's
easier and more useful than it looks.

regards, tom lane

---(end of broadcast)---
TIP 3: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faq


Re: [PERFORM] Low CPU Usage

2007-09-21 Thread Greg Smith

On Thu, 20 Sep 2007, [EMAIL PROTECTED] wrote:

Which other test can I do to find out whether this is a hardware, kernel, or 
Postgres issue?


The little test hdparm does is not exactly a robust hard drive benchmark. 
If you want to rule out hard drive transfer speed issues, take a look at 
the tests suggested at 
http://www.westnet.com/~gsmith/content/postgresql/pg-disktesting.htm and 
see how your results compare to the single SATA disk example I give there.
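A typical quick check of that sort is a dd write/read pair along these lines 
(a sketch; path and sizes are illustrative, and the file should be at least 
twice your RAM):

    dd if=/dev/zero of=/tmp/testfile bs=8k count=1000000    # sequential write
    dd if=/tmp/testfile of=/dev/null bs=8k                  # sequential read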


--
* Greg Smith [EMAIL PROTECTED] http://www.gregsmith.com Baltimore, MD

---(end of broadcast)---
TIP 7: You can help support the PostgreSQL project by donating at

   http://www.postgresql.org/about/donate