Re: [PERFORM] problem with pg_statistics

2003-06-27 Thread Andre Schubert
On Thu, 26 Jun 2003 12:03:52 -0400
Tom Lane [EMAIL PROTECTED] wrote:

 Manfred Koizar [EMAIL PROTECTED] writes:
  On Thu, 26 Jun 2003 10:08:05 -0400, Tom Lane [EMAIL PROTECTED]
  wrote:
  Try reducing random_page_cost
 
  With index scan cost being more than 25 * seq scan cost, I guess that
  - all other things held equal - even random_page_cost = 1 wouldn't
  help.
 
 Oh, you're right, I was comparing the wrong estimated costs.  Yeah,
 changing random_page_cost won't fix it.
 
  Or there's something wrong with correlation?
 
 That seems like a good bet.  Andre, is this table likely to be
 physically ordered by time_stamp, or nearly so?  If so, do you
 expect that condition to persist, or is it just an artifact of
 a test setup?
 

Sorry forgot the pg_stat query...


SELECT * FROM pg_stats where tablename = 'tbl_traffic' and attname = 'time_stamp';
  tablename  |  attname   | null_frac | avg_width | n_distinct |   

  most_common_vals 

| 
most_common_freqs  |   
  
   
 histogram_bounds   
   
| correlation 
-++---+---++---
---
++-
---
---+-
 tbl_traffic | time_stamp | 0 | 8 | 104009 | {2003-06-03 
19:12:01.059625+02,2003-02-03 19:52:06.666296+01,2003-02-13 
09:59:45.415763+01,2003
-02-28 18:10:28.536399+01,2003-04-11 18:09:42.30363+02,2003-04-26 
20:35:50.110235+02,2003-05-03 11:09:32.991507+02,2003-05-20 
09:53:51.271853+02,2003-05-21 2
0:55:59.155387+02,2003-06-02 02:38:28.823182+02} | 
{0.0013,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001,0.001} | {2002-07-01 
00:00:00+02,2003-02-21 01:59:
46.107696+01,2003-03-11 15:00:37.418521+01,2003-03-26 
18:14:50.028972+01,2003-04-10 13:43:20.75909+02,2003-04-27 
09:03:19.592213+02,2003-05-08 22:35:41.99761
6+02,2003-05-22 15:34:42.932958+02,2003-06-03 00:53:05.870782+02,2003-06-15 
08:45:41.154875+02,2003-06-27 07:18:30.265868+02} |   -0.479749
(1 row)

Thanks, as

---(end of broadcast)---
TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]


Re: [PERFORM] problem with pg_statistics

2003-06-27 Thread Manfred Koizar
On Fri, 27 Jun 2003 08:07:35 +0200, Andre Schubert
[EMAIL PROTECTED] wrote:
Traffic data are inserted every 5 minutes with the actual datetime
of the transaction, thatswhy the table should be physically order by time_stamp.

So I'd expect a correlation of nearly 1.  Why do your statistics show
a value of -0.479749?  A negative correlation is a sign of descending
sort order, and correlation values closer to 0 indicate poor
correspondence between column values and tuple positions.

Could this be the effect of initial data loading?  Are there any
updates or deletions in your traffic table?

To answer Manfreds questions:
 Andre, what hardware is this running on?  What are the values of
 shared_buffers, random_page_cost, effective_cache_size, ... ?  Could
 you show us the result of
 
  SELECT * FROM pg_stats
   WHERE tablename = tbl_traffic AND attname = time_stamp;
   ^   ^   ^  ^
Oops, these should have been single quotes.  It's too hot here these
days :-)

sort_mem = 32000
shared_buffers = 13000

Personally I would set them to lower values, but if you have good
reasons ...

#effective_cache_size = 1000  # default in 8k pages

This is definitely too low.  With 512MB or more I tend to set this to
ca. 80% of available RAM.  Use top and free to find hints for good
values.

Servus
 Manfred

---(end of broadcast)---
TIP 5: Have you checked our extensive FAQ?

   http://www.postgresql.org/docs/faqs/FAQ.html


Re: [PERFORM] problem with pg_statistics

2003-06-27 Thread Andre Schubert
On Fri, 27 Jun 2003 10:43:01 +0200
Manfred Koizar [EMAIL PROTECTED] wrote:

 On Fri, 27 Jun 2003 08:07:35 +0200, Andre Schubert
 [EMAIL PROTECTED] wrote:
 Traffic data are inserted every 5 minutes with the actual datetime
 of the transaction, thatswhy the table should be physically order by time_stamp.
 
 So I'd expect a correlation of nearly 1.  Why do your statistics show
 a value of -0.479749?  A negative correlation is a sign of descending
 sort order, and correlation values closer to 0 indicate poor
 correspondence between column values and tuple positions.
 
 Could this be the effect of initial data loading?  Are there any
 updates or deletions in your traffic table?
 

We dont make updates the traffic table.
Once a month we delete the all data of the oldest month.
And after that a vacuum full verbose analyze is performed.
Could this cause reordering of the data ?
And should i do a cluster idx_ts tbl_traffic ?

 To answer Manfreds questions:
  Andre, what hardware is this running on?  What are the values of
  shared_buffers, random_page_cost, effective_cache_size, ... ?  Could
  you show us the result of
  
 SELECT * FROM pg_stats
WHERE tablename = tbl_traffic AND attname = time_stamp;
^   ^   ^  ^
 Oops, these should have been single quotes.  It's too hot here these
 days :-)
 

You are so right ... :)


 #effective_cache_size = 1000  # default in 8k pages
 
 This is definitely too low.  With 512MB or more I tend to set this to
 ca. 80% of available RAM.  Use top and free to find hints for good
 values.
 

Ok, i will talk with my coworker ( he is the sysadmin of our machine )
and look if can use such amount of RAM, because there are several other
processes that are running on these machines.
But i will test and report ...

Thanks, as

---(end of broadcast)---
TIP 6: Have you searched our list archives?

   http://archives.postgresql.org


Re: [PERFORM] problem with pg_statistics

2003-06-26 Thread Tom Lane
Andre Schubert [EMAIL PROTECTED] writes:
 i think i need a little help with a problem with pg_statistic.

Try reducing random_page_cost --- although you'd be foolish to set it on
the basis of just a single test query.  Experiment with a few different
tables, and keep in mind that repeated tests will be affected by caching.

regards, tom lane

---(end of broadcast)---
TIP 7: don't forget to increase your free space map settings


Re: [PERFORM] problem with pg_statistics

2003-06-26 Thread Tom Lane
Manfred Koizar [EMAIL PROTECTED] writes:
 On Thu, 26 Jun 2003 10:08:05 -0400, Tom Lane [EMAIL PROTECTED]
 wrote:
 Try reducing random_page_cost

 With index scan cost being more than 25 * seq scan cost, I guess that
 - all other things held equal - even random_page_cost = 1 wouldn't
 help.

Oh, you're right, I was comparing the wrong estimated costs.  Yeah,
changing random_page_cost won't fix it.

 Or there's something wrong with correlation?

That seems like a good bet.  Andre, is this table likely to be
physically ordered by time_stamp, or nearly so?  If so, do you
expect that condition to persist, or is it just an artifact of
a test setup?

regards, tom lane

---(end of broadcast)---
TIP 8: explain analyze is your friend