Re: [PERFORM] pg_reset_stats + cache I/O %

2006-03-08 Thread mcelroy, tim

Thanks Tom, sorry I neglected to copy the list on my previous email.


Does this query make sense and is it valid for an accurate cache % hit ratio for the entire DB? I would assume I could use the same logic with other views such as pg_stat_user_tables to get a per table ratio?

SELECT 100 - round((blks_hit::numeric / (blks_hit::numeric + blks_read::numeric)) * 100,2)
AS "Cache % Hit"
FROM pg_stat_database
WHERE datname = 'Fix1';


RETURNS


Cache % Hit

 98.06
(1 row)
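
A per-table variant of the same idea might look like the following sketch against pg_statio_user_tables (which tracks per-table heap block reads and hits); the exact query is an assumption, not something from this thread:

-- NULLIF avoids a division-by-zero error for tables that have never been read
SELECT relname,
       round(100 * heap_blks_hit::numeric
                 / NULLIF(heap_blks_hit + heap_blks_read, 0), 2) AS "Cache % Hit"
FROM pg_statio_user_tables
ORDER BY heap_blks_hit + heap_blks_read DESC;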


Thank you,
Tim


-Original Message-
From:  Tom Lane [mailto:[EMAIL PROTECTED]] 
Sent: Tuesday, March 07, 2006 2:37 PM
To: mcelroy, tim
Cc: 'pgsql-performance@postgresql.org'
Subject: Re: [PERFORM] pg_reset_stats + cache I/O % 


mcelroy, tim [EMAIL PROTECTED] writes:
 ERROR: function round(double precision, integer) does not exist


Try coercing to numeric instead of float. Also, it'd be a good idea to
put that coercion outside the sum()'s instead of inside --- summing
bigints is probably noticeably faster than summing numerics.
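
As a minimal illustration of that placement (a sketch only; the column names are assumed from pg_stat_database and this exact query is not part of the reply):

-- the sums run over the raw counters; the cast to numeric happens once, outside the sum()
SELECT round(sum(blks_hit)::numeric * 100
             / (sum(blks_hit) + sum(blks_read))::numeric, 2) AS hit_pct
FROM pg_stat_database;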


   regards, tom lane





[PERFORM] Is an array of 365 elements in a cell of a table a good idea, in order to perform searches?

2006-03-08 Thread Ruben Rubio Rey

Hi,

I'm having a doubt about a new implementation on a web site.
The objective is to create a search that is as fast as possible. I have
thought of two possibilities to do that:


I have several items. Each item has one or more capacities. Each
capacity has several date ranges (from 1 January to 10 April, for
example). The dates cover 366 days, the current year, and the ranges are
of indeterminate length. For each date range there is a price per day, per
week, per 15 days and per month.


I have designed two possibilities:

First:
IdItem   StartDate   EndDate     Capacity   PricePerDay   PricePerWeek*   PricePer15days*   PricePerMonth*
   1     1-1-2005    10-1-2005       2          100             90               85               80
   1     11-1-2005   20-1-2005       2          105             94               83               82
   1     21-1-2005   5-2-2005        4          405            394              283              182
   2     ...
Right now around 30,000 rows; in one year it is expected to have 60,000 rows

* In order to compare correctly, all prices will be translated to per-day
values. For example, PricePerWeek will hold the week price / 7, and so on


Second:
IdItem   Capacity   Days                                          Week          15Days        Month         Year
   1        2       [Array of 365 values, one per day of year]    [ .Array. ]   [ .Array. ]   [ .Array. ]   [ .Array. ]
                     ^__ Each element of the array is a price

Right now around 2,500 rows; in one year it is expected to have 5,000 rows
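
A rough sketch of what this second layout could look like in SQL (table and column names are made up purely for illustration):

CREATE TABLE item_prices (
    id_item      integer NOT NULL,
    capacity     integer NOT NULL,
    day_prices   numeric[],   -- 365/366 elements, one per day of the year
    week_prices  numeric[],   -- and so on for the other period prices
    PRIMARY KEY (id_item, capacity)
);

-- price of item 1 (capacity 2) on day 45 of the year:
SELECT day_prices[45] FROM item_prices WHERE id_item = 1 AND capacity = 2;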

I have to compare prices, or prices and dates, or prices and dates and
capacity, or capacity and prices


I have no experience working with arrays in a table. Is it fast?
Which one do you think will have better performance?
Any good ideas?

I hope this is enough information.
Thanks in advance,
Ruben Rubio Rey



Re: [PERFORM] Bad row estimates

2006-03-08 Thread Alex Adriaanse
Thank you all for your valuable input.  I have tried creating a partial 
index, a GIST index, and a GIST + partial index, as suggested, but it 
does not seem to make a significant difference.  For instance:


CREATE INDEX test_table_1_interval_idx ON test_table_1 USING GIST
   (box(point(start_ts::abstime::integer, start_ts::abstime::integer), 
point(end_ts::abstime::integer, end_ts::abstime::integer)))
   WHERE id = g_id;

ANALYZE test_table_1;

EXPLAIN ANALYZE SELECT count(*) FROM test_table_1
   INNER JOIN test_table_2 ON (test_table_2.s_id=13300613 AND test_table_1.id = 
test_table_2.n_id)
   WHERE box(point(start_ts::abstime::integer, start_ts::abstime::integer), 
point(end_ts::abstime::integer, end_ts::abstime::integer))
   ~ 
box(point(now()::abstime::integer,now()::abstime::integer),point(now()::abstime::integer,now()::abstime::integer))
   AND test_table_1.id = test_table_1.g_id;
                                     QUERY PLAN
------------------------------------------------------------------------------------------
 Aggregate  (cost=15.09..15.10 rows=1 width=0) (actual time=69.771..69.772 rows=1 loops=1)
   ->  Nested Loop  (cost=9.06..15.08 rows=1 width=0) (actual time=69.752..69.752 rows=0 loops=1)
         ->  Index Scan using test_table_1_interval_idx on test_table_1  (cost=0.07..4.07 rows=1 width=22) (actual time=2.930..3.607 rows=135 loops=1)
               Index Cond: (box(point((((start_ts)::abstime)::integer)::double precision, (((start_ts)::abstime)::integer)::double precision), point((((end_ts)::abstime)::integer)::double precision, (((end_ts)::abstime)::integer)::double precision)) ~ box(point((((now())::abstime)::integer)::double precision, (((now())::abstime)::integer)::double precision), point((((now())::abstime)::integer)::double precision, (((now())::abstime)::integer)::double precision)))
         ->  Bitmap Heap Scan on test_table_2  (cost=8.99..11.00 rows=1 width=12) (actual time=0.486..0.486 rows=0 loops=135)
               Recheck Cond: ((test_table_2.s_id = 13300613::numeric) AND ("outer".id = test_table_2.n_id))
               ->  BitmapAnd  (cost=8.99..8.99 rows=1 width=0) (actual time=0.485..0.485 rows=0 loops=135)
                     ->  Bitmap Index Scan on test_table_2_s_id  (cost=0.00..2.17 rows=48 width=0) (actual time=0.015..0.015 rows=1 loops=135)
                           Index Cond: (s_id = 13300613::numeric)
                     ->  Bitmap Index Scan on test_table_2_n_id  (cost=0.00..6.57 rows=735 width=0) (actual time=0.467..0.467 rows=815 loops=135)
                           Index Cond: ("outer".id = test_table_2.n_id)
 Total runtime: 69.961 ms

(Note: without the GIST index the query currently runs in about 65ms)

Its row estimates are still way off.  As a matter of fact, it almost 
seems as if the index doesn't affect row estimates at all.


What would you guys suggest?

Thanks,

Alex

Greg Stark wrote:

You could actually take short cuts using expression indexes to do this. If it
works out well then you might want to implement a real data type to avoid the
overhead of the SQL conversion functions.

Here's an example. If I were to do this for real I would look for a better
datatype than the box datatype and I would wrap the whole conversion in an SQL
function. But this will serve to demonstrate:

stark=> create table interval_test (start_ts timestamp with time zone, end_ts 
timestamp with time zone);
CREATE TABLE

stark=> create index interval_idx on interval_test using gist 
(box(point(start_ts::abstime::integer, end_ts::abstime::integer) , 
point(start_ts::abstime::integer, end_ts::abstime::integer)));
CREATE INDEX

stark=> explain select * from interval_test where 
box(point(now()::abstime::integer,now()::abstime::integer),point(now()::abstime::integer,now()::abstime::integer))
 ~ box(point(start_ts::abstime::integer, end_ts::abstime::integer) , 
point(start_ts::abstime::integer, end_ts::abstime::integer));
  QUERY PLAN   

Re: [PERFORM] Bad row estimates

2006-03-08 Thread Greg Stark
Alex Adriaanse [EMAIL PROTECTED] writes:

 Its row estimates are still way off.  As a matter of fact, it almost seems as
 if the index doesn't affect row estimates at all.

Indexes normally don't affect estimates. Expression indexes do effectively
create a new column to generate stats for, but that doesn't really help here
because there aren't any estimation functions for the geometric gist indexes.
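
For what it's worth, whatever stats do get gathered for an expression index are keyed by the index's own name; a quick way to check, assuming current pg_stats behaviour, would be something like:

-- stats for expression index columns show up under the index name, not the table name
ANALYZE test_table_1;
SELECT attname, n_distinct, null_frac
FROM pg_stats
WHERE tablename = 'test_table_1_interval_idx';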

  ->  BitmapAnd  (cost=8.99..8.99 rows=1 width=0) (actual time=0.485..0.485 rows=0 loops=135)
    ->  Bitmap Index Scan on test_table_2_s_id  (cost=0.00..2.17 rows=48 width=0) (actual time=0.015..0.015 rows=1 loops=135)
          Index Cond: (s_id = 13300613::numeric)
    ->  Bitmap Index Scan on test_table_2_n_id  (cost=0.00..6.57 rows=735 width=0) (actual time=0.467..0.467 rows=815 loops=135)
          Index Cond: ("outer".id = test_table_2.n_id)

If this query is representative then it seems you might be better off without
the test_table_2_n_id index. Of course this could be a problem if you need
that index for other purposes.

I'm puzzled how test_table_2_s_id's estimate isn't more precise. Are there
some values of s_id that are quite common and others that are unique? You
might try raising the statistics target on s_id.
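
Raising the target would look something like this (1000 is just an illustrative value; the default in that era was 10):

ALTER TABLE test_table_2 ALTER COLUMN s_id SET STATISTICS 1000;
ANALYZE test_table_2;   -- re-gather stats so the new target takes effect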

Incidentally, 70ms is pretty good. I'm usually happy if all my mundane queries
are under 100ms and the more complex queries in the vicinity of 300ms. Trying
to optimize below 100ms is hard because you'll find a lot of variability in
the performance. Any extraneous disk i/o from checkpoints, vacuums, even other
services, will throw off your expectations.

-- 
greg




Re: [PERFORM] pg_reset_stats + cache I/O %

2006-03-08 Thread Jim C. Nasby
Out of curiosity, why do you want this info? More important, do the
folks who are looking at this understand that a key part of PostgreSQL's
tuning strategy is to let the OS handle the bulk of the caching?

On Wed, Mar 08, 2006 at 08:59:51AM -0500, mcelroy, tim wrote:
 Thanks Tom, sorry I neglected to copy the list on my previous email.
 
 Does this query make sense and is it valid for an accurate cache % hit ratio
 for the entire DB?  I would assume I could use the same logic with other
 views such as pg_stat_user_tables to get a per table ratio?
 
 SELECT 100 - round((blks_hit::numeric / (blks_hit::numeric +
 blks_read::numeric)) * 100,2)
 AS "Cache % Hit"
 FROM pg_stat_database
 WHERE datname = 'Fix1';
 
 RETURNS
 
 Cache % Hit
 
98.06
 (1 row)
 
 Thank you,
 Tim
 
  -Original Message-
 From: Tom Lane [mailto:[EMAIL PROTECTED] 
 Sent: Tuesday, March 07, 2006 2:37 PM
 To:   mcelroy, tim
 Cc:   'pgsql-performance@postgresql.org'
 Subject:  Re: [PERFORM] pg_reset_stats + cache I/O % 
 
 mcelroy, tim [EMAIL PROTECTED] writes:
  ERROR:  function round(double precision, integer) does not exist
 
 Try coercing to numeric instead of float.  Also, it'd be a good idea to
 put that coercion outside the sum()'s instead of inside --- summing
 bigints is probably noticeably faster than summing numerics.
 
   regards, tom lane

-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [PERFORM] pg_reset_stats + cache I/O %

2006-03-08 Thread mcelroy, tim

I actually need this info as I was tasked by management to provide it. Not sure if they understand that or not; I do, but management does like to see how well the system and its components are performing. Also, I would utilize these results to test any cache tuning changes I may make. 

Tim


-Original Message-
From:  Jim C. Nasby [mailto:[EMAIL PROTECTED]] 
Sent: Wednesday, March 08, 2006 1:28 PM
To: mcelroy, tim
Cc: 'Tom Lane'; 'pgsql-performance@postgresql.org'
Subject: Re: [PERFORM] pg_reset_stats + cache I/O %


Out of curiosity, why do you want this info? More important, do the
folks who are looking at this understand that a key part of PostgreSQL's
tuning strategy is to let the OS handle the bulk of the caching?


On Wed, Mar 08, 2006 at 08:59:51AM -0500, mcelroy, tim wrote:
 Thanks Tom, sorry I neglected to copy the list on my previous email.
 
 Does this query make sense and is it valid for an accurate cache % hit ratio
 for the entire DB? I would assume I could use the same logic with other
 views such as pg_stat_user_tables to get a per table ratio?
 
 SELECT 100 - round((blks_hit::numeric / (blks_hit::numeric +
 blks_read::numeric)) * 100,2)
 AS "Cache % Hit"
 FROM pg_stat_database
 WHERE datname = 'Fix1';
 
 RETURNS
 
 Cache % Hit
 
 98.06
 (1 row)
 
 Thank you,
 Tim
 
 -Original Message-
 From:  Tom Lane [mailto:[EMAIL PROTECTED]] 
 Sent: Tuesday, March 07, 2006 2:37 PM
 To: mcelroy, tim
 Cc: 'pgsql-performance@postgresql.org'
 Subject: Re: [PERFORM] pg_reset_stats + cache I/O % 
 
 mcelroy, tim [EMAIL PROTECTED] writes:
  ERROR: function round(double precision, integer) does not exist
 
 Try coercing to numeric instead of float. Also, it'd be a good idea to
 put that coercion outside the sum()'s instead of inside --- summing
 bigints is probably noticeably faster than summing numerics.
 
regards, tom lane


-- 
Jim C. Nasby, Sr. Engineering Consultant [EMAIL PROTECTED]
Pervasive Software http://pervasive.com work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf cell: 512-569-9461





Re: [PERFORM] Is an array of 365 elements in a cell of a table a good idea, in order to perform searches?

2006-03-08 Thread Jim C. Nasby
If you need to compare stuff on a day-by-day basis, I think you'll be
much better off just expanding stuff into a table of:

item_id int NOT NULL
, day   date NOT NULL
, capacity ...
, price_per_day ...
, price_per_week ...
, PRIMARY KEY( item_id, day )

(Note that camel case and databases don't mix well...)

Sure, you're de-normalizing here, but the key is that you're putting the
data into a format where you can easily do things like:

SELECT sum(capacity) FROM ... WHERE day = '2006-12-18';

Trying to do that with arrays would be noticeably more complex. And if
you wanted to do a whole month or something? Yeck...

BTW, another option is to roll price_per_15_days and price_per_month
into a different table, since you'd only need 24 rows per item. Might be
worth the trade-off in complexity depending on the specifics of the
application.
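
A rough sketch of that split, with names made up purely for illustration:

-- day-expanded pricing, one row per item per day
CREATE TABLE item_day_price (
    item_id        integer NOT NULL,
    day            date    NOT NULL,
    capacity       integer NOT NULL,
    price_per_day  numeric NOT NULL,
    price_per_week numeric NOT NULL,
    PRIMARY KEY (item_id, day)
);

-- longer-period prices kept separately, roughly 24 rows per item per year
CREATE TABLE item_period_price (
    item_id          integer NOT NULL,
    period_start     date    NOT NULL,
    price_per_15days numeric NOT NULL,
    price_per_month  numeric NOT NULL,
    PRIMARY KEY (item_id, period_start)
);

-- total capacity available on a given day:
SELECT sum(capacity) FROM item_day_price WHERE day = '2006-12-18';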

On Wed, Mar 08, 2006 at 03:28:36PM +0100, Ruben Rubio Rey wrote:
 Hi,
 
 I'm having a doubt about a new implementation on a web site.
 The objective is to create a search that is as fast as possible. I have 
 thought of two possibilities to do that:
 
 I have several items. Each item has one or more capacities. Each 
 capacity has several date ranges (from 1 January to 10 April, for 
 example). The dates cover 366 days, the current year, and the ranges are 
 of indeterminate length. For each date range there is a price per day, per 
 week, per 15 days and per month.
 
 I have designed two possibilities:
 
 First:
 IdItem   StartDate   EndDate     Capacity   PricePerDay   PricePerWeek*   PricePer15days*   PricePerMonth*
    1     1-1-2005    10-1-2005       2          100             90               85               80
    1     11-1-2005   20-1-2005       2          105             94               83               82
    1     21-1-2005   5-2-2005        4          405            394              283              182
    2     ...
 Right now around 30,000 rows; in one year it is expected to have 60,000 rows
 
 * In order to compare correctly, all prices will be translated to per-day 
 values. For example, PricePerWeek will hold the week price / 7, and so on
 
 Second:
 IdItem   Capacity   Days                                          Week          15Days        Month         Year
    1        2       [Array of 365 values, one per day of year]    [ .Array. ]   [ .Array. ]   [ .Array. ]   [ .Array. ]
                      ^__ Each element of the array is a price
 
 Right now around 2,500 rows; in one year it is expected to have 5,000 rows
 
 I have to compare prices, or prices and dates, or prices and dates and 
 capacity, or capacity and prices
 
 I have no experience working with arrays in a table. Is it fast?
 Which one do you think will have better performance?
 Any good ideas?
 
 I hope this is enough information.
 Thanks in advance,
 Ruben Rubio Rey
 
 

-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



Re: [PERFORM] Postgres and Ingres R3 / SAN

2006-03-08 Thread Jim Nasby
Adding -performance back; you should do a reply-all if you want to reply to 
list messages.

 From: Jeremy Haile [mailto:[EMAIL PROTECTED]
  Can you point us at more info about this? I can't even find a website
  for Ingres...
 
 Ingres is based on the same original codebase that PostgreSQL was based
 upon (a long time ago). It is owned by Computer Associates and was open
 sourced last year. It supports clustering and replication, and I've seen
 an Ingres install set up as a cluster backed by a SAN before. I just
 haven't talked to anyone (at least unbiased) who has used this type of
 setup in production, and I'm not fully aware of the advantages/disadvantages
 of this type of setup with Ingres. Since this group seems pretty
 knowledgeable about performance advantages (and we are currently running
 PostgreSQL), I wanted to see if there were any experiences or opinions.
 
 Here is a link to their website:
 http://opensource.ca.com/projects/ingres
 
 
  Perhaps if you posted your performance requirements someone could help
  point you to a solution that would meet them.
 
 This is honestly more of a curiosity question at the moment, so I don't
 have any specific numbers. We definitely have a requirement for failover
 in the case of a machine failure, so we at least need Master-Slave
 replication. However, I wanted to solicit information on clustering
 alternatives as well, since scalability will likely be a future problem
 for our database.

Ahh, ok... that's likely a much different requirement than true clustering.

What a lot of folks do right now is segregate their application into a 
read-only stream and the more interactive read-write streams, and then use 
Slony to replicate data to a number of machines for the read-only work. This 
way anyone who's hitting the site read-only (and can handle some possible 
delay) will just hit one of the slave machines. People who are doing 
interactive work (updating data) will hit the master. Since most applications 
do far more reading than they do writing, this is a pretty good way to 
load-balance.
--
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461



[PERFORM] embedded postgres and threading

2006-03-08 Thread Gorshkov
Sorry if this is the wrong list ...

I'm in the process of developing an application based on GTK and PostgreSQL
for both Windows and Linux.

Short, simple and to the point - I'm using embedded SQL. Is there anything
I should know about using PostgreSQL in multiple threads, under Linux OR
Windows? I've not been able to find anything in the FAQ or documentation
regarding this.



Re: [PERFORM] pg_reset_stats + cache I/O %

2006-03-08 Thread Jim C. Nasby
On Wed, Mar 08, 2006 at 01:35:35PM -0500, mcelroy, tim wrote:
 I actually need this info as I was tasked by management to provide it.  Not
 sure if they understand that or not; I do, but management does like to see
 how well the system and its components are performing.  Also, I would
 utilize these results to test any cache tuning changes I may make.

What I feared. While monitoring cache hit % over time isn't a bad idea,
it's less than half the picture, which makes fertile ground for
optimizing for some mythical target instead of actual system
performance. If the conclusion from these numbers is that
shared_buffers needs to get set larger than min(5, 10% of memory)
I'd very seriously re-consider how performance tuning is being done.

But hopefully I'm just being paranoid and you guys are just doing a
great job of monitoring things and keeping on the ball. :)
-- 
Jim C. Nasby, Sr. Engineering Consultant  [EMAIL PROTECTED]
Pervasive Software      http://pervasive.com    work: 512-231-6117
vcard: http://jim.nasby.net/pervasive.vcf   cell: 512-569-9461
