Re: [PERFORM] memcached and PostgreSQL

2004-11-22 Thread Patrick B Kelly
On Nov 21, 2004, at 11:55 PM, Sean Chittenden wrote:

This is similar to sending email in a trigger or on commit where you 
can't be certain you send email always
and only on a commit.
While this is certainly a possibility, it's definitely closer to the 
exception and not the normal instance.
While an exception, this is a very real possibility in day to day 
operations. The absence of any feedback or balancing mechanism between 
the database and cache makes it impossible to know that they are in 
sync and even a small error percentage multiplied over time will lead 
to an ever increasing likelihood of error.

More dangerous is that this discrepancy will NOT always be apparent 
because without active verification of the correctness of the cache, we 
will not know about any errors unless the error grows to an obvious 
point. The errors may cause material damage long before they become 
obvious. This is a common failure pattern with caches.


Patrick B. Kelly
--
  http://patrickbkelly.org


Re: [PERFORM] memcached and PostgreSQL

2004-11-22 Thread Pierre-Frédéric Caillaud

While an exception, this is a very real possibility in day to day  
operations. The absence of any feedback or balancing mechanism between  
the database and cache makes it impossible to know that they are in sync  
and even a small error percentage multiplied over time will lead to an  
ever increasing likelihood of error.
	Sure, but there are applications where it does not matter, and these  
applications are often the ones loading the database... think about displaying  
forum posts, products list in a web store, and especially category trees,  
top N queries... for all these, it does not matter if the data is a bit  
stale. For instance, a very popular forum will be cached, which is very  
important. In this case I think it is acceptable if a new post does not  
appear instantly.

	Of course, when inserting or updating data in the database, the primary  
keys and other important data should be fetched from the database and not  
the cache, which requires a bit of application logic (for instance, in a  
forum, the display page should query the cache, but the post message  
page should query the database directly).

	Memcache can also save the database from update-heavy tasks like user  
session management. In that case sessions can be stored entirely in memory.

ON COMMIT triggers would be very useful.
More dangerous is that this discrepancy will NOT always be apparent  
because without active verification of the correctness of the cache, we  
will not know about any errors unless the error grows to an obvious  
point.

The errors may cause material damage long before they become obvious.  
This is a common failure pattern with caches.
	This is why it would be dangerous to fetch referential integrity data  
from the cache... this fits your banking example for instance.
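
	Regarding ON COMMIT triggers: NOTIFY already behaves that way -- notifications  
are only delivered to listeners when the sending transaction commits -- so a  
trigger that issues NOTIFY gives a commit-coupled hook for cache invalidation.  
A minimal sketch (table and channel names are hypothetical):

-- queued inside the transaction; listeners only see the notification if
-- the transaction commits
CREATE OR REPLACE FUNCTION notify_posts_changed() RETURNS trigger AS '
BEGIN
    NOTIFY posts_changed;
    RETURN NULL;
END;
' LANGUAGE plpgsql;

CREATE TRIGGER posts_changed_notify
    AFTER INSERT OR UPDATE OR DELETE ON posts
    FOR EACH STATEMENT EXECUTE PROCEDURE notify_posts_changed();

-- the invalidation daemon keeps a connection open, runs LISTEN posts_changed;
-- and drops its cached forum pages when the notification arrives.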



Re: [PERFORM] Index usage for sorted query

2004-11-22 Thread Markus Schaber
Hi, Pierre-Frédéric,

On Sat, 20 Nov 2004 17:12:43 +0100
Pierre-Frédéric Caillaud [EMAIL PROTECTED] wrote:

  WHERE cd='ca' ORDER BY l_postcode;
 
 Write :
 
  WHERE cd='ca' ORDER BY cd, l_postcode;
 
 You have a multicolumn index, so you should specify a multicolumn sort  
 exactly the same as your index, and the planner will get it.

Thanks, that seems to help.

Seems weird to order by a column that is all the same value, but well,
why not :-)

Thanks a lot,
Markus


-- 
markus schaber | dipl. informatiker
logi-track ag | rennweg 14-16 | ch 8001 zürich
phone +41-43-888 62 52 | fax +41-43-888 62 53
mailto:[EMAIL PROTECTED] | www.logi-track.com
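
For readers finding this thread in the archives, a minimal sketch of the
pattern being discussed (table and index names are hypothetical):

CREATE INDEX localities_cd_postcode_idx ON localities (cd, l_postcode);

-- sorts on l_postcode alone; the planner may not notice that the
-- two-column index already returns rows in a usable order:
SELECT * FROM localities WHERE cd = 'ca' ORDER BY l_postcode;

-- lists the sort columns in the same order as the index, so an index
-- scan can satisfy both the filter and the ORDER BY; since cd is
-- constant within the result, the row order is identical:
SELECT * FROM localities WHERE cd = 'ca' ORDER BY cd, l_postcode;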



Re: [PERFORM] help needed -- sequential scan problem

2004-11-22 Thread sarlav kumar
Hi Tom,

Thanks for the help.

The major issue seems to be in the sub-selects:
   ->  Seq Scan on merchant_purchase mp  (cost=0.00..95.39 rows=44 width=4) (actual time=2.37..2.58 rows=6 loops=619)
         Filter: (merchant_id = $0)
where the estimated row count is a factor of 7 too high.  If the
estimated row count were even a little lower, it'd probably have gone
for an indexscan.

I understand that the sub-selects are taking up most of the time as they do a sequential scan on the tables. 

You might get some results from increasing the statistics target for merchant_purchase.merchant_id. 

Do I have to use vacuum analyze to update the statistics? If so, I have already tried that and it doesn't seem to help.

If that doesn't help, I'd think about reducing random_page_cost a little bit.

I am sorry, I am not aware of what random_page_cost is, as I am new to Postgres. What does it signify and how do I reduce it?

Thanks,
Saranya

Re: [PERFORM] help needed -- sequential scan problem

2004-11-22 Thread gnari
From: sarlav kumar [EMAIL PROTECTED]

  [Tom:]
  You might get some results from increasing the
 statistics target for merchant_purchase.merchant_id.

 Do I have to use vacuum analyze to update the statistics? If so, I have
already tried that and it doesn't seem to help.

alter table merchant_purchase alter column merchant_id set statistics 500;
analyze merchant_purchase;


 If that doesn't help, I'd think about reducing random_page_cost a little
bit.

 I am sorry, I am not aware of what random_page_cost is, as I am new to
Postgres. What does it signify and how do I reduce random_page_cost?

set random_page_cost = 3;
explain analyse query

if it is an improvement, consider setting the value in your postgresql.conf,
but remember that this may affect other queries too.
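
Spelled out, a minimal per-session test might look like this (merchant_purchase
is the table from Saranya's plan; the merchant_id value and the setting of 3 are
only examples):

SET random_page_cost = 3;     -- session-local; nothing is written to postgresql.conf
EXPLAIN ANALYZE
SELECT * FROM merchant_purchase WHERE merchant_id = 42;   -- 42 is a made-up id

RESET random_page_cost;       -- back to the configured value for this session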

gnari






Re: [PERFORM] help needed -- sequential scan problem

2004-11-22 Thread Josh Berkus
Sarlav,

 I am sorry, I am not aware of what random_page_cost is, as I am new to
 Postgres. What does it signify and how do I reduce random_page_cost?

It's a parameter in your postgresql.conf file.  After you test it, you will 
want to change it there and reload the server (pg_ctl reload).

However, you can test it on an individual connection:
SET random_page_cost=2.5
(the default is 4.0)

-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] help needed -- sequential scan problem

2004-11-22 Thread sarlav kumar
Hi Josh,

Can you tell me in what way it affects performance? And how do I decide what value to set for random_page_cost? Does it depend on any other factors?

Thanks,
Saranya

Josh Berkus [EMAIL PROTECTED] wrote:
 Sarlav,
   I am sorry, I am not aware of what random_page_cost is, as I am new to
   Postgres. What does it signify and how do I reduce random_page_cost?
 It's a parameter in your postgresql.conf file.  After you test it, you will
 want to change it there and reload the server (pg_ctl reload).
 However, you can test it on an individual connection:
 SET random_page_cost=2.5
 (the default is 4.0)

[PERFORM] Data type to use for primary key

2004-11-22 Thread Alexandre Leclerc
Good day,

I'm wondering whether there is a performance issue in using an integer
or a varchar(24) PRIMARY KEY in a product table.

I've read that there is no speed issue in the query itself, and that the only
performance cost is the extra database size from copying the key into the other
tables that require it.

My product_id is copied in orders, jobs, and other specific tables.

What is the common approach? Should I use the product_code directly as
my ID, or use a sequential number for speed? (I did the same for the
company_id: it is a 'serial' and not the short name of the customer.)
I just don't know what is usually done.

Right now I did the following:
CREATE TABLE design.products (
product_id  serial  PRIMARY KEY,
company_id  integer NOT NULL REFERENCES sales.companies ON
UPDATE CASCADE,
product_code    varchar(24) NOT NULL,
...
CONSTRAINT product_code_already_used_for_this_company UNIQUE
(company_id, product_code)
);

CREATE TABLE sales.companies (
company_id    integer  PRIMARY KEY,
company_name  varchar(48)  NOT NULL UNIQUE,
...
);

The company_id is also copied in many tables like product, contacts, etc.

Thank you very much for any good pointers on this 'already seen' issue.

-- 
Alexandre Leclerc



[PERFORM] Slow execution time when querying view with WHERE clause

2004-11-22 Thread Mike Mascari
I have the following view:
create or replace view market.p_areas as
select a.*
from _areas a
where a.area in (
 select b.area
 from _bins b, _inventories i, _offers o, _pricemembers p
 where b.bin = i.bin and
 i.inventory = o.inventory and
 o.pricegroup = p.pricegroup and
 p.buyer in (
  select s.store
  from _stores s, _webusers w
  where w.webuser = getWebuser() and
  w.company = s.company
   union
  select s.store
  from _stores s, _companies c
  where s.company = c.company and
  c.companyid = 'DEFAULT'
 )
);
When I query the view without a where clause I get:
 explain analyze select * from p_areas;

                                                         QUERY PLAN
------------------------------------------------------------------------------------------------------------------------
 Hash Join  (cost=1273.12..1276.31 rows=47 width=163) (actual time=438.739..439.574 rows=34 loops=1)
   Hash Cond: (outer.area = inner.area)
   ->  Seq Scan on _areas a  (cost=0.00..2.48 rows=48 width=163) (actual time=0.015..0.169 rows=48 loops=1)
   ->  Hash  (cost=1273.01..1273.01 rows=47 width=8) (actual time=438.532..438.532 rows=0 loops=1)
         ->  HashAggregate  (cost=1273.01..1273.01 rows=47 width=8) (actual time=438.286..438.395 rows=34 loops=1)
               ->  Hash Join  (cost=558.53..1266.68 rows=2532 width=8) (actual time=160.923..416.968 rows=5264 loops=1)
                     Hash Cond: (outer.bin = inner.bin)
                     ->  Hash Join  (cost=544.02..1207.86 rows=2531 width=8) (actual time=156.097..356.560 rows=5264 loops=1)
                           Hash Cond: (outer.inventory = inner.inventory)
                           ->  Seq Scan on _inventories i  (cost=0.00..265.96 rows=11396 width=16) (actual time=0.010..44.047 rows=11433 loops=1)
                           ->  Hash  (cost=537.14..537.14 rows=2751 width=8) (actual time=155.891..155.891 rows=0 loops=1)
                                 ->  Hash Join  (cost=13.96..537.14 rows=2751 width=8) (actual time=11.967..136.598 rows=5264 loops=1)
                                       Hash Cond: (outer.pricegroup = inner.pricegroup)
                                       ->  Seq Scan on _offers o  (cost=0.00..379.24 rows=15524 width=16) (actual time=0.008..50.335 rows=15599 loops=1)
                                       ->  Hash  (cost=13.94..13.94 rows=9 width=8) (actual time=11.861..11.861 rows=0 loops=1)
                                             ->  Hash IN Join  (cost=8.74..13.94 rows=9 width=8) (actual time=10.801..11.801 rows=12 loops=1)
                                                   Hash Cond: (outer.buyer = inner.store)
                                                   ->  Seq Scan on _pricemembers p  (cost=0.00..4.07 rows=207 width=16) (actual time=0.011..0.548 rows=207 loops=1)
                                                   ->  Hash  (cost=8.72..8.72 rows=8 width=8) (actual time=10.687..10.687 rows=0 loops=1)
                                                         ->  Subquery Scan IN_subquery  (cost=8.60..8.72 rows=8 width=8) (actual time=10.645..10.654 rows=1 loops=1)
                                                               ->  Unique  (cost=8.60..8.64 rows=8 width=8) (actual time=10.631..10.636 rows=1 loops=1)
                                                                     ->  Sort  (cost=8.60..8.62 rows=8 width=8) (actual time=10.625..10.627 rows=1 loops=1)
                                                                           Sort Key: store
                                                                           ->  Append  (cost=2.86..8.48 rows=8 width=8) (actual time=10.529..10.583 rows=1 loops=1)
                                                                                 ->  Subquery Scan *SELECT* 1  (cost=2.86..5.15 rows=5 width=8) (actual time=10.222..10.222 rows=0 loops=1)
                                                                                       ->  Hash Join  (cost=2.86..5.10 rows=5 width=8) (actual time=10.214..10.214 rows=0 loops=1)
                                                                                             Hash Cond: (outer.company = inner.company)
                                                                                             ->  Seq Scan on _stores s  (cost=0.00..2.13 rows=13 width=16) (actual time=0.019..0.059 rows=13 loops=1)
                                                                                             ->  Hash  (cost=2.85..2.85 rows=1 width=8) (actual time=10.031..10.031 rows=0 loops=1)
                                                                                                   ->  Seq Scan on _webusers w  (cost=0.00..2.85 rows=1 width=8) (actual time=10.023..10.023 rows=0 loops=1)

Re: [PERFORM] Slow execution time when querying view with WHERE clause

2004-11-22 Thread Mike Mascari
Mike Mascari wrote:
I have the following view:
create or replace view market.p_areas as
select a.*
from _areas a
where a.area in (
 select b.area
 from _bins b, _inventories i, _offers o, _pricemembers p
 where b.bin = i.bin and
 i.inventory = o.inventory and
 o.pricegroup = p.pricegroup and
 p.buyer in (
  select s.store
  from _stores s, _webusers w
  where w.webuser = getWebuser() and
  w.company = s.company
   union
  select s.store
  from _stores s, _companies c
  where s.company = c.company and
  c.companyid = 'DEFAULT'
 )
);
...
I failed to report the version:
select version();
PostgreSQL 7.4.5 on i686-pc-linux-gnu, compiled by GCC 
i686-pc-linux-gnu-gcc (GCC) 3.4.0 20040204 (prerelease)

Sorry.
Mike Mascari


[PERFORM] scalability issues on win32

2004-11-22 Thread Merlin Moncure
Following is the promised writeup on performance-related issues
comparing win32 with linux x86 and linux x86-64.  Unfortunately, the 64
bit portion of the test is not yet completed and won't be for a bit.
However there are some telling things about the win32/linux comparison.
If you are considering deploying postgres in a win32 environment read
on...
 
First a description of the client app:
Our company develops an ERP/CRM written in cobol which we are porting to
run on PostgreSQL.  Part of this porting effort was development of an
ISAM 'driver' for our app to allow it to store/fetch data from the
database in place of a traditional file system, which is complete.

For those of you unfamiliar with COBOL/ISAM, applications written with
it have a 'one record at a time' mentality, such that the application tends
to spam the server with queries of the "select * from t where k = k1"
variety.  Our driver creates stored procedures as necessary and uses
ExecParams wherever possible to cut down on server CPU time, which is a
precious resource in our case.  Over time we plan to gradually redeploy
our application logic to make better use of the SQL server's server-side
power.  Our application is very rarely i/o bound because we will make
sure our server has enough memory so that the data will rarely, if
ever, *not* be served from the cache.

A good benchmark of our application performance is the time it takes to
read the entire bill of materials for a product.  This is a recursive
read of about 2500 records in the typical case (2408 in the test case).

Test platform:
Pentium 4 3.06 GHz/HT
10k SATA Raptor
1Gb memory
Windows XP Pro SP2/Redhat Fedora 3 (64 bit results coming soon)

BOM traversal for product * (1 user): 
win32: runtime: 3.34 sec  avg cpu load: 60%
redhat: runtime: 3.46 sec  avg cpu load: 20%

Well, win32 wins this test.  There is some variability in the results,
meaning that for a single-user scenario there is basically no difference
between win32 and linux in execution time.  However, the cpu load is much
lower for linux, which spells problems for win32 with multiple users:

BOM traversal for product * (6 users):
win32: runtime (each): 7.29 sec  avg cpu load: 100%
redhat: runtime (each): 4.56 sec  avg cpu load: 90%

Here, the win32 problems with cpu load start to manifest.  The cpu meter
stays pegged at 100% while the redhat box hovers around 90%.  The difference
in times is telling.

The third and final test is what I call the query 'surprise' factor, IOW
"surprise! your query takes forever!"  The test involves a combination of
the previous test with a query with a couple of joins that returns about
15k records.  On both redhat/win32, the query takes about .35 seconds to
execute on an unloaded server...remember that figure.



Item List generation while 6 clients generating BOM for multiple
products:
Redhat: 2.5 seconds
Win32: 155 seconds (!)

Here the win32 server is showing real problems.  Also, the query
execution time is really variable, in some cases not returning until the
6 workhorse processes completed their tasks.  The linux server, by
contrast, ran slower than on an unloaded server but never went over 4
seconds across multiple runs.

Also, on the purely subjective side, the linux server 'feels' faster and
considerably more responsive under load, even under much higher load.

Comments/Suggestions?

Merlin






Re: [PERFORM] Data type to use for primary key

2004-11-22 Thread Pierre-Frédéric Caillaud

What is the common approach? Should I use directly the product_code as
my ID, or use a sequantial number for speed? (I did the same for the
company_id, this is a 'serial' and not the shor name of the customer.
I just don't know what is usually done.
Use a serial:
- you can change the product_code for a product easily
- you can pass integers around more easily, in web forms for instance;
  you don't have to ask 'should I escape this string?'
- it's faster
- it uses less space
- if one day you must manage products from another source whose
  product_codes overlap yours, you won't have problems
- you can generate them uniquely and easily with a serial
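
To make the first point concrete with Alexandre's schema above -- re-coding a
product touches a single row, and every table that stores product_id is
unaffected (the values below are made up):

UPDATE design.products
   SET product_code = 'NEW-CODE'     -- hypothetical new code
 WHERE product_id = 1234;            -- hypothetical product

-- orders, jobs, etc. reference product_id, so nothing cascades to them.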



Re: [PERFORM] Data type to use for primary key

2004-11-22 Thread Alexandre Leclerc
Mr Caillaud,

Merci! (Thank you!) Many of the points you bring up were also my thoughts. I was
really asking myself whether this was the way to go. I'm happy to see that my
view of the problem was good.

Encore merci! (Thanks again!)

On Tue, 23 Nov 2004 00:06:13 +0100, Pierre-Frédéric Caillaud
[EMAIL PROTECTED] wrote:
 
  What is the common approach? Should I use directly the product_code as
  my ID, or use a sequantial number for speed? (I did the same for the
  company_id, this is a 'serial' and not the shor name of the customer.
  I just don't know what is usually done.
 
 Use a serial :
 - you can change product_code for a product easily
 - you can pass around integers easier around, in web forms for 
 instance,
 you don't have to ask 'should I escape this string ?'
 - it's faster
 - it uses less space
 - if one day you must manage products from another source whose
 product_code overlap yours, you won't have problems
 - you can generate them with a serial uniquely and easily
 
 


-- 
Alexandre Leclerc



Re: [PERFORM] scalability issues on win32

2004-11-22 Thread Pierre-Frédéric Caillaud

Test platform:
Pentium 4 3.06 GHz/HT
10k SATA Raptor
1Gb memory
Windows XP Pro SP2/Redhat Fedora 3 (64 bit results coming soon)
	Could you please add information about...
	- filesystems ?
	- windows configured as network server or as desktop box ?
	- virtual memory
	In my experience you MUST deactivate virtual memory on a Windows box to  
avoid catastrophic competition between virtual memory and disk cache
	- respective pgsql configurations (buffers...) identical ?
	- explain analyze for the two, identical ?
	- client on same machine or via network (100Mb ? 1G ?)
	- size of the data set involved in query
	- first query time after boot (with nothing in the cache), and times for  
the next disk-cached runs ?
	- are the N users doing the same query or exercising different parts of  
the dataset ?

You don't do any writes in your test do you ? Just big SELECTs ?


Re: [PERFORM] Data type to use for primary key

2004-11-22 Thread Josh Berkus
Alexandre,

 What is the common approach? Should I use directly the product_code as
 my ID, or use a sequantial number for speed? (I did the same for the
 company_id, this is a 'serial' and not the shor name of the customer.
 I just don't know what is usually done.

Don't use SERIAL just because it's there.  Ideally, you *want* to use the 
product_code if you can.   It's your natural key, and a natural key is always 
superior to a surrogate key, all other things being equal.   

Unfortunately, all other things are NOT equal.  Here are the reasons why you'd 
use a surrogate key (i.e. SERIAL):

1) because the product code is a large text string (i.e. more than 10 bytes) and you 
will have many millions of records, so having it as an FK in other tables 
will add significantly to the footprint of the database;

2) because product codes get blanket changes frequently, where thousands of 
them get re-mapped to new codes, and the slow performance of ON UPDATE CASCADE 
will kill your database;

3) Because every other table in the database has a SERIAL key and consistency 
reduces errors;

4) or because your interface programmers get annoyed with using different 
types of keys for different tables and multicolumn keys.

If none of the above is true (and I've had it not be, in some tables and some 
databases) then you want to stick with your natural key, the product_code.
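
For illustration, a minimal sketch of the two designs being weighed (table and
column names are hypothetical):

-- natural key: the code itself is the PK, and every child row carries the string
CREATE TABLE products_nk (
    product_code varchar(24) PRIMARY KEY,
    description  text NOT NULL
);
CREATE TABLE order_lines_nk (
    order_id     integer NOT NULL,
    product_code varchar(24) NOT NULL
                 REFERENCES products_nk ON UPDATE CASCADE,
    qty          integer NOT NULL
);

-- surrogate key: a serial PK; the code stays unique but is stored only once
CREATE TABLE products_sk (
    product_id   serial PRIMARY KEY,
    product_code varchar(24) NOT NULL UNIQUE,
    description  text NOT NULL
);
CREATE TABLE order_lines_sk (
    order_id     integer NOT NULL,
    product_id   integer NOT NULL REFERENCES products_sk,
    qty          integer NOT NULL
);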


-- 
--Josh

Josh Berkus
Aglio Database Solutions
San Francisco



Re: [PERFORM] Data type to use for primary key

2004-11-22 Thread Rod Taylor
On Mon, 2004-11-22 at 16:54 -0800, Josh Berkus wrote:
 Alexandre,
 
  What is the common approach? Should I use directly the product_code as
  my ID, or use a sequantial number for speed? (I did the same for the
  company_id, this is a 'serial' and not the shor name of the customer.
  I just don't know what is usually done.
 
 Don't use SERIAL just because it's there.Ideally, you *want* to use the 
 product_code if you can.   It's your natural key and a natural key is always 
 superior to a surrogate key all other things being equal.   

It would be nice if PostgreSQL had some form of transparent surrogate
keying in the background which would automatically run around and
replace your real data with SERIAL integers. It could use a lookup table
for conversions between the surrogate and real values so the user never
knows that it's done, a bit like ENUM. Then we could all use the real
values with no performance issues -- for 1), because it's an integer in the
background, and for 2), because a cascade only touches a single tuple in the
lookup table.
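
Nothing like that is built in, but a hand-rolled version of the lookup-table
idea could look like this (all names are hypothetical):

CREATE TABLE product_code_lookup (
    code_id      serial PRIMARY KEY,
    product_code varchar(24) NOT NULL UNIQUE
);

CREATE TABLE order_items (
    order_id integer NOT NULL,
    code_id  integer NOT NULL REFERENCES product_code_lookup
);

-- the "real" value is exposed through a view; a blanket re-code updates
-- one lookup row instead of cascading to every referencing order item
CREATE VIEW order_items_v AS
    SELECT oi.order_id, l.product_code
    FROM order_items oi JOIN product_code_lookup l USING (code_id);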


-- 




Re: [PERFORM] memcached and PostgreSQL

2004-11-22 Thread Bruce Momjian
Pierre-Frédéric Caillaud wrote:
 
  While an exception, this is a very real possibility in day to day  
  operations. The absence of any feedback or balancing mechanism between  
  the database and cache makes it impossible to know that they are in sync  
  and even a small error percentage multiplied over time will lead to an  
  ever increasing likelihood of error.
 
   Sure, but there are applications where it does not matter, and these  
 applications are othen loading the database... think about displaying  
 forum posts, products list in a web store, and especially category trees,  
 top N queries... for all these, it does not matter if the data is a bit  
 stale. For instance, a very popular forum will be cached, which is very  
 important. In this case I think it is acceptable if a new post does not  
 appear instantly.

My point was that there are two failure cases --- one where the cache is
slightly out of date compared to the db server --- these are cases where
the cache update is slightly before/after the commit.  The second is
where the cache update happens and the commit later fails, or the commit
happens and the cache update never happens.  In these cases the cache is
out of date for however long you cache the data without expiring it.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [PERFORM] [pgsql-hackers-win32] scalability issues on win32

2004-11-22 Thread Bruce Momjian

This was an interesting Win32/linux comparison. I expected Linux to scale
better, but I was surprised how poorly XP scaled.  It reinforces our
perception that Win32 is for low-traffic servers.

---

Merlin Moncure wrote:
 Following is the promised writeup in performance related issues
 comparing win32 with linux x86 and linux x86-64.  Unfortunately, the 64
 bit portion of the test is not yet completed and won't be for a bit.
 However there are some telling things about the win32/linux comparison.
 If you are considering deploying postgres in a win32 environment read
 on...
  
 First a description of the client app:
 Our company develops an ERP/CRM written in cobol which we are porting to
 run on PostgreSQL.  Part of this porting effort was development of an
 ISAM 'driver' for our app to allow it to store/fetch data from the
 database in place of a traditional file system, which is complete.
 
 For those of you unfamiliar with COBOL/ISAM, applications written with
 it have a 'one record at a time' mentality, such the application tends
 to spam the server with queries of the select * from t where k = k1
 variety.  Our driver creates stored procedures as necessary and uses
 ExecParams wherever possible to cut down on server CPU time, which is a
 precious resource in our case.  Over time we plan to gradually redeploy
 our application logic to make better use of the sql server's server side
 power.  Our application is very rarely i/o bound because we will make
 sure our server has enough memory so that the data will be rarely, if
 ever, *not* run from the cache.
 
 A good benchmark of our application performance is the time it takes to
 read the entire bill of materials for a product.  This is a recursive
 read of about 2500 records in the typical case (2408 in the test case).
 
 Test platform:
 Pentium 4 3.06 GHz/HT
 10k SATA Raptor
 1Gb memory
 Windows XP Pro SP2/Redhat Fedora 3 (64 bit results coming soon)
 
 BOM traversal for product * (1 user): 
 win32: runtime: 3.34 sec  avg cpu load: 60%
 redhat: runtime: 3.46 sec  avg cpu load: 20%
 
 Well, win32 wins this test.  There is some variability in the results
 meaning for a single user scenario there is basically no difference
 between win32 and linux in execution time.  However the cpu load is much
 lower for linux which spells problems for win32 with multiple users:
 
 BOM traversal for product * (6 users):
 win32: runtime (each): 7.29 sec  avg cpu load: 100%
 redhat: runtime (each): 4.56 sec  avg cpu load: 90%
 
 Here, the win32 problems with cpu load start to manifest.  The cpu meter
 stays pegged at 100% while the redhat hand around 90%.  The difference
 in times is telling.
 
 The third and final test is what I call the query 'surprise' factor, IOW
 surprise! your query takes forever!  The test involves a combination of
 the previous test with a query with a couple of joins that returns about
 15k records.  On both redhat/win32, the query takes about .35 seconds to
 execute on a unloaded server...remember that figure.
 
 
 
 Item List generation while 6 clients generating BOM for multiple
 products:
 Redhat: 2.5 seconds
 Win32: 155 seconds (!)
 
 Here the win32 server is showing real problems.  Also, the query
 execution time is really variable, in some cases not returning until the
 6 workhorse processes completed their tasks.  The linux server by
 contrast ran slower but never ran over 4 seconds after multiple runs.
 
 Also, on the purely subjective side, the linux server 'feels' faster and
 considerably more responsive under load, even under much higher load.
 
 Comments/Suggestions?
 
 Merlin
 
 
 
 
 

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [PERFORM] Data type to use for primary key

2004-11-22 Thread Josh Berkus
Rod,

 It would be nice if PostgreSQL had some form of transparent surrogate
 keying in the background which would automatically run around and
 replace your real data with SERIAL integers. It could use a lookup table
 for conversions between the surrogate and real values so the user never
 knows that it's done, a bit like ENUM. Then we could all use the real
 values with no performance issues for 1) because it's an integer in the
 background, and 2) because a cascade only touches a single tuple in the
 lookup table.

Sybase does this, and it's a feature I would dearly love to emulate.  You can 
just refer to another table, without specifying the column, as an FK and it 
will create an invisible hashed key.   This is the type of functionality Codd 
was advocating -- invisible, implementation-automated surrogate keys -- in 
the mid 90's (don't have a paper citation at the moment).

So you'd just do:

create table client_contacts (
fname text not null,
lname text not null,
client foreign key clients,
position text,
notes text
);

and the client column would create an invisible hashed key that would drag 
in the relevant row from the clients table; thus a:

select * from client_contacts

would actually show the whole record from clients as well.
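
What one has to spell out by hand in PostgreSQL today to approximate that (all
names hypothetical; the invisible, automatic part is exactly what is missing):

CREATE TABLE clients (
    client_id serial PRIMARY KEY,
    name      text NOT NULL
);

CREATE TABLE client_contacts (
    fname     text NOT NULL,
    lname     text NOT NULL,
    client_id integer NOT NULL REFERENCES clients,
    position  text,
    notes     text
);

-- the "drag in the relevant row from clients" behaviour becomes a view
CREATE VIEW client_contacts_full AS
    SELECT cc.fname, cc.lname, cc.position, cc.notes, c.client_id, c.name
    FROM client_contacts cc JOIN clients c USING (client_id);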


-- 
Josh Berkus
Aglio Database Solutions
San Francisco
