Re: [PERFORM] [pgsql-hackers-win32] scalability issues on win32

2004-11-23 Thread Merlin Moncure
Reini Urban wrote:
 Merlin Moncure wrote:
  A good benchmark of our application performance is the time it takes
  to read the entire bill of materials for a product.  This is a
  recursive read of about 2500 records in the typical case (2408 in the
  test case).

 I always knew that COBOL ultimately loses, but it's always refreshing
 to get confirmation from time to time :)

Heh.  It's important to make the distinction between COBOL, which is
just a language, and ISAM, which is a data delivery system.  You could,
for example, pair COBOL with SQL with good results (in fact, we plan
to).  But yes, many legacy COBOL apps were written with assumptions
about the system architecture that are no longer valid.

 Where did you get the win32 avg cpu load number from? AFAIK there's no
 getloadavg() for windows. At least I tried hard to find one, because I
 want to add a comparable figure to cygwin core. emacs, coreutils, make
 and others desperately need it, not to speak of servers and real-time
 apps.

I just eyeballed it :-).  So consider the load averages anecdotal,
although they are quite stable.  However, it is quite striking that with
the same application code the win32 load average was 2-3 times higher.

I also left out the dual processor results because I did not have time
to test them on linux.  Sadly, however, the 2nd processor adds very
little extra horsepower to the server.  I'm hoping linux will do better.

Merlin



Re: [PERFORM] [pgsql-hackers-win32] scalability issues on win32

2004-11-23 Thread Magnus Hagander
   This was an interesting Win32/linux comparison. I expected Linux to
   scale better, but I was surprised how poorly XP scaled.  It
   reinforces our perception that Win32 is for low traffic servers.
  
  That's a bit harsh given the lack of any further investigation so far,
  isn't it? Win32 can run perfectly well with other DBMSs with hundreds
  of users.
 
 The general opinion of server users is that you need 2-4 more
 Win32 servers to do the same work as one Unix-like server.
 That, along with the difficulty of automated administration and
 security problems, is what is preventing Win32 from making
 greater inroads into the server marketplace.
 
 Of course these are just generalizations.

Is this for Postgresql Cygwin? You surely can't mean for all server
tasks - if so, I would say that's *way* off. There is a difference, but
it's more along the lines of a single-digit percentage in my experience -
provided you configure your machines reasonably, of course.

(In my experience, Win32 MSSQLServer often outperforms postgresql on
Linux. Granted, you can tweak postgresql up to higher speeds, but MS does
most of that tweaking automatically... I'm talking about tweaking a lot
more specific than just raising the memory limits from the installation
default, of course.)

I do agree on the automated administration though... It's a major PITA.


//Magnus



Re: [PERFORM] Data type to use for primary key

2004-11-23 Thread Alexandre Leclerc
On Mon, 22 Nov 2004 16:54:56 -0800, Josh Berkus [EMAIL PROTECTED] wrote:
 Alexandre,
 
  What is the common approach? Should I use the product_code directly as
  my ID, or use a sequential number for speed? (I did the same for the
  company_id; this is a 'serial' and not the short name of the customer.)
  I just don't know what is usually done.
 
 Don't use SERIAL just because it's there.  Ideally, you *want* to use the
 product_code if you can.  It's your natural key, and a natural key is
 always superior to a surrogate key, all other things being equal.
 
 Unfortunately, all other things are NOT equal.  Here are the reasons why
 you'd use a surrogate key (i.e. SERIAL):
 
 1) because the product code is a large text string (i.e. > 10 bytes) and
 you will have many millions of records, so having it as an FK in other
 tables will add significantly to the footprint of the database;

Thanks for those tips. I'll print and keep them. So in my case, with the
product_code being varchar(24):
4 bytes + string size (possibly up to 24) = up to 28 bytes. So I
did the right thing using a serial. For my shorter keys (4 bytes + up
to 6 chars) I will use the natural key.
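
A minimal DDL sketch of that layout (the table and column names here are
hypothetical):

CREATE TABLE product (
    product_id   serial PRIMARY KEY,          -- 4-byte surrogate, used as the FK
    product_code varchar(24) NOT NULL UNIQUE  -- natural key, still enforced
);

CREATE TABLE order_line (
    order_id   integer NOT NULL,
    product_id integer NOT NULL REFERENCES product (product_id)
);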

This is interesting, because this is exactly what I did.

The transparent surrogate keying proposal discussed below in the
thread is a very good idea. It would be nice to see that. It would
make things easier for both the DB admin and the coder, as long as it
doesn't slow down the system. :)

Best regards.

-- 
Alexandre Leclerc



Re: [PERFORM] Data type to use for primary key

2004-11-23 Thread Pierre-Frédéric Caillaud

All,
	Well, you should still escape any strings you're getting from a web
page, so you can ensure you're not subject to a SQL injection attack, even
if you're expecting integers.
Thanks,
Peter Darley

Well, your framework should do this for you:
	- integer is specified in your database object class description;
	- %d appears in your generated queries (or you put it in your
hand-written queries);
	=> if the parameter is not an integer, an exception is thrown, then
caught, then an error page is displayed...

Or, just casting to int should throw an exception...
	Forms should be validated, but hidden parameters in links are OK imho;
just display an error page if they are incorrect. After all, if the user
edits the GET or POST parameters, well...
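
A server-side sketch of the same idea, using a hypothetical items table:
a prepared statement types its parameter, so the value is never
interpolated into the SQL string and a non-integer simply errors out.

PREPARE get_item (integer) AS
    SELECT * FROM items WHERE item_id = $1;

EXECUTE get_item (42);  -- ok; EXECUTE get_item ('42; DROP TABLE items') fails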



Re: [PERFORM] [pgsql-hackers-win32] scalability issues on win32

2004-11-23 Thread Merlin Moncure
 Is this for Postgresql Cygwin? You surely can't mean for all server
 tasks - if so, I would say that's *way* off. There is a difference, but
 it's more along the lines of a single-digit percentage in my experience -
 provided you configure your machines reasonably, of course.
 
 (In my experience, Win32 MSSQLServer often outperforms postgresql on
 Linux. Granted, you can tweak postgresql up to higher speeds, but MS does
 most of that tweaking automatically... I'm talking about tweaking a lot
 more specific than just raising the memory limits from the installation
 default, of course.)

I agree with Magnus.  Specifically, I suspect there is some sort of
resource contention going on that drives up the cpu load when the
queries follow certain patterns.  This resource contention could be
happening in the win32 port code (likely ipc), the mingw api, or inside
the o/s itself.

Other servers, namely apache, sql server, and a host of others, do not
have this problem.

Merlin



Re: [PERFORM] Some quick Opteron 32-bit/64-bit results

2004-11-23 Thread Cott Lang
I ran quite a few file system benchmarks in RHAS x86-64 and FC2 x86-64
on a Sun V40z - I did see very consistent 50% improvements in bonnie++
moving from RHAS to FC2 with ext2/ext3 on SAN.



On Sun, 2004-11-14 at 23:51 -0800, William Yu wrote:
 Greg Stark wrote:
  William Yu [EMAIL PROTECTED] writes:
  
  
 Biggest speedup I've found yet is the backup process (PG_DUMP --> GZIP).
 100% faster in 64-bit mode. This drastic speedup might be more the result
 of 64-bit GZIP, though, as I've seen benchmarks in the past showing
 encryption/compression running 2 or 3 times faster in 64-bit mode versus
 32-bit.
  
  
  Isn't this a major kernel bump too? So a different scheduler, different IO
  scheduler, etc?
  
 
 I'm sure there's some speedup due to the kernel bump. I really didn't
 have the patience to even burn the FC2 32-bit CDs, much less install both
 32-bit and 64-bit FC2, in order to have a more accurate baseline
 comparison.
 
 However, that being said -- when you see huge speed increases like 50%
 and 100% for dump+gzip, it's doubtful the kernel/process scheduler/IO
 scheduler could have made that drastic of a difference. Maybe somebody
 else who has done a 2.4 -> 2.6 upgrade can give us a baseline to
 subtract from my numbers.
 




Re: [PERFORM] Slow execution time when querying view with WHERE clause

2004-11-23 Thread Tom Lane
Mike Mascari [EMAIL PROTECTED] writes:
 When I query the view with a simple filter, I get:

 explain analyze select * from p_areas where deactive is null;

The problem seems to be here:

 ->  Seq Scan on _areas a  (cost=0.00..2.48 rows=1 width=163) (actual time=0.037..0.804 rows=48 loops=1)
       Filter: (deactive IS NULL)

Why is it so completely off about the selectivity of the IS NULL clause?
Are you sure you ANALYZEd this table recently?

regards, tom lane



Re: [PERFORM] memcached and PostgreSQL

2004-11-23 Thread Sean Chittenden
 My point was that there are two failure cases --- one where the cache
 is slightly out of date compared to the db server --- these are cases
 where the cache update is slightly before/after the commit.

I was thinking about this and ways to minimize this even further.  Have
memcache clients add data and have a policy that only the database
deletes data.  This sets the database up as the bottleneck again, but
then you have a degree of transactionality that couldn't previously be
achieved with the database issuing replace commands.  For example:

1) client checks the cache for data and gets a cache lookup failure
2) client begins transaction
3) client SELECTs data from the database
4) client adds the key to the cache
5) client commits transaction

This assumes that the client won't roll back or have a transaction
failure.  Again, in 50M transactions, I doubt one of them would fail
(sure, it's possible, but that's a symptom of bigger problems:
memcached isn't an RDBMS).

The update case being:

1) client begins transaction
2) client updates data
3) database deletes record from memcache
4) client commits transaction
5) client adds data to memcache

 The second is where the cache update happens and the commit later
 fails, or the commit happens and the cache update never happens.

Having pgmemcache delete, not replace, data addresses this second issue.

-sc
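
Concretely, the database-side delete in step 3 of the update case can hang
off a trigger. A minimal sketch, assuming pgmemcache's memcache_delete()
and the plpgsql language are installed, with a hypothetical items table
and key scheme:

-- Sketch only: table name, key format, and column are hypothetical.
CREATE FUNCTION inval_item_cache() RETURNS trigger AS '
BEGIN
    -- The database only ever deletes cache entries; clients re-add
    -- them on the next cache miss. This fires inside the transaction,
    -- before the commit, matching step 3 above.
    PERFORM memcache_delete(''item:'' || OLD.item_id);
    RETURN NULL;  -- the return value of an AFTER trigger is ignored
END;
' LANGUAGE plpgsql;

CREATE TRIGGER item_cache_inval
    AFTER UPDATE OR DELETE ON items
    FOR EACH ROW EXECUTE PROCEDURE inval_item_cache();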

--
Sean Chittenden


Re: [PERFORM] Slow execution time when querying view with WHERE clause

2004-11-23 Thread Mike Mascari
Tom Lane wrote:
Mike Mascari [EMAIL PROTECTED] writes:
When I query the view with a simple filter, I get:

explain analyze select * from p_areas where deactive is null;

The problem seems to be here:

   ->  Seq Scan on _areas a  (cost=0.00..2.48 rows=1 width=163) (actual time=0.037..0.804 rows=48 loops=1)
         Filter: (deactive IS NULL)

Why is it so completely off about the selectivity of the IS NULL clause?
Are you sure you ANALYZEd this table recently?

Yes. I just did:
[EMAIL PROTECTED] vacuum full analyze;
VACUUM
[EMAIL PROTECTED] explain analyze select * from p_areas where deactive is null;

                                  QUERY PLAN
------------------------------------------------------------------------------
 Nested Loop IN Join  (cost=8.62..512.47 rows=1 width=162) (actual time=1143.969..21811.417 rows=37 loops=1)
   Join Filter: ("outer".area = "inner".area)
   ->  Seq Scan on _areas a  (cost=0.00..2.49 rows=1 width=162) (actual time=0.037..1.673 rows=49 loops=1)
         Filter: (deactive IS NULL)
   ->  Nested Loop  (cost=8.62..25740.20 rows=2681 width=8) (actual time=1.172..429.501 rows=3566 loops=49)
         ->  Nested Loop  (cost=8.62..16674.93 rows=2680 width=8) (actual time=1.125..281.570 rows=3566 loops=49)
               ->  Merge Join  (cost=8.62..3012.72 rows=2778 width=8) (actual time=0.876..128.908 rows=3566 loops=49)
                     Merge Cond: ("outer".pricegroup = "inner".pricegroup)
                     ->  Nested Loop IN Join  (cost=8.62..1929.41 rows=9 width=8) (actual time=0.613..5.504 rows=9 loops=49)
                           Join Filter: ("outer".buyer = "inner".store)
                           ->  Index Scan using i_pricemembers3 on _pricemembers p  (cost=0.00..11.13 rows=217 width=16) (actual time=0.403..1.476 rows=142 loops=49)
                           ->  Subquery Scan "IN_subquery"  (cost=8.62..8.74 rows=8 width=8) (actual time=0.013..0.019 rows=1 loops=6950)
                                 ->  Unique  (cost=8.62..8.66 rows=8 width=8) (actual time=0.007..0.010 rows=1 loops=6950)
                                       ->  Sort  (cost=8.62..8.64 rows=8 width=8) (actual time=0.003..0.004 rows=1 loops=6950)
                                             Sort Key: store
                                             ->  Append  (cost=2.87..8.50 rows=8 width=8) (actual time=8.394..8.446 rows=1 loops=1)
                                                   ->  Subquery Scan "*SELECT* 1"  (cost=2.87..5.17 rows=5 width=8) (actual time=8.112..8.112 rows=0 loops=1)
                                                         ->  Hash Join  (cost=2.87..5.12 rows=5 width=8) (actual time=8.106..8.106 rows=0 loops=1)
                                                               Hash Cond: ("outer".company = "inner".company)
                                                               ->  Seq Scan on _stores s  (cost=0.00..2.13 rows=13 width=16) (actual time=0.014..0.052 rows=13 loops=1)
                                                               ->  Hash  (cost=2.87..2.87 rows=1 width=8) (actual time=7.878..7.878 rows=0 loops=1)
                                                                     ->  Seq Scan on _webusers w  (cost=0.00..2.87 rows=1 width=8) (actual time=7.868..7.868 rows=0 loops=1)
                                                                           Filter: (webuser = getwebuser())
                                                   ->  Subquery Scan "*SELECT* 2"  (cost=1.08..3.33 rows=3 width=8) (actual time=0.273..0.322 rows=1 loops=1)
                                                         ->  Hash Join  (cost=1.08..3.30 rows=3 width=8) (actual time=0.263..0.308 rows=1 loops=1)
                                                               Hash Cond: ("outer".company = "inner".company)
                                                               ->  Seq Scan on _stores s  (cost=0.00..2.13 rows=13 width=16) (actual time=0.008..0.042 rows=13 loops=1)
                                                               ->  Hash  (cost=1.07..1.07 rows=1 width=8) (actual time=0.093..0.093 rows=0 loops=1)
                                                                     ->  Seq Scan on _companies c  (cost=0.00..1.07 rows=1 width=8) (actual time=0.061..0.081 rows=1 loops=1)
                                                                           Filter: ((companyid)::text = 'DEFAULT'::text)
                     ->  Index Scan using i_offers4 on _offers o  (cost=0.00..1014.76 rows=16298 width=16) (actual time=0.244..72.742 rows=10433 loops=49)
               ->  Index Scan using i_inventories1 on _inventories i  (cost=0.00..4.91 rows=1 width=16) (actual time=0.025..0.029 rows=1 loops=174715)
                     Index Cond: (i.inventory = "outer".inventory)
         ->  Index Scan using i_bins1 on _bins b  (cost=0.00..3.37

Re: [PERFORM] [pgsql-hackers-win32] scalability issues on win32

2004-11-23 Thread Bruce Momjian
Dave Page wrote:
  
 
  -Original Message-
  From: Bruce Momjian [mailto:[EMAIL PROTECTED] 
  Sent: 23 November 2004 15:06
  To: Dave Page
  Cc: Merlin Moncure; [EMAIL PROTECTED]; 
  PostgreSQL Win32 port list
  Subject: Re: [pgsql-hackers-win32] scalability issues on win32
  
  The general opinion of server users is that you need 2-4 more
  Win32 servers to do the same work as one Unix-like server.
  That, along with the difficulty of automated administration and
  security problems, is what is preventing Win32 from making
  greater inroads into the server marketplace.
  
  Of course these are just generalizations.
 
 I'd rather avoid an OS advocacy war here, but if I'm honest, with group
 policy and other tools such as SUS, I find that my Windows servers are
 actually easier to administer than the Linux ones (I have about a 50-50
 mix at work). Perhaps that's because I favour Slackware though?
 
 As for the 2-4 servers quote, I find that a little on the high side. I
 agree that generally you might expect a little more performance from an
 equivalent Linux system on the same hardware, but in my practical
 experience the difference is far less than you suggest.

I have never run the tests myself. I am just quoting what I have heard,
and maybe that information is a few years old.

-- 
  Bruce Momjian|  http://candle.pha.pa.us
  [EMAIL PROTECTED]   |  (610) 359-1001
  +  If your life is a hard drive, |  13 Roberts Road
  +  Christ can be your backup.|  Newtown Square, Pennsylvania 19073



Re: [PERFORM] Slow execution time when querying view with WHERE clause

2004-11-23 Thread Jaime Casanova
--- Mike Mascari [EMAIL PROTECTED] wrote:
 Tom Lane wrote:
  Mike Mascari [EMAIL PROTECTED] writes:
   When I query the view with a simple filter, I get:
 
   explain analyze select * from p_areas where deactive is null;
 
  The problem seems to be here:
 
   ->  Seq Scan on _areas a  (cost=0.00..2.48 rows=1 width=163)
       (actual time=0.037..0.804 rows=48 loops=1)
         Filter: (deactive IS NULL)
 
  Why is it so completely off about the selectivity of the IS NULL
  clause?

Null values are not indexable; is that your question?
If so, create a partial index with WHERE deactive IS NULL.
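
For example, a sketch against the _areas table from the plan above (the
index name is arbitrary):

CREATE INDEX i_areas_deactive_null ON _areas (area)
    WHERE deactive IS NULL;

Only the rows satisfying the WHERE clause are stored in the index, so it
stays small no matter how large the table grows.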

regards,
Jaime Casanova

_
Do You Yahoo!?
Información de Estados Unidos y América Latina, en Yahoo! Noticias.
Visítanos en http://noticias.espanol.yahoo.com



Re: [PERFORM] Slow execution time when querying view with WHERE clause

2004-11-23 Thread Tom Lane
Jaime Casanova [EMAIL PROTECTED] writes:
 Tom Lane wrote:
  Why is it so completely off about the selectivity of the IS NULL clause?

 Null values are not indexable; is that your question?

Uh, no.  The problem is that the IS NULL condition matched all 48 rows
of the table, but the planner thought it would only match one row.  This
is definitely covered by the pg_stats statistics, and with only 48 live
rows there couldn't possibly have been any sampling error, so what the
heck went wrong there?

regards, tom lane



Re: [PERFORM] Slow execution time when querying view with WHERE clause

2004-11-23 Thread Tom Lane
Mike Mascari [EMAIL PROTECTED] writes:
 Tom Lane wrote:
 Why is it so completely off about the selectivity of the IS NULL clause?

 I think this is a bug in ANALYZE not constructing statistics for columns 
 whose data is entirely NULL:

Um ... doh ... analyze.c about line 1550:

/* We can only compute valid stats if we found some non-null values. */
if (nonnull_cnt > 0)
   ...

There's a bit of an epistemological issue here: if we didn't actually
find any nonnull values in our sample, is it legitimate to assume that
the column is entirely null?  On the other hand, if we find only 3
distinct values in our sample we will happily assume the column contains
only those 3, so I dunno why we are discriminating against null.  This
seems like a case that just hasn't come up before.
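
The symptom is easy to reproduce in a scratch table (a sketch;
generate_series() is assumed to be available, otherwise insert the NULLs
any other way):

CREATE TABLE t (deactive date);
INSERT INTO t SELECT NULL FROM generate_series(1, 48);
ANALYZE t;
SELECT null_frac FROM pg_stats
 WHERE tablename = 't' AND attname = 'deactive';
-- returns no row at all, so the planner falls back to a default guess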

regards, tom lane



Re: [PERFORM] Data type to use for primary key

2004-11-23 Thread Greg Stark
Alexandre Leclerc [EMAIL PROTECTED] writes:

 Thanks for those tips. I'll print and keep them. So in my case, with the
 product_code being varchar(24):
 4 bytes + string size (possibly up to 24) = up to 28 bytes. So I
 did the right thing using a serial. For my shorter keys (4 bytes + up
 to 6 chars) I will use the natural key.

Realize that space usage is really only part of the issue.

If you ever have two records with the same natural key or a record whose
natural key has changed you'll be in for a world of hurt if you use the
natural key as the primary key in your database.

Basically I never use natural keys except when they're arbitrarily chosen
values defined by the application itself.

Situations where I've used varchars instead of integer keys are things like:

. Individual privileges grantable in a security system.
  (things like VIEWUSER EDITUSER privileges)

. Reference tables for one letter codes used to indicate the type of object
  represented by the record.
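
A minimal sketch of the second case (the names are hypothetical):

CREATE TABLE object_type (
    type_code   char(1) PRIMARY KEY,  -- chosen by the application, so stable
    description text NOT NULL
);

CREATE TABLE objects (
    object_id integer PRIMARY KEY,
    type_code char(1) NOT NULL REFERENCES object_type (type_code)
);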

Actually, I see one interesting exception to my policy in my current
database schema, and I don't think I would do this one differently given
the choice either. The primary key of the postal code table is the postal
code. (Postal codes up here in the great white north are like zip codes
down there.)

This could hurt if they ever reuse an old, previously retired postal code,
which isn't an entirely impossible case. As far as I know it hasn't happened
yet, though. And it's just so much more convenient having the postal code
handy instead of having to join against another table to look it up.

-- 
greg




[PERFORM] FW: Index usage

2004-11-23 Thread BBI Edwin Punzalan

Hi everyone,

Can anyone please explain postgres' behavior with our index?

I did the following query tests on our database:


db=# create index chatlogs_date_idx on chatlogs (date);
CREATE
db=# explain select date from chatlogs where date >= '11/23/04';
NOTICE:  QUERY PLAN:

Index Scan using chatlogs_date_idx on chatlogs  (cost=0.00..144.11 rows=36
width=4)

EXPLAIN
db=# explain select date from chatlogs where date >= '10/23/04';
NOTICE:  QUERY PLAN:

Seq Scan on chatlogs  (cost=0.00..23938.06 rows=253442 width=4)

EXPLAIN

Date's datatype is date.  It's just odd that when I merely change the date
being searched, the index is no longer used.




Re: [PERFORM] FW: Index usage

2004-11-23 Thread Leeuw van der, Tim
Well, you just selected a whole lot more rows... What's the total number
of rows in the table?

In general, what I remember from reading on the list is that when there's
no upper bound on a query like this, the planner is more likely to choose
a seq. scan than an index scan. Try to give your query an upper bound like:

select date from chatlogs where date >= '11/23/04' and date < '12/31/99';

select date from chatlogs where date >= '10/23/04' and date < '12/31/99';

This should make it easier for the planner to give a proper estimate of the
number of rows returned. If it doesn't help yet, please post 'explain
analyze' output rather than 'explain' output, as it allows much better
investigation into why the planner chooses what it chooses.
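
For example, something along these lines (a sketch; ISO date literals
avoid the MM/DD/YY ambiguity):

EXPLAIN ANALYZE
SELECT date FROM chatlogs
WHERE date >= '2004-10-23' AND date < '2004-11-24';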

cheers,

--Tim

