Re: [PERFORM] [pgsql-hackers-win32] scalability issues on win32
Reini Urban wrote:
> Merlin Moncure schrieb:
> > A good benchmark of our application performance is the time it takes to read the entire bill of materials for a product. This is a recursive read of about 2500 records in the typical case (2408 in the test case).
>
> I always knew that COBOL ultimately loses, but it's always refreshing to get confirmation from time to time :)

Heh. It's important to make the distinction between COBOL, which is just a language, and ISAM, which is a data delivery system. You could, for example, pair COBOL with SQL with good results (in fact, we plan to). But yes, many legacy COBOL apps were written with assumptions about the system architecture that are no longer valid.

> Where did you get the win32 avg cpu load number from? AFAIK there's no getloadavg() for windows. At least I tried hard to find one, because I want to add a comparable figure to cygwin core. emacs, coreutils, make and others would desperately need it, not to speak of servers and real-time apps.

I just eyeballed it :-). So consider the load averages anecdotal, although they are quite stable. However, it is quite striking that with the same application code the win32 load average was 2-3 times higher.

I also left out the dual processor results, because I did not have time to test them on linux. However, sadly the 2nd processor adds very little extra horsepower to the server. I'm hoping linux will be better.

Merlin

---(end of broadcast)---
TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] [pgsql-hackers-win32] scalability issues on win32
> This was an interesting Win32/linux comparison. I expected Linux to scale better, but I was surprised how poorly XP scaled. It reinforces our perception that Win32 is for low traffic servers.

That's a bit harsh given the lack of any further investigation so far, isn't it? Win32 can run perfectly well with other DBMSs with hundreds of users.

> The general opinion of server users is that you need 2-4 more Win32 servers to do the same work as one Unix-like server. That and the difficulty of automated administration and security problems is what is preventing Win32 from making greater inroads into the server marketplace. Of course these are just generalizations.

Is this for PostgreSQL under Cygwin? You surely can't mean for all server tasks - if so, I would say that's *way* off. There is a difference, but it's more along the line of single-digit percentages in my experience - provided you configure your machines reasonably, of course. (In my experience, Win32 MSSQLServer often outperforms PostgreSQL on Linux. Granted, you can tweak PostgreSQL up to higher speeds, but MS does most of that tweaking automatically... Talking of tweaking a lot more specific than just raising the memory limits from the installation default, of course.)

I do agree on the automated administration, though... It's a major PITA.

//Magnus
Re: [PERFORM] Data type to use for primary key
On Mon, 22 Nov 2004 16:54:56 -0800, Josh Berkus [EMAIL PROTECTED] wrote:
> Alexandre,
> > What is the common approach? Should I use directly the product_code as my ID, or use a sequential number for speed? (I did the same for the company_id; this is a 'serial' and not the short name of the customer.) I just don't know what is usually done.
>
> Don't use SERIAL just because it's there. Ideally, you *want* to use the product_code if you can. It's your natural key, and a natural key is always superior to a surrogate key, all other things being equal. Unfortunately, all other things are NOT equal. Here's a reason why you'd use a surrogate key (i.e. SERIAL): 1) because the product code is a large text string (i.e. > 10 bytes) and you will have many millions of records, so having it as an FK in other tables will add significantly to the footprint of the database;

Thanks for those tips. I'll print and keep them. So in my case, the product_code being varchar(24) is: 4 bytes + string size (so possibly up to 24) = possibly 28 bytes. I did the good thing using a serial. For my shorter keys (4 bytes + up to 6 chars) I will use the natural key.

This is interesting, because this is what I did right now. The transparent surrogate keying proposal that is discussed below in the thread is a very good idea. It would be nice to see that. It would be easier for the DB admin and the coder, as long as it does not slow the system. :)

Best regards.

-- Alexandre Leclerc
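As an illustration of the trade-off discussed above, here is a minimal sketch (table and index names are hypothetical, not from the thread) of the surrogate-key layout for a wide product code:

```sql
-- Hypothetical schema: the SERIAL surrogate key keeps a 4-byte integer
-- as the value repeated in every referencing table, while the wide
-- varchar(24) product_code stays UNIQUE but is stored only once.
CREATE TABLE product (
    product_id   serial PRIMARY KEY,          -- 4-byte surrogate key
    product_code varchar(24) NOT NULL UNIQUE  -- natural key, up to ~28 bytes on disk
);

-- Referencing tables carry only the narrow integer FK:
CREATE TABLE bom_item (
    parent_id integer NOT NULL REFERENCES product (product_id),
    child_id  integer NOT NULL REFERENCES product (product_id),
    quantity  numeric NOT NULL
);
```

With millions of `bom_item` rows, the difference between a 4-byte and a 28-byte key in every row (and in every index on that column) is exactly the footprint cost Josh describes.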
Re: [PERFORM] Data type to use for primary key
All,

> Well, you should still escape any strings you're getting from a web page so you can ensure you're not subject to a SQL insert attack, even if you're expecting integers.
> Thanks, Peter Darley

Well, your framework should do this for you: an integer specified in your database object class description means %d appears in your generated queries (or you put it in your hand-written queries), so if the parameter is not an integer, an exception is thrown, then caught, then an error page is displayed... Or just casting to int should throw an exception...

Forms should be validated, but hidden parameters in links are OK imho; just display an error page if they are incorrect. After all, if the user edits the GET or POST parameters, well...
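As a sketch of the cast-based validation mentioned above: in PostgreSQL an explicit cast rejects non-numeric input outright, so a tampered parameter fails before it can be interpolated into a query.

```sql
-- A well-formed integer parameter casts cleanly:
SELECT '42'::integer;

-- A tampered parameter raises an error instead of being executed:
-- ERROR:  invalid input syntax for integer: "abc"
SELECT 'abc'::integer;
```

The application layer catches that error and serves the error page, which is the behavior described in the reply above.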
Re: [PERFORM] [pgsql-hackers-win32] scalability issues on win32
> Is this for Postgresql Cygwin? You surely can't mean for all server tasks - if so, I would say that's *way* off. There is a difference, but it's more along the line of single-digit percentage in my experience - provided you config your machines reasonably, of course. (In my experience, Win32 MSSQLServer often outperforms postgresql on Linux. Granted you can tweak postgresql up to higher speeds, but MS does most of that tweaking automatically... Talking of tweaking a lot more specific than just raising the memory limits from the installation default, of course)

I agree with Magnus. Specifically, I suspect there is some sort of resource contention going on that is driving up the cpu load when the queries follow certain patterns. This resource contention could be happening in the win32 port code (likely ipc), the mingw api, or inside the o/s itself. Other servers, namely apache, sql server, and a host of others, do not have this problem.

Merlin
Re: [PERFORM] Some quick Opteron 32-bit/64-bit results
I ran quite a few file system benchmarks in RHAS x86-64 and FC2 x86-64 on a Sun V40z - I did see very consistent 50% improvements in bonnie++ moving from RHAS to FC2 with ext2/ext3 on SAN.

On Sun, 2004-11-14 at 23:51 -0800, William Yu wrote:
> Greg Stark wrote:
> > William Yu [EMAIL PROTECTED] writes:
> > > Biggest speedup I've found yet is the backup process (PG_DUMP -> GZIP). 100% faster in 64-bit mode. This drastic speedup might be more the result of 64-bit GZIP though, as I've seen benchmarks in the past showing encryption/compression running 2 or 3 times faster in 64-bit mode versus 32-bit.
> >
> > Isn't this a major kernel bump too? So a different scheduler, different IO scheduler, etc?
>
> I'm sure there's some speedup due to the kernel bump. I really didn't have the patience to even burn the FC2 32-bit CDs, much less install both 32-bit and 64-bit FC2, in order to have a more accurate baseline comparison. However, that being said -- when you see huge speed increases like 50%-100% for dump+gzip, it's doubtful the kernel/process scheduler/IO scheduler could have made that drastic of a difference. Maybe somebody else who has done a 2.4 to 2.6 upgrade can give us a baseline to subtract from my numbers.
Re: [PERFORM] Slow execution time when querying view with WHERE clause
Mike Mascari [EMAIL PROTECTED] writes:
> When I query the view with a simple filter, I get: explain analyze select * from p_areas where deactive is null;

The problem seems to be here:

  ->  Seq Scan on _areas a  (cost=0.00..2.48 rows=1 width=163) (actual time=0.037..0.804 rows=48 loops=1)
        Filter: (deactive IS NULL)

Why is it so completely off about the selectivity of the IS NULL clause? Are you sure you ANALYZEd this table recently?

regards, tom lane
Re: [PERFORM] memcached and PostgreSQL
> My point was that there are two failure cases --- one where the cache is slightly out of date compared to the db server --- these are cases where the cache update is slightly before/after the commit.

I was thinking about this and ways to minimize this even further. Have memcache clients add data, and have a policy that only the database deletes data. This sets the database up as the bottleneck again, but then you have a degree of transactionality that couldn't previously be achieved with the database issuing replace commands. For example:

1) client checks the cache for data and gets a cache lookup failure
2) client begins transaction
3) client SELECTs data from the database
4) client adds the key to the cache
5) client commits transaction

This assumes that the client won't roll back or have a transaction failure. Again, in 50M transactions, I doubt one of them would fail (sure, it's possible, but that's a symptom of bigger problems: memcached isn't an RDBMS).

The update case being:

1) client begins transaction
2) client updates data
3) database deletes record from memcache
4) client commits transaction
5) client adds data to memcache

The second is where the cache update happens and the commit later fails, or the commit happens and the cache update never happens. Having pgmemcache delete, not replace, data addresses this second issue.

-sc

-- Sean Chittenden
Re: [PERFORM] Slow execution time when querying view with WHERE clause
Tom Lane wrote:
> Mike Mascari [EMAIL PROTECTED] writes:
> > When I query the view with a simple filter, I get: explain analyze select * from p_areas where deactive is null;
>
> The problem seems to be here:
>
>   ->  Seq Scan on _areas a  (cost=0.00..2.48 rows=1 width=163) (actual time=0.037..0.804 rows=48 loops=1)
>         Filter: (deactive IS NULL)
>
> Why is it so completely off about the selectivity of the IS NULL clause? Are you sure you ANALYZEd this table recently?

Yes. I just did:

[EMAIL PROTECTED] vacuum full analyze;
VACUUM
[EMAIL PROTECTED] explain analyze select * from p_areas where deactive is null;

QUERY PLAN
------------------------------------------------------------------------------
 Nested Loop IN Join  (cost=8.62..512.47 rows=1 width=162) (actual time=1143.969..21811.417 rows=37 loops=1)
   Join Filter: ("outer".area = "inner".area)
   ->  Seq Scan on _areas a  (cost=0.00..2.49 rows=1 width=162) (actual time=0.037..1.673 rows=49 loops=1)
         Filter: (deactive IS NULL)
   ->  Nested Loop  (cost=8.62..25740.20 rows=2681 width=8) (actual time=1.172..429.501 rows=3566 loops=49)
         ->  Nested Loop  (cost=8.62..16674.93 rows=2680 width=8) (actual time=1.125..281.570 rows=3566 loops=49)
               ->  Merge Join  (cost=8.62..3012.72 rows=2778 width=8) (actual time=0.876..128.908 rows=3566 loops=49)
                     Merge Cond: ("outer".pricegroup = "inner".pricegroup)
                     ->  Nested Loop IN Join  (cost=8.62..1929.41 rows=9 width=8) (actual time=0.613..5.504 rows=9 loops=49)
                           Join Filter: ("outer".buyer = "inner".store)
                           ->  Index Scan using i_pricemembers3 on _pricemembers p  (cost=0.00..11.13 rows=217 width=16) (actual time=0.403..1.476 rows=142 loops=49)
                           ->  Subquery Scan "IN_subquery"  (cost=8.62..8.74 rows=8 width=8) (actual time=0.013..0.019 rows=1 loops=6950)
                                 ->  Unique  (cost=8.62..8.66 rows=8 width=8) (actual time=0.007..0.010 rows=1 loops=6950)
                                       ->  Sort  (cost=8.62..8.64 rows=8 width=8) (actual time=0.003..0.004 rows=1 loops=6950)
                                             Sort Key: store
                                             ->  Append  (cost=2.87..8.50 rows=8 width=8) (actual time=8.394..8.446 rows=1 loops=1)
                                                   ->  Subquery Scan "*SELECT* 1"  (cost=2.87..5.17 rows=5 width=8) (actual time=8.112..8.112 rows=0 loops=1)
                                                         ->  Hash Join  (cost=2.87..5.12 rows=5 width=8) (actual time=8.106..8.106 rows=0 loops=1)
                                                               Hash Cond: ("outer".company = "inner".company)
                                                               ->  Seq Scan on _stores s  (cost=0.00..2.13 rows=13 width=16) (actual time=0.014..0.052 rows=13 loops=1)
                                                               ->  Hash  (cost=2.87..2.87 rows=1 width=8) (actual time=7.878..7.878 rows=0 loops=1)
                                                                     ->  Seq Scan on _webusers w  (cost=0.00..2.87 rows=1 width=8) (actual time=7.868..7.868 rows=0 loops=1)
                                                                           Filter: (webuser = getwebuser())
                                                   ->  Subquery Scan "*SELECT* 2"  (cost=1.08..3.33 rows=3 width=8) (actual time=0.273..0.322 rows=1 loops=1)
                                                         ->  Hash Join  (cost=1.08..3.30 rows=3 width=8) (actual time=0.263..0.308 rows=1 loops=1)
                                                               Hash Cond: ("outer".company = "inner".company)
                                                               ->  Seq Scan on _stores s  (cost=0.00..2.13 rows=13 width=16) (actual time=0.008..0.042 rows=13 loops=1)
                                                               ->  Hash  (cost=1.07..1.07 rows=1 width=8) (actual time=0.093..0.093 rows=0 loops=1)
                                                                     ->  Seq Scan on _companies c  (cost=0.00..1.07 rows=1 width=8) (actual time=0.061..0.081 rows=1 loops=1)
                                                                           Filter: ((companyid)::text = 'DEFAULT'::text)
                     ->  Index Scan using i_offers4 on _offers o  (cost=0.00..1014.76 rows=16298 width=16) (actual time=0.244..72.742 rows=10433 loops=49)
               ->  Index Scan using i_inventories1 on _inventories i  (cost=0.00..4.91 rows=1 width=16) (actual time=0.025..0.029 rows=1 loops=174715)
                     Index Cond: (i.inventory = "outer".inventory)
         ->  Index Scan using i_bins1 on _bins b  (cost=0.00..3.37
Re: [PERFORM] [pgsql-hackers-win32] scalability issues on win32
Dave Page wrote:
> -----Original Message-----
> From: Bruce Momjian [mailto:[EMAIL PROTECTED]]
> Sent: 23 November 2004 15:06
> To: Dave Page
> Cc: Merlin Moncure; [EMAIL PROTECTED]; PostgreSQL Win32 port list
> Subject: Re: [pgsql-hackers-win32] scalability issues on win32
>
> > The general opinion of server users is that you need 2-4 more Win32 servers to do the same work as one Unix-like server. That and the difficulty of automated administration and security problems is what is preventing Win32 from making greater inroads into the server marketplace. Of course these are just generalizations.
>
> I'd rather avoid an OS advocacy war here, but if I'm honest, with group policy and other tools such as SUS, I find that my Windows servers are actually easier to administer than the Linux ones (I have about a 50-50 mix at work). Perhaps that's because I favour Slackware, though? As for the 2-4 servers quote, I find that a little on the high side. I agree that generally you might expect a little more performance from an equivalent Linux system on the same hardware, but in my practical experience the difference is far less than you suggest.

I have never run the tests myself. I am just quoting what I have heard, and maybe that information is a few years old.

--
  Bruce Momjian                    |  http://candle.pha.pa.us
  [EMAIL PROTECTED]                |  (610) 359-1001
  + If your life is a hard drive,  |  13 Roberts Road
  + Christ can be your backup.     |  Newtown Square, Pennsylvania 19073
Re: [PERFORM] Slow execution time when querying view with WHERE clause
--- Mike Mascari [EMAIL PROTECTED] wrote:
> Tom Lane wrote:
> > Mike Mascari [EMAIL PROTECTED] writes:
> > > When I query the view with a simple filter, I get: explain analyze select * from p_areas where deactive is null;
> >
> > The problem seems to be here:
> >
> >   ->  Seq Scan on _areas a  (cost=0.00..2.48 rows=1 width=163) (actual time=0.037..0.804 rows=48 loops=1)
> >         Filter: (deactive IS NULL)
> >
> > Why is it so completely off about the selectivity of the IS NULL clause?

Null values are not indexable - is that your question? If it is, then create a partial index with WHERE deactive IS NULL.

regards, Jaime Casanova
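A minimal sketch of the partial index Jaime suggests, using the table and column names from the thread (the index name itself is hypothetical):

```sql
-- Partial index covering only the rows where deactive IS NULL,
-- so a query filtering on that condition can use an index scan
-- instead of scanning the whole table.
CREATE INDEX i_areas_deactive_null ON _areas (area)
    WHERE deactive IS NULL;
```

Note that this addresses index availability, not the planner's selectivity estimate, which is the separate issue Tom raises next in the thread.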
Re: [PERFORM] Slow execution time when querying view with WHERE clause
Jaime Casanova [EMAIL PROTECTED] writes:
> Tom Lane wrote:
> > Why is it so completely off about the selectivity of the IS NULL clause?
>
> null values are not indexable, is that your question?

Uh, no. The problem is that the IS NULL condition matched all 48 rows of the table, but the planner thought it would only match one row. This is definitely covered by the pg_stats statistics, and with only 48 live rows there couldn't possibly have been any sampling error, so what the heck went wrong there?

regards, tom lane
Re: [PERFORM] Slow execution time when querying view with WHERE clause
Mike Mascari [EMAIL PROTECTED] writes:
> Tom Lane wrote:
> > Why is it so completely off about the selectivity of the IS NULL clause?
>
> I think this is a bug in ANALYZE not constructing statistics for columns whose data is entirely NULL:

Um ... doh ... analyze.c, about line 1550:

    /* We can only compute valid stats if we found some non-null values. */
    if (nonnull_cnt > 0)
    ...

There's a bit of an epistemological issue here: if we didn't actually find any nonnull values in our sample, is it legitimate to assume that the column is entirely null? On the other hand, if we find only the value 3 in our sample we will happily assume the column contains only 3s, so I dunno why we are discriminating against null. This seems like a case that just hasn't come up before.

regards, tom lane
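The statistics Tom refers to can be inspected directly. A sketch of checking what ANALYZE recorded for the column in question (table and column names taken from the thread):

```sql
-- After ANALYZE, pg_stats exposes the per-column null fraction the
-- planner uses for IS NULL selectivity estimates. If ANALYZE skipped
-- an all-null column, no row appears here at all.
SELECT attname, null_frac, n_distinct
FROM pg_stats
WHERE tablename = '_areas' AND attname = 'deactive';
```

A `null_frac` near 1.0 would let the planner estimate IS NULL correctly; a missing row is consistent with the ANALYZE behavior discussed above.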
Re: [PERFORM] Data type to use for primary key
Alexandre Leclerc [EMAIL PROTECTED] writes:
> Thanks for those tips. I'll print and keep them. So in my case, the product_code being varchar(24) is: 4 bytes + string size (so possibly up to 24) = possibly 28 bytes. I did the good thing using a serial. For my shorter keys (4 bytes + up to 6 chars) I will use the natural key.

Realize that space usage is really only part of the issue. If you ever have two records with the same natural key, or a record whose natural key has changed, you'll be in for a world of hurt if you use the natural key as the primary key in your database.

Basically, I never use natural keys except when they're arbitrarily chosen values defined by the application itself. Situations where I've used varchars instead of integer keys are things like:

. Individual privileges grantable in a security system (things like VIEWUSER, EDITUSER privileges)
. Reference tables for one-letter codes used to indicate the type of object represented by the record

Actually, I see one interesting exception to my policy in my current database schema, and I don't think I would do this one differently given the choice either. The primary key of the postal code table is the postal code. (Postal codes are up here in the great white north like zip codes down there.) This could hurt if they ever reuse an old, previously retired postal code, which isn't an entirely impossible case. As far as I know it hasn't happened yet, though. And it's just so much more convenient having the postal code handy instead of having to join against another table to look it up.

-- greg
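Greg's "world of hurt" case can be sketched in SQL. With a natural key as the PK, correcting a mistyped code means the change must propagate through every referencing table (all names here are hypothetical illustrations, not from the thread):

```sql
-- Natural key as PK and FK everywhere; ON UPDATE CASCADE is required
-- just to make a key correction possible at all.
CREATE TABLE product (
    product_code varchar(24) PRIMARY KEY
);

CREATE TABLE order_line (
    product_code varchar(24) NOT NULL
        REFERENCES product (product_code) ON UPDATE CASCADE,
    quantity     integer NOT NULL
);

-- Fixing one typo now rewrites every matching row in order_line:
UPDATE product SET product_code = 'AB-1001' WHERE product_code = 'AB-I001';
```

With a surrogate key, the same fix would touch exactly one row, since referencing tables never store the code itself.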
[PERFORM] FW: Index usage
Hi everyone,

Can anyone please explain postgres' behavior on our index? I did the following query tests on our database:

db=# create index chatlogs_date_idx on chatlogs (date);
CREATE
db=# explain select date from chatlogs where date >= '11/23/04';
NOTICE: QUERY PLAN:

Index Scan using chatlogs_date_idx on chatlogs  (cost=0.00..144.11 rows=36 width=4)

EXPLAIN
db=# explain select date from chatlogs where date >= '10/23/04';
NOTICE: QUERY PLAN:

Seq Scan on chatlogs  (cost=0.00..23938.06 rows=253442 width=4)

EXPLAIN

Date's datatype is date. It's just odd that I merely change the actual date of search and the index is not being used anymore.
Re: [PERFORM] FW: Index usage
Well, you just selected a whole lot more rows... What's the total number of rows in the table?

In general, what I remember from reading on the list is that when there's no upper bound on a query like this, the planner is more likely to choose a seq scan than an index scan. Try to give your query an upper bound, like:

select date from chatlogs where date >= '11/23/04' and date < '12/31/99';
select date from chatlogs where date >= '10/23/04' and date < '12/31/99';

This should make it easier for the planner to give a proper estimate of the number of rows returned. If it doesn't help yet, please post 'explain analyze' output rather than 'explain' output, for it allows much better investigation into why the planner chooses what it chooses.

cheers,
--Tim
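Tim's suggestion can be sketched against the table from the original post; the upper bound here is an assumption chosen for illustration, and EXPLAIN ANALYZE (rather than plain EXPLAIN) shows estimated versus actual row counts side by side:

```sql
-- Bounding the range on both ends lets the planner estimate the row
-- count from the histogram instead of assuming an open-ended scan;
-- ANALYZE output reports both the estimate and the actual rows.
EXPLAIN ANALYZE
SELECT date FROM chatlogs
WHERE date >= '2004-10-23' AND date < '2004-11-24';
```

If the estimated rows in the output still diverge wildly from the actual rows, that points at stale statistics rather than at the query shape.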