FW: [PERFORM] FW: Index usage
Thanks, Tim. I tried adding an upper limit and it's still the same, as follows:

==
db=# explain analyze select date from chatlogs where date >= '11/24/04';
NOTICE: QUERY PLAN:

Index Scan using chatlogs_date_idx on chatlogs (cost=0.00..145.72 rows=37 width=4) (actual time=0.18..239.69 rows=10737 loops=1)
Total runtime: 246.22 msec

EXPLAIN
db=# explain analyze select date from chatlogs where date >= '11/23/04' and date < '11/24/04';
NOTICE: QUERY PLAN:

Seq Scan on chatlogs (cost=0.00..24763.19 rows=9200 width=4) (actual time=0.44..4447.01 rows=13029 loops=1)
Total runtime: 4455.56 msec

EXPLAIN
db=# explain analyze select date from chatlogs where date >= '11/23/04' and date < '11/25/04';
NOTICE: QUERY PLAN:

Seq Scan on chatlogs (cost=0.00..24763.19 rows=9200 width=4) (actual time=0.45..4268.00 rows=23787 loops=1)
Total runtime: 4282.81 msec
==

How come a query filtering on the current date uses an index while the others do not? This makes indexing to speed up queries quite difficult.

-Original Message-
From: Leeuw van der, Tim [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 24, 2004 3:35 PM
To: BBI Edwin Punzalan; [EMAIL PROTECTED]
Subject: RE: [PERFORM] FW: Index usage

Well, you just selected a whole lot more rows... What's the total number of rows in the table?

In general, what I remember from reading on the list is that when there's no upper bound on a query like this, the planner is more likely to choose a seq. scan than an index scan. Try to give your query an upper bound, like:

select date from chatlogs where date >= '11/23/04' and date < '12/31/99';
select date from chatlogs where date >= '10/23/04' and date < '12/31/99';

This should make it easier for the planner to give a proper estimate of the number of rows returned. If it doesn't help yet, please post 'explain analyze' output rather than 'explain' output, for it allows much better investigation into why the planner chooses what it chooses.
cheers,
--Tim

-Original Message-
From: [EMAIL PROTECTED] [mailto:[EMAIL PROTECTED] On Behalf Of BBI Edwin Punzalan
Sent: Wednesday, November 24, 2004 7:52 AM
To: [EMAIL PROTECTED]
Subject: [PERFORM] FW: Index usage

Hi everyone,

Can anyone please explain postgres' behavior on our index? I did the following query tests on our database:

db=# create index chatlogs_date_idx on chatlogs (date);
CREATE
db=# explain select date from chatlogs where date >= '11/23/04';
NOTICE: QUERY PLAN:

Index Scan using chatlogs_date_idx on chatlogs (cost=0.00..144.11 rows=36 width=4)

EXPLAIN
db=# explain select date from chatlogs where date >= '10/23/04';
NOTICE: QUERY PLAN:

Seq Scan on chatlogs (cost=0.00..23938.06 rows=253442 width=4)

EXPLAIN

The date column's datatype is date. It's just odd that I only change the actual date searched for and the index is no longer used.

---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Data type to use for primary key
Tom Lane [EMAIL PROTECTED] writes: Greg Stark [EMAIL PROTECTED] writes: This could hurt if they ever reuse an old previously retired postal code, which isn't an entirely impossible case. As far as I know it hasn't happened yet though. One would suppose that the guys who are in charge of this point at Canada Post consider the postal code to be their primary key, and are no more eager to reuse one than you are to see it reused.

Well, they may eventually be forced to, for the same sort of hierarchic issue that causes the shortage of IPv4 address space even though there are far fewer than 4 billion hosts online. But as far as I can see today, the only postal codes being retired are rural areas that are being developed and have blocks of codes assigned instead of a single broad code.

Basically this comes down to: "I'm going to use some externally supplied primary key as my primary key. Do I trust the upstream DBA to know what a primary key is?"

Well, there's another issue here I think. Often people see something that looks unique and is clearly intended to be a primary key and think "aha, nice primary key", but they miss a subtle distinction between what the external primary key represents and what their data model is tracking.

The typical example is social security numbers. SSNs are a perfectly reasonable primary key -- as long as you're tracking Social Security accounts, not people. Most people in the US have exactly one SS account, so people often think it looks like a primary key for people. In fact not everyone has a Social Security account (aliens who have never worked in the US, or for that matter people who have never been in the US) and others have had multiple Social Security accounts (victims of identity theft).

Another example that comes to mind is the local telephone company. When I changed my phone number they created a new account without telling me, because their billing system's primary key for accounts is... the phone number.
So all my automated bill payments started disappearing into the black hole of the old account and my new account went negative. I wonder what they do for customers who buy services from them but don't have a telephone line. -- greg ---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
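Greg's SSN point can be expressed in the schema itself. A minimal sketch (table and column names are invented for illustration, not from the thread): the surrogate key identifies the person, while the SSN is merely a nullable UNIQUE attribute, so people without accounts are representable and an account number can be corrected without rewriting every reference.

```sql
-- Sketch only: all names here are hypothetical.
CREATE TABLE people (
    person_id  serial PRIMARY KEY,  -- surrogate key: identifies the person
    full_name  text NOT NULL,
    ssn        char(11) UNIQUE      -- nullable: not everyone has an SS account
);
```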
Re: [PERFORM] FW: Index usage
From: BBI Edwin Punzalan [EMAIL PROTECTED]

db=# explain analyze select date from chatlogs where date >= '11/23/04' and date < '11/25/04';
NOTICE: QUERY PLAN:
Seq Scan on chatlogs (cost=0.00..24763.19 rows=9200 width=4) (actual time=0.45..4268.00 rows=23787 loops=1)
Total runtime: 4282.81 msec
==
How come a query filtering on the current date uses an index while the others do not? This makes indexing to speed up queries quite difficult.

have you ANALYZED the table lately ?
what version postgres are you using ?

gnari

---(end of broadcast)--- TIP 4: Don't 'kill -9' the postmaster
Re: [PERFORM] FW: Index usage
Yes, the database is being vacuum-ed and analyzed on a daily basis. Our version is 7.2.1

-Original Message-
From: gnari [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 24, 2004 4:35 PM
To: BBI Edwin Punzalan; [EMAIL PROTECTED]
Subject: Re: [PERFORM] FW: Index usage

From: BBI Edwin Punzalan [EMAIL PROTECTED]

db=# explain analyze select date from chatlogs where date >= '11/23/04' and date < '11/25/04';
NOTICE: QUERY PLAN:
Seq Scan on chatlogs (cost=0.00..24763.19 rows=9200 width=4) (actual time=0.45..4268.00 rows=23787 loops=1)
Total runtime: 4282.81 msec
==
How come a query filtering on the current date uses an index while the others do not? This makes indexing to speed up queries quite difficult.

have you ANALYZED the table lately ?
what version postgres are you using ?

gnari

---(end of broadcast)--- TIP 8: explain analyze is your friend
[PERFORM] Unsubscribe
Unsubscribe ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: FW: [PERFORM] FW: Index usage
BBI Edwin Punzalan wrote: Thanks, Tim. I tried adding an upper limit and it's still the same, as follows:

==
db=# explain analyze select date from chatlogs where date >= '11/24/04';
NOTICE: QUERY PLAN:
Index Scan using chatlogs_date_idx on chatlogs (cost=0.00..145.72 rows=37 width=4) (actual time=0.18..239.69 rows=10737 loops=1)
Total runtime: 246.22 msec
EXPLAIN
db=# explain analyze select date from chatlogs where date >= '11/23/04' and date < '11/24/04';
NOTICE: QUERY PLAN:
Seq Scan on chatlogs (cost=0.00..24763.19 rows=9200 width=4) (actual time=0.44..4447.01 rows=13029 loops=1)
Total runtime: 4455.56 msec

We have two issues here:
1. In the first example it only picks an index because it thinks it is going to get 37 rows; it actually gets 10737.
2. It's taking 4455ms to run a seq-scan but only 246ms to run an index-scan over 10737 rows (and then fetch the rows too).

Questions:
1. How many rows do you have in chatlogs?
2. Is this the only problem you are experiencing, or just one of many?
3. Have you tuned any configuration settings? e.g. as suggested in: http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

-- Richard Huxton Archonet Ltd

---(end of broadcast)--- TIP 8: explain analyze is your friend
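Richard's first point is the crux: the planner compares estimated costs, so a row estimate that is off by ~300x can flip the plan choice. A toy sketch of that decision (the numbers come from the EXPLAIN output above, but the linear cost function is invented for illustration and is not PostgreSQL's real cost model):

```python
# Toy model of the planner's plan choice. The cost function is a made-up
# linear approximation, NOT PostgreSQL's actual costing code.
def choose_plan(est_rows, seq_cost, index_cost_per_row, index_start=0.0):
    index_cost = index_start + est_rows * index_cost_per_row
    return "index" if index_cost < seq_cost else "seq"

# With the bad estimate of 37 rows, an index scan looks very cheap...
plan_bad = choose_plan(est_rows=37, seq_cost=24763.0, index_cost_per_row=4.0)

# ...but with a realistic estimate near the actual 13029 rows, the seq
# scan wins on paper -- even though the index scan was faster in reality.
plan_good = choose_plan(est_rows=13029, seq_cost=24763.0, index_cost_per_row=4.0)
```

The fix is therefore twofold: better estimates (statistics), and cost parameters that reflect the fact that the index scan was actually 18x faster here.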
Re: FW: [PERFORM] FW: Index usage
Hi.

1) chatlogs rows increase every now and then (it's in a live environment); it currently has 538,696 rows
2) this is the only problem we have experienced. So far, all our other indexes are being used correctly.
3) I don't remember tuning any post-installation configuration of our PostgreSQL except setting fsync to false.

Thanks for taking a look at our problem. :D

-Original Message-
From: Richard Huxton [mailto:[EMAIL PROTECTED]
Sent: Wednesday, November 24, 2004 6:17 PM
To: BBI Edwin Punzalan
Cc: [EMAIL PROTECTED]
Subject: Re: FW: [PERFORM] FW: Index usage

BBI Edwin Punzalan wrote: Thanks, Tim. I tried adding an upper limit and it's still the same, as follows:

==
db=# explain analyze select date from chatlogs where date >= '11/24/04';
NOTICE: QUERY PLAN:
Index Scan using chatlogs_date_idx on chatlogs (cost=0.00..145.72 rows=37 width=4) (actual time=0.18..239.69 rows=10737 loops=1)
Total runtime: 246.22 msec
EXPLAIN
db=# explain analyze select date from chatlogs where date >= '11/23/04' and date < '11/24/04';
NOTICE: QUERY PLAN:
Seq Scan on chatlogs (cost=0.00..24763.19 rows=9200 width=4) (actual time=0.44..4447.01 rows=13029 loops=1)
Total runtime: 4455.56 msec

We have two issues here:
1. In the first example it only picks an index because it thinks it is going to get 37 rows; it actually gets 10737.
2. It's taking 4455ms to run a seq-scan but only 246ms to run an index-scan over 10737 rows (and then fetch the rows too).

Questions:
1. How many rows do you have in chatlogs?
2. Is this the only problem you are experiencing, or just one of many?
3. Have you tuned any configuration settings? e.g. as suggested in: http://www.varlena.com/varlena/GeneralBits/Tidbits/perf.html

-- Richard Huxton Archonet Ltd

---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
[PERFORM] Postgres vs. MySQL
Hi,

I have installed the dspam filter (http://www.nuclearelephant.com/projects/dspam) on our mail server (RedHat 7.3 Linux with sendmail 8.13 and procmail). I have ~300 users with quite low traffic of 4000 messages/day. So it's a quite common platform/environment, nothing spectacular.

First time(s) I tried the Postgres interface that was already installed for other applications. Whenever I begin to train and/or filter messages through dspam, the performance is incredibly bad. The first messages are ok but soon the filter time begins to increase to about 30 seconds or more! ...so I looked for some optimization both for the Linux kernel and the Postgres server. Nothing has worked for me. I always get the same behavior. For isolation purposes I started using another server just to hold the dspam database and nothing else. No matter what I do: postgres gets slower and slower with each new message fed or filtered. Several strategies have failed: newest RPMs from postgresql.org, pg_autovacuum, etc.

I finally tried the MySQL driver. I have started using this tool right now for dspam, so I am a newcomer to MySQL. The result: after some preparation in configuring some parameters for mysqld (with the QuickStart Guide from mysql.com) all works fine! It's incredible! The same servers, the same messages, the same dspam compilation (well, each one with the corresponding --with-storage-driver=*sql_drv). Postgres is getting worse than 30s/message and MySQL processes the same in less than a second. I can bring the Postgres server to its knees by just corpus-feeding one single short message to each user (it takes hours to process 300 users!). On the other hand, MySQL only takes a few minutes to process the same batch.

I do not want to flame Postgres (I have always preferred it for its capabilities) but I am absolutely impressed by MySQL (I have seen the light!)

Please, could anyone explain this difference to me? Is Postgres that bad? Is MySQL that good?
Am I the only one to have observed this behavior? TIA.

Cheers,
Evilio Jose del Rio Silvan
Centre Mediterrani d'Investigacions Marines i Ambientals
[EMAIL PROTECTED]

Microsoft sells you Windows, Linux gives you the whole house - Anonymous

---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] Postgres vs. MySQL
On Wed, Nov 24, 2004 at 02:14:18PM +0100, Evilio del Rio wrote: It's incredible! the same servers, the same messages, the same dspam compilation (well each one with the corresponding --with-storage-driver=*sql_drv). Postgres is getting worst than 30s/message and MySQL process the same in less than a second. AFAIK dspam is heavily optimized for MySQL and not optimized for PostgreSQL at all; I believe there would be significant performance boosts available by fixing dspam. Example queries that are slow, as well as table schemas, would probably help a lot in tracking down the problems. /* Steinar */ -- Homepage: http://www.sesse.net/ ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
Re: [PERFORM] Postgres vs. MySQL
Evilio del Rio wrote: Please, could anyone explain me this difference? Is Postgres that bad? Is MySQL that good? Am I the only one to have observed this behavior? Do you have any record of configuration, system hardware, usage patterns, queries executed? If you can tell us what you tried (and why) then we might be able to help, otherwise there's not much information here. -- Richard Huxton Archonet Ltd ---(end of broadcast)--- TIP 6: Have you searched our list archives? http://archives.postgresql.org
Re: [PERFORM] Postgres vs. MySQL
As for performance, lots of others will probably volunteer tips and techniques. In my experience, properly written and tuned applications will show only minor speed differences. I have seen several open-source apps that support postgres but are not well tested on it. Query optimization can cause orders-of-magnitude performance differences. It sounds like maybe dspam is in this bucket?

Please, could anyone explain this difference to me? Is Postgres that bad? Is MySQL that good? Am I the only one to have observed this behavior?

I made a little chart about these about a year ago: http://www.tikipro.org/wiki/index.php?page=DatabaseComparison

If speed is what you need, and data integrity / safety is not, then MySQL may be a good choice (aggregate statistics tables and other such calculated denormalizations). IMHO, if all you need is dspam running *now*, then I'd say MySQL might be a good choice. If you ever need to run a DB application where data integrity is mission-critical, then postgres is the top of my list.

Christian Fowler | spider AT viovio.com
http://www.viovio.com | http://www.tikipro.org

---(end of broadcast)--- TIP 2: you can get off all lists at once with the unregister command (send unregister YourEmailAddressHere to [EMAIL PROTECTED])
Re: [PERFORM] Data type to use for primary key
On 24 Nov 2004 01:52:52 -0500, Greg Stark [EMAIL PROTECTED] wrote: Alexandre Leclerc [EMAIL PROTECTED] writes: Thanks for those tips. I'll print and keep them. So in my case, the product_code being varchar(24) is: 4 bytes + string size (so possibly up to 24) = possibly 28 bytes. I did the right thing using a serial. For my shorter keys (4 bytes + up to 6 chars) I will use the natural key. Realize that space usage is really only part of the issue.

Thank you for this additional information. This will help out in the future. In my situation it is a good thing to have integer keys where I decided to have them, even if I was obliged to add UNIQUE constraints to some other columns. I think they call this a candidate key and it's still 3NF (whatever; but only if my db is correctly organised)... I try to be logical and efficient for good performance. But in the end, time (the db will get bigger) and good EXPLAIN ANALYZE commands will help fine-tuning later! This will give me more experience at that point.

Actually I see one interesting exception to my policy in my current database schema. And I don't think I would do this one differently given the choice either. The primary key of the postal code table is the postal code. (Postal codes up here in the great white north are like zip codes down there.)

(I do understand this one, living in the province of Quebec. ;) And the great white north still has not arrived; it's the end of November! - Still, not very exceptional.)

Regards.

-- Alexandre Leclerc

---(end of broadcast)--- TIP 9: the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
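The pattern Alexandre describes can be sketched in DDL: a surrogate primary key for joins, with the natural candidate key kept honest by a UNIQUE constraint. This is a minimal illustration only; the table and column names are assumed, not taken from his schema.

```sql
-- Sketch: surrogate key + natural candidate key (names are hypothetical).
CREATE TABLE products (
    product_id   serial PRIMARY KEY,          -- surrogate key used in joins
    product_code varchar(24) NOT NULL UNIQUE  -- natural candidate key, still enforced
);
```

Foreign keys then reference the compact 4-byte `product_id`, while `product_code` can change (or be retired, like a postal code) without touching every referencing row.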
Re: [PERFORM] Postgres vs. MySQL
Evilio del Rio wrote: Hi, I have installed the dspam filter (http://www.nuclearelephant.com/projects/dspam) on our mail server (RedHat 7.3 Linux with sendmail 8.13 and procmail). I have ~300 users with quite low traffic of 4000 messages/day. So it's a quite common platform/environment, nothing spectacular.

The problem is definitely dspam. We have been through their code. The new version is much, much better than the older one but I am sure there is more work to be done. The first version we installed suffered from a well-known problem: it would use smallint/bigint but would not cast or quote the where clauses, and thus PostgreSQL would never use the indexes. It was also missing several indexes on appropriate columns.

We offered some advice and we know that some of it was taken, but we don't know which.

Sincerely, Joshua D. Drake

-- Command Prompt, Inc., home of Mammoth PostgreSQL - S/ODBC and S/JDBC Postgresql support, programming, shared hosting and dedicated hosting. +1-503-667-4564 - [EMAIL PROTECTED] - http://www.commandprompt.com PostgreSQL Replicator -- production quality replication for PostgreSQL

---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
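The cast/quote problem Joshua mentions can be illustrated with a sketch (table and column names invented). On the 7.x releases of that era, a bare integer literal was typed int4, which kept the planner from matching an index on a smallint or bigint column:

```sql
-- Sketch only: "tokens"/"token_id" are hypothetical names.
SELECT * FROM tokens WHERE token_id = 12345;          -- literal is int4: index
                                                      -- on a bigint column may
                                                      -- be skipped (seq scan)
SELECT * FROM tokens WHERE token_id = '12345';        -- quoted: type resolved
                                                      -- against the column,
                                                      -- index usable
SELECT * FROM tokens WHERE token_id = 12345::bigint;  -- explicit cast, same effect
```

This cross-type matching was only fixed in the server much later, which is why applications of this vintage had to quote or cast their literals.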
Re: [PERFORM] memcached and PostgreSQL
If instead of a select you do a select for update, I think this would be transaction-safe. Nothing would be able to modify the data in the database between when you do the SELECT and when you commit. If the transaction fails, the value in memcached will be correct.

Also, it's not clear if you're doing an update here or not... If you're doing an update then this wouldn't work. You'd want to do your update, then re-insert the value into memcached outside of the update transaction.

On Tue, Nov 23, 2004 at 02:20:34PM -0800, Sean Chittenden wrote: My point was that there are two failure cases --- one where the cache is slightly out of date compared to the db server --- these are cases where the cache update is slightly before/after the commit.

I was thinking about this and ways to minimize this even further. Have memcache clients add data and have a policy that the database only deletes data. This sets the database up as the bottleneck again, but then you have a degree of transactionality that couldn't previously be achieved with the database issuing replace commands. For example:

1) client checks the cache for data and gets a cache lookup failure
2) client begins transaction
3) client SELECTs data from the database
4) client adds the key to the cache
5) client commits transaction

This assumes that the client won't rollback or have a transaction failure. Again, in 50M transactions, I doubt one of them would fail (sure, it's possible, but that's a symptom of bigger problems: memcached isn't an RDBMS). The update case being:

1) client begins transaction
2) client updates data
3) database deletes record from memcache
4) client commits transaction
5) client adds data to memcache

The second case is where the cache update happens and the commit later fails, or the commit happens and the cache update never happens. Having pgmemcache delete, not replace, data addresses this second issue.
-sc -- Sean Chittenden ---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly

-- Jim C. Nasby, Database Consultant [EMAIL PROTECTED] Give your computer some brain candy! www.distributed.net Team #1828 Windows: Where do you want to go today? Linux: Where do you want to go tomorrow? FreeBSD: Are you guys coming, or what?
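The delete-don't-replace ordering Sean describes can be sketched in a few lines, with a plain dict standing in for memcached and another for the database (everything here is an invented stand-in; pgmemcache and a real transaction manager obviously work differently):

```python
cache = {}                 # stands in for memcached
db = {"row:1": "old"}      # stands in for the database table

def update_row(key, value):
    # Steps 1-4 of the update case, inside the "transaction":
    db[key] = value
    # Delete rather than replace: if the commit later fails, the worst
    # outcome is a cache miss on the next read -- never stale data.
    cache.pop(key, None)
    # Step 5, after "commit": writer (or next reader) repopulates the cache.
    cache[key] = db[key]

update_row("row:1", "new")
```

The design choice being illustrated: a missing cache entry is always safe (it degrades to a DB read), while a wrong cache entry is not, so the only cache operation allowed inside the transaction window is deletion.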
Re: [PERFORM] Postgres vs. MySQL
On Wed, Nov 24, 2004 at 09:57:52AM -0500, Christian Fowler wrote: As for performance, lots of others will probably volunteer tips and techniques. In my experience, properly written and tuned applications will show only minor speed differences. I have seen several open-source apps that support postgres but are not well tested on it. Query optimization can cause orders of magnitude performance differences. Definitely. My favourite is Request Tracker (we use 2.x, although 3.x is the latest version), which used something like 5-600 queries (all seqscans since the database schema only had an ordinary index on the varchar fields in question, and the queries were automatically searching on LOWER(field) to emulate MySQL's case-insensitivity on varchar fields) for _every_ page shown. Needless to say, the web interface was dog slow -- some index manipulation and a few bugfixes (they had some kind of cache layer which would eliminate 98% of the queries, but for some reason was broken for non-MySQL databases) later, and we were down to 3-4 index scans, a few orders of magnitude faster. :-) /* Steinar */ -- Homepage: http://www.sesse.net/ ---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
Re: [PERFORM] [PERFORMANCE] Big number of schemas (3500) into a single database
--- Tom Lane [EMAIL PROTECTED] wrote: Constantin Teodorescu [EMAIL PROTECTED] writes: If I choose to keep a mirror of every workstation database in a separate schema in the central database, that means I will have 3500 different schemas. Is there any limit or any barrier that could stop this kind of approach or make things go slower? Would you need to put them all into search_path at once? I'm not sure what the scaling issues might be for long search_paths, but I wouldn't be surprised if it's bad. But as long as you don't do that, I don't believe there will be any problems.

If I do a select with fully qualified table names, will it search the search_path or go directly to the schema? Just asking.

regards, Jaime Casanova

---(end of broadcast)--- TIP 5: Have you checked our extensive FAQ? http://www.postgresql.org/docs/faqs/FAQ.html
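For the record, the search_path is only consulted for unqualified names; a schema-qualified reference resolves directly. A small sketch (the schema and table names are invented):

```sql
-- Qualified name: resolves straight to the schema, search_path ignored.
SELECT * FROM ws_0042.chatlogs;

-- Unqualified name: resolved via search_path.
SET search_path TO ws_0042;
SELECT * FROM chatlogs;   -- now finds ws_0042.chatlogs
```

So with 3500 schemas, fully qualifying every reference sidesteps any long-search_path scaling concern entirely.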
[PERFORM] Group By index usage
I have a table with this index:

create index ARTISTS_NAME on ARTISTS ( lower(AR_NAME) );

The index is over a column with this definition:

AR_NAME VARCHAR(256) null,

I want to optimize this query:

select * from artists where lower(ar_name) like lower('a%') order by lower(ar_name) limit 20;

I think the planner should use the index I have. But the result of the explain command is:

explain analyze select * from artists where lower(ar_name) like lower('a%') order by lower(ar_name) limit 20;

QUERY PLAN
Limit (cost=20420.09..20420.14 rows=20 width=360) (actual time=2094.13..2094.19 rows=20 loops=1)
  -> Sort (cost=20420.09..20433.52 rows=5374 width=360) (actual time=2094.13..2094.16 rows=21 loops=1)
        Sort Key: lower((ar_name)::text)
        -> Index Scan using artists_name on artists (cost=0.00..19567.09 rows=5374 width=360) (actual time=0.11..1391.97 rows=59047 loops=1)
              Index Cond: ((lower((ar_name)::text) >= 'a'::text) AND (lower((ar_name)::text) < 'b'::text))
              Filter: (lower((ar_name)::text) ~~ 'a%'::text)
Total runtime: 2098.62 msec
(7 rows)

The ORDER BY clause is not using the index! I don't know why. I have the locale configured to C, and the index works well with the like operator. Could you help me? I am really lost.

---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Group By index usage
sdfasdfas sdfasdfs [EMAIL PROTECTED] writes: I have a table with this index: create index ARTISTS_NAME on ARTISTS ( lower(AR_NAME) ); Te index is over a colum with this definition: AR_NAME VARCHAR(256) null, I want to optimize this query: select * from artists where lower(ar_name) like lower('a%') order by lower(ar_name) limit 20; I think the planner should use the index i have. Update to 7.4, or declare the column as TEXT instead of VARCHAR. Older versions aren't very bright about situations involving implicit coercions. regards, tom lane ---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
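Along the lines of Tom's second suggestion, one workaround until an upgrade is to make the indexed expression match the coerced form the planner actually sees, so the Sort step can be satisfied by the index order. A sketch only; this assumes a version with expression indexes, and the index name is invented:

```sql
-- Index the explicitly coerced expression the plan shows
-- ("lower((ar_name)::text)"), so ORDER BY can match it:
CREATE INDEX artists_name_txt ON artists (lower(ar_name::text));

SELECT * FROM artists
WHERE lower(ar_name::text) LIKE 'a%'
ORDER BY lower(ar_name::text)
LIMIT 20;
```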
Re: [PERFORM] FW: Index usage
From: BBI Edwin Punzalan [EMAIL PROTECTED] Yes, the database is being vacuum-ed and analyzed on a daily basis.

then you should consider increasing the statistics on the date column, as the estimates were a bit off in the plan

Our version is 7.2.1

upgrade time ?

gnari

---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
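gnari's suggestion translates to something like the following (200 is an arbitrary example value; the default statistics target is 10):

```sql
-- Collect a larger sample for the date column, then refresh the stats:
ALTER TABLE chatlogs ALTER COLUMN date SET STATISTICS 200;
ANALYZE chatlogs;
```

With a larger sample, the 37-row estimate for a ~10,000-row day is much more likely to come out near the truth, which is what steers the planner toward the index scan.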
Re: [PERFORM] Slow execution time when querying view with WHERE clause
Tom Lane wrote: Um ... doh ... analyze.c about line 1550:

/* We can only compute valid stats if we found some non-null values. */
if (nonnull_cnt > 0) ...

There's a bit of an epistemological issue here: if we didn't actually find any nonnull values in our sample, is it legitimate to assume that the column is entirely null? On the other hand, if we find only 3 in our sample we will happily assume the column contains only 3, so I dunno why we are discriminating against null. This seems like a case that just hasn't come up before.

Will this discriminatory policy toward null end for 8.0?

Mike Mascari

---(end of broadcast)--- TIP 8: explain analyze is your friend
Re: [PERFORM] Postgres vs. MySQL
On Wed, Nov 24, 2004 at 02:14:18PM +0100, Evilio del Rio wrote: I have installed the dspam filter (http://www.nuclearelephant.com/projects/dspam) on our mail server (RedHat 7.3 Linux with sendmail 8.13 and procmail). I have ~300 users with quite low traffic of 4000 messages/day. So it's a quite common platform/environment, nothing spectacular.

We just had a case just like this on #postgresql. The (somewhat surprising) solution was increasing the statistics target on the token column to something like 200, which makes the planner choose an index scan instead of a sequential scan.

For the people who did not follow the case: The culprit is a query like

SELECT * FROM table WHERE token IN ('346369873476346', '4376376034', ...)

(token is a numeric(20,0))

With one entry in the IN (), the cost of an index scan was estimated at 4.77; with ten entries, it was about 48; but with 574 entries the estimated cost was 513565 (!!), making the planner prefer a sequential scan to 574 consecutive index scans. Upping the statistics target made the planner estimate the cost at about ~4000, and thus select the index scan, which was two orders of magnitude faster.

BTW, this case was with PostgreSQL 7.4.6, not 7.3 as the poster here is reporting.

/* Steinar */ -- Homepage: http://www.sesse.net/

---(end of broadcast)--- TIP 1: subscribe and unsubscribe commands go to [EMAIL PROTECTED]
[PERFORM] Hardware purchase question
We currently are utilizing postgresql on 2 servers with the following configuration:

2 - 2.4 GHz Xeon processors
4GB ram
4 36GB 10K RPM SCSI drives configured for RAID 10

We started out with one server and as we became IO-bound we added the second. We are currently considering purchasing another identical server to go along with these. In addition to this we are also considering a SCSI-attached storage device in the 10 - 14 drive range, configured for RAID 10, in place of the onboard 4 drives we currently have.

Daily about 30% of our data gets updated, with about 2% new data. Our query load is about 60% reads and 40% writes currently.

My question is what type of performance gains can I expect on average from swapping from 4-disk RAID 10 to 14-disk RAID 10? Could I expect to see 40 - 50% better throughput?

The servers listed above are the Dell 2650s, which have PERC 3 controllers. I have seen on this list that they are known for not performing well. So any suggestions for an attached SCSI device would be greatly appreciated. Also, any thoughts on fibre channel storage devices?

Thank You,
Bo Stewart

---(end of broadcast)--- TIP 3: if posting/reading through Usenet, please send an appropriate subscribe-nomail command to [EMAIL PROTECTED] so that your message can get through to the mailing list cleanly
Re: [PERFORM] [pgsql-hackers-win32] scalability issues on win32
Merlin Moncure wrote: Following is the promised writeup on performance-related issues comparing win32 with linux x86 and linux x86-64. Unfortunately, the 64-bit portion of the test is not yet completed and won't be for a bit. However there are some telling things about the win32/linux comparison. If you are considering deploying postgres in a win32 environment, read on...

First a description of the client app: Our company develops an ERP/CRM written in COBOL which we are porting to run on PostgreSQL. Part of this porting effort was development of an ISAM 'driver' for our app to allow it to store/fetch data from the database in place of a traditional file system, which is complete. For those of you unfamiliar with COBOL/ISAM, applications written with it have a 'one record at a time' mentality, such that the application tends to spam the server with queries of the "select * from t where k = k1" variety. Our driver creates stored procedures as necessary and uses ExecParams wherever possible to cut down on server CPU time, which is a precious resource in our case. Over time we plan to gradually redeploy our application logic to make better use of the SQL server's server-side power. Our application is very rarely I/O-bound because we will make sure our server has enough memory so that the data will rarely, if ever, *not* run from the cache. A good benchmark of our application performance is the time it takes to read the entire bill of materials for a product. This is a recursive read of about 2500 records in the typical case (2408 in the test case).

I always knew that COBOL ultimately loses, but it's always refreshing to get confirmation from time to time :)

Test platform: Pentium 4 3.06 GHz/HT, 10k SATA Raptor, 1GB memory, Windows XP Pro SP2 / Redhat Fedora 3 (64-bit results coming soon)

BOM traversal for product * (1 user):
win32: runtime: 3.34 sec, avg cpu load: 60%
redhat: runtime: 3.46 sec, avg cpu load: 20%

Where did you get the win32 avg cpu load number from?
AFAIK there's no getloadavg() for Windows. At least I tried hard to find one, because I want to add a comparable figure to the cygwin core. emacs, coreutils, make and others would desperately need it, not to speak of servers and real-time apps. Did you read it from taskman, or did you come up with your self-written solution? In taskman there's afaik no comparable figure. But there should be some perfmon API, which would do the trick. Overview: http://www.wilsonmar.com/1perfmon.htm#TaskManager

The load average (LA) is the average number of processes (the sum of the run queue length and the number of jobs currently running) that are ready to run, but are waiting for access to a busy CPU.

And thanks for the overview!

-- Reini Urban http://xarch.tu-graz.ac.at/home/rurban/

---(end of broadcast)--- TIP 7: don't forget to increase your free space map settings
[PERFORM] HELP speed up my Postgres
Hi all,

I've been using postgres for 3 years and now we are having problems with
its performance. Here are some givens:

- We have 260 subscription tables per database, and 2 databases.
- Our main client has given us 250,000 mobile numbers to deactivate.
- Deactivating 91,000 mobile numbers took a week to finish for one DB
  only; the second DB is still in the process of deactivating.

Algorithm to deactivate:
- we loaded all subscription table names into a table
- we loaded all mobile numbers to deactivate into a table
- SQL: update SUBSCRIPTIONTABLE set ACTIVEFLAG='Y'
       where mobile_num in (select mobile_num from LOADED_MOBILE_NUMBERS)
- the driving script was written in C

CONFIG FILE:

# This is ARA nmimain
tcpip_socket = true
max_connections = 150
superuser_reserved_connections = 2
port = 5433
shared_buffers = 45600
sort_mem = 4
max_locks_per_transaction = 128
#fsync = true
#wal_sync_method = fsync
#
# Locale settings
# (initialized by initdb -- may be changed)
LC_MESSAGES = 'en_US.UTF-8'
LC_MONETARY = 'en_US.UTF-8'
LC_NUMERIC = 'en_US.UTF-8'
LC_TIME = 'en_US.UTF-8'
..

The DB is being vacuumed every week. My box is a dual Xeon with 15K RPM
disks and 2 GB of memory, and it runs 2 instances of the PG DB.

TIA,
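The batch-update pattern described above can be reproduced in miniature; this runnable sketch uses sqlite3 as a stand-in and invented data, purely to show the shape of the IN (subselect) deactivation and the indexes involved (the flag semantics follow the post: 'Y' marks a deactivated number):

```python
import sqlite3

# Toy reproduction of the batch-deactivation pattern; table and
# column names mirror the post, the data is made up.
con = sqlite3.connect(":memory:")
con.execute("create table subscriptiontable (mobile_num text, activeflag text)")
con.execute("create table loaded_mobile_numbers (mobile_num text)")

con.executemany("insert into subscriptiontable values (?, 'N')",
                [(f"09{i:09d}",) for i in range(1000)])
# Every 4th subscriber appears in the deactivation list.
con.executemany("insert into loaded_mobile_numbers values (?)",
                [(f"09{i:09d}",) for i in range(0, 1000, 4)])

# Indexes on both mobile_num columns.
con.execute("create index lmn_idx on loaded_mobile_numbers (mobile_num)")
con.execute("create index st_idx on subscriptiontable (mobile_num)")

con.execute("""update subscriptiontable set activeflag='Y'
               where mobile_num in
                 (select mobile_num from loaded_mobile_numbers)""")
deactivated = con.execute(
    "select count(*) from subscriptiontable where activeflag='Y'"
).fetchone()[0]
print(deactivated)  # → 250
```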
Re: [PERFORM] HELP speed up my Postgres
Dear JM,

> I've been using postgres for 3 years and now we are having problems
> with its performance.

PostgreSQL version, please?

--
With Best Regards,
Vishal Kashyap.
Lead Software Developer,
http://saihertz.com, http://vishalkashyap.tk
Re: [PERFORM] HELP speed up my Postgres
PG Version 7.3.4

On Thursday 25 November 2004 14:12, Vishal Kashyap @ [SaiHertz] wrote:
> Dear JM,
> > I've been using postgres for 3 years and now we are having problems
> > with its performance.
> PostgreSQL version, please?
Re: [PERFORM] HELP speed up my Postgres
JM [EMAIL PROTECTED] writes:
> PG Version 7.3.4

Avoid the IN (subselect) construct, then. 7.4 is the first release that
can optimize that in any real sense.

regards, tom lane
Re: [PERFORM] HELP speed up my Postgres
On Thu, 25 Nov 2004 14:00:32 +0800, JM [EMAIL PROTECTED] wrote:
> update SUBSCRIPTIONTABLE set ACTIVEFLAG='Y' where mobile_num in
> (select mobile_num from LOADED_MOBILE_NUMBERS)

Does loaded_mobile_numbers have a primary key or index on mobile_num?
Same for subscriptiontable? Have you analyzed both tables? Is mobile_num
the same type in both tables?

How does this query compare?

update SUBSCRIPTIONTABLE set ACTIVEFLAG='Y'
from loaded_mobile_numbers
where subscriptiontable.mobile_num = LOADED_MOBILE_NUMBERS.mobile_num;

klint.

Klint Gore
University of New England
Armidale NSW 2351 Australia
Re: [PERFORM] HELP speed up my Postgres
> update SUBSCRIPTIONTABLE set ACTIVEFLAG='Y' where mobile_num in
> (select mobile_num from LOADED_MOBILE_NUMBERS)

Change to:

update SUBSCRIPTIONTABLE set ACTIVEFLAG='Y'
where exists (select 1 from LOADED_MOBILE_NUMBERS lmn
              where lmn.mobile_num = SUBSCRIPTIONTABLE.mobile_num);

That should run a lot faster. Make sure you have indexes on both
mobile_num columns.

Chris
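A quick way to convince yourself the EXISTS rewrite is equivalent to the original IN form (on 7.3 only the plans differ, not the results). This sketch uses sqlite3 as a stand-in with invented data; the table and column names follow the thread:

```python
import sqlite3

# Verify that the EXISTS rewrite updates exactly the same rows as
# the original IN (subselect) form.
con = sqlite3.connect(":memory:")
con.executescript("""
    create table subscriptiontable (mobile_num text, activeflag text);
    create table loaded_mobile_numbers (mobile_num text);
    insert into subscriptiontable values ('111','N'),('222','N'),('333','N');
    insert into loaded_mobile_numbers values ('111'),('333');
""")

def updated_rows(where_clause):
    """Reset all flags, run the update with the given predicate, and
    return the mobile numbers that ended up flagged."""
    con.execute("update subscriptiontable set activeflag='N'")
    con.execute("update subscriptiontable set activeflag='Y' " + where_clause)
    rows = con.execute("select mobile_num from subscriptiontable "
                       "where activeflag='Y' order by mobile_num").fetchall()
    return [r[0] for r in rows]

in_form = updated_rows(
    "where mobile_num in (select mobile_num from loaded_mobile_numbers)")
exists_form = updated_rows(
    "where exists (select 1 from loaded_mobile_numbers lmn "
    "where lmn.mobile_num = subscriptiontable.mobile_num)")

print(in_form == exists_form, in_form)  # → True ['111', '333']
```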