Re: [PERFORM] NOT LIKE much faster than LIKE?
Hi, I did just think of something we could improve though. The pattern selectivity code doesn't make any use of the statistics about most common values. For a constant pattern, we could actually apply the pattern test with each common value and derive answers that are exact for the portion of the population represented by the most-common-values list. If the MCV list covers a large fraction of the population then this would be a big leg up in accuracy. Dunno if that applies to your particular case or not, but it seems worth doing ... This reminds me of what I did in a patch which is currently on hold for the next release: http://momjian.postgresql.org/cgi-bin/pgpatches_hold http://candle.pha.pa.us/mhonarc/patches_hold/msg00026.html The patch was addressing a similar issue when using the ltree <@ and @> operators on an unbalanced tree. Best regards -- Matteo Beccati http://phpadsnew.com http://phppgads.com ---(end of broadcast)--- TIP 9: In versions below 8.0, the planner will ignore your desire to choose an index scan if your joining column's datatypes do not match
Re: [PERFORM] help tuning queries on large database
At 12:23 PM 1/9/2006, peter royal wrote: On Jan 8, 2006, at 4:35 PM, Ron wrote: Areca ARC-1220 8-port PCI-E controller Make sure you have 1GB or 2GB of cache. Get the battery backup and set the cache for write back rather than write through. The card we've got doesn't have a SODIMM socket, since it's only an 8-port card. My understanding was that was cache used when writing? Trade in your 8 port ARC-1220 that doesn't support 1-2GB of cache for a 12, 16, or 24 port Areca one that does. It's that important. Present generation SATA2 HDs should average ~50MBps raw ASTR. The Intel IOP333 DSP on the ARC's is limited to 800MBps, so that's your limit per card. That's 16 SATA2 HDs operating in parallel (16 HD RAID 0, 17 HD RAID 5, 32 HD RAID 10). Next generation 2.5 form factor 10Krpm SAS HDs due to retail in 2006 are supposed to average ~90MBps raw ASTR. 8 such HDs in parallel per ARC-12xx will be the limit. Side Note: the PCI-Ex8 bus on the 12xx cards is good for ~1.6GBps RWPB, so I expect Areca is going to be upgrading this controller to at least 2x, if not 4x (would require replacing the x8 bus with a x16 bus), the bandwidth at some point. A PCI-Ex16 bus is good for ~3.2GBps RWPB, so if you have the slots, 4 such populated ARC cards will max out a PCI-Ex16 bus. In your shoes, I think I would recommend replacing your 8 port ARC-1220 with a 12 port ARC-1230 with 1-2GB of battery backed cache and planning to get more of them as need arises. A 2.6.12 or later based Linux distro should have NO problems using more than 4GB of RAM. Upgraded the kernel to 2.6.15, then we were able to set the BIOS option for the 'Memory Hole' to 'Software' and it saw all 4G (under 2.6.11 we got a kernel panic with that set) There are some other kernel tuning params that should help memory and physical IO performance. Talk to a Linux kernel guru to get the correct advice specific to your installation and application.
It should be noted that there are indications of some major inefficiencies in pg's IO layer that make it compute bound under some circumstances before it becomes IO bound. These may or may not cause trouble for you as you keep pushing the envelope for maximum IO performance. With the kind of work you are doing and we are describing, I'm sure you can have a _very_ zippy system. Ron
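Ron's per-card throughput ceiling can be sanity-checked with a quick back-of-the-envelope calculation. The MBps figures below are his estimates from the message above, not measured values, and the helper name is made up for illustration:

```python
def drives_per_card(controller_mbps, drive_astr_mbps):
    # Number of drives that can stream at full rate in parallel before
    # the controller's DSP becomes the bottleneck.
    return controller_mbps // drive_astr_mbps

IOP333_LIMIT = 800                            # MBps ceiling Ron cites for the ARC-12xx DSP
sata2 = drives_per_card(IOP333_LIMIT, 50)     # ~50MBps ASTR SATA2 HDs -> 16
sas_10k = drives_per_card(IOP333_LIMIT, 90)   # ~90MBps 2.5" 10Krpm SAS -> 8
```

This reproduces the 16-drive SATA2 and 8-drive SAS limits quoted above.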
[PERFORM] How to handle a large DB and simultaneous accesses?
Hello, I have to develop a companies search engine (looks like the Yellow pages). We're using PostgreSQL at the company, and the initial DB is 2GB large, as it has companies from the entire world, with a fair amount of information. What reading do you suggest so that we can develop the search engine core, in order that the result pages show up instantly, no matter the heavy load and the DB size. The DB is 2GB but should grow to up to 10GB in 2 years, and there should be 250,000 unique visitors per month by the end of the year. Are there special techniques? Maybe there's a way to sort of cache search results? We're using PHP5 + phpAccelerator. Thanks, -- Charles A. Landemaine.
Re: [PERFORM] NOT LIKE much faster than LIKE?
On Tue, Jan 10, 2006 at 10:11:18AM -0500, Greg Stark wrote: Andrea Arcangeli [EMAIL PROTECTED] writes: Fixing this with proper stats would be great indeed. What would be the most common value for the kernel_version? You can see samples of the kernel_version here http://klive.cpushare.com/2.6.15/ . That's the string that is being searched against both PREEMPT and SMP. Try something like this where attname is the column name and tablename is, well, the tablename:

db=> select most_common_vals from pg_stats where tablename = 'region' and attname = 'province';
 most_common_vals
------------------
 {ON,NB,QC,BC}

Thanks for the info!

klive=> select most_common_vals from pg_stats where tablename = 'klive' and attname = 'kernel_version';
 most_common_vals
------------------
 {#1 Tue Sep 13 14:56:15 UTC 2005,#1 Fri Aug 19 11:58:59 UTC 2005,#7 SMP Fri Oct 7 15:56:41 CEST 2005,#1 SMP Fri Aug 19 11:58:59 UTC 2005,#2 Thu Sep 22 15:58:44 CEST 2005,#1 Fri Sep 23 15:32:21 GMT 2005,#1 Fri Oct 21 03:46:55 EDT 2005,#1 Sun Sep 4 13:45:32 CEST 2005,#5 PREEMPT Mon Nov 21 17:53:59 EET 2005,#1 Wed Sep 28 19:15:10 EDT 2005}
(1 row)

klive=> select most_common_freqs from pg_stats where tablename = 'klive' and attname = 'kernel_version';
 most_common_freqs
-------------------
 {0.013,0.0116667,0.011,0.009,0.0073,0.0067,0.0063,0.006,0.006,0.0057}
(1 row)

There's only one PREEMPT near the end, not sure if it would work?
Re: [PERFORM] 500x speed-down: Wrong statistics!
Alessandro Baretta [EMAIL PROTECTED] writes: I have no clue as to how or why the statistics were wrong yesterday--as I vacuum-analyzed continuously out of lack of any better idea--and I was stupid enough to re-timestamp everything before selecting from pg_stats. Too bad. I would be interested to find out how, if the stats were up-to-date, the thing was still getting the row estimate so wrong. If you manage to get the database back into its prior state please do send along the pg_stats info. regards, tom lane
Re: [PERFORM] Index isn't used during a join.
When grilled further on (Mon, 9 Jan 2006 22:58:18 -0700), Michael Fuhr [EMAIL PROTECTED] confessed: On Mon, Jan 09, 2006 at 09:23:38PM -0700, Robert Creager wrote: I'm working with a query to get more info out with a join. The base query works great speed wise because of index usage. When the join is tossed in, the index is no longer used, so the query performance tanks. The first query you posted returns 285 rows and the second returns over one million; index usage aside, that difference surely accounts for a performance penalty. And as is often pointed out, index scans aren't always faster than sequential scans: the more of a table a query has to fetch, the more likely a sequential scan will be faster. Thanks for pointing out the obvious that I missed. Too much data in the second query. It's supposed to match (row wise) what was returned from the first query. Just ignore me for now... Thanks, Rob -- 08:15:24 up 3 days, 42 min, 9 users, load average: 2.07, 2.20, 2.25 Linux 2.6.12-12-2 #4 SMP Tue Jan 3 19:56:19 MST 2006
Re: [PERFORM] NOT LIKE much faster than LIKE?
Andrea Arcangeli [EMAIL PROTECTED] writes: Fixing this with proper stats would be great indeed. What would be the most common value for the kernel_version? You can see samples of the kernel_version here http://klive.cpushare.com/2.6.15/ . That's the string that is being searched against both PREEMPT and SMP. Try something like this where attname is the column name and tablename is, well, the tablename:

db=> select most_common_vals from pg_stats where tablename = 'region' and attname = 'province';
 most_common_vals
------------------
 {ON,NB,QC,BC}

Note that there's a second column most_common_freqs and to do this would really require doing a weighted average based on the frequencies. -- greg
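Greg's suggestion amounts to summing most_common_freqs over the MCV entries that match the pattern. A rough Python sketch of the idea (illustrative names and sample data shaped like the klive stats, not PostgreSQL internals):

```python
import re

def mcv_like_selectivity(most_common_vals, most_common_freqs, regex):
    # Apply the pattern to each common value and sum the frequencies of
    # the ones that match: this is exact for the slice of the population
    # that the MCV list covers (a frequency-weighted count).
    rx = re.compile(regex)
    return sum(freq for val, freq in zip(most_common_vals, most_common_freqs)
               if rx.search(val))

# A few entries shaped like the klive kernel_version MCVs above:
vals = ["#1 Tue Sep 13 14:56:15 UTC 2005",
        "#7 SMP Fri Oct 7 15:56:41 CEST 2005",
        "#5 PREEMPT Mon Nov 21 17:53:59 EET 2005"]
freqs = [0.013, 0.011, 0.0073]
smp_sel = mcv_like_selectivity(vals, freqs, "SMP")
```

Only the fraction of the table outside the MCV list still needs a heuristic guess.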
Re: [PERFORM] NOT LIKE much faster than LIKE?
Andrea Arcangeli [EMAIL PROTECTED] writes: There's only one preempt near the end, not sure if it would work? Not with that data, but maybe if you increased the statistics target for the column to 100 or so, you'd catch enough values to get reasonable results. regards, tom lane
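As a toy illustration of why a higher statistics target helps here: on a skewed column, the fraction of rows covered by the MCV list grows with the number of tracked values. This uses synthetic data, not PostgreSQL's sampling code:

```python
import random
from collections import Counter

def mcv_coverage(column_values, stats_target):
    # Fraction of all rows accounted for by the `stats_target` most
    # common values -- roughly what a larger MCV list buys the planner.
    top = Counter(column_values).most_common(stats_target)
    return sum(count for _, count in top) / len(column_values)

random.seed(0)
# Zipf-ish synthetic column: a few very common strings, a long tail,
# similar in shape to the kernel_version data above.
column = ["kernel-%d" % min(int(random.paretovariate(1.0)), 500)
          for _ in range(10000)]
cov_10 = mcv_coverage(column, 10)    # roughly the old default target
cov_100 = mcv_coverage(column, 100)  # after SET STATISTICS 100
```

The larger list covers strictly more of the population, so MCV-exact selectivity applies to more of the table.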
Re: [PERFORM] NOT LIKE much faster than LIKE?
Matteo Beccati [EMAIL PROTECTED] writes: I did just think of something we could improve though. The pattern selectivity code doesn't make any use of the statistics about most common values. For a constant pattern, we could actually apply the pattern test with each common value and derive answers that are exact for the portion of the population represented by the most-common-values list. This reminds me of what I did in a patch which is currently on hold for the next release: I've applied a patch to make patternsel() compute the exact result for the MCV list, and use its former heuristics only for the portion of the column population not included in the MCV list. After finishing that work it occurred to me that we could go a step further: if the MCV list accounts for a substantial fraction of the population, we could assume that the MCV list is representative of the whole population, and extrapolate the pattern's selectivity over the MCV list to the whole population instead of using the existing heuristics at all. In a situation like Andrea's example this would win big, although you can certainly imagine cases where it would lose too. Any thoughts about this? What would be a reasonable threshold for "substantial fraction"? It would probably make sense to have different thresholds depending on whether the pattern is left-anchored or not, since the range heuristic only works for left-anchored patterns. regards, tom lane
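The scheme Tom describes (exact over the MCV list, heuristic for the remainder, with optional extrapolation past some threshold) could be sketched like this; the function and threshold names are made up for illustration, not the actual patternsel() code:

```python
def patternsel_estimate(matches, mcv_vals, mcv_freqs,
                        remainder_sel, extrapolate_threshold=0.5):
    # Exact selectivity over the MCV list; for the rest of the column,
    # either extrapolate the MCV match rate (if the MCVs cover a
    # "substantial fraction" of the population) or fall back to the
    # old heuristic estimate applied to the non-MCV remainder.
    mcv_total = sum(mcv_freqs)
    mcv_match = sum(f for v, f in zip(mcv_vals, mcv_freqs) if matches(v))
    if mcv_total >= extrapolate_threshold:
        # assume the MCV list is representative of the whole population
        return mcv_match / mcv_total
    return mcv_match + remainder_sel * (1.0 - mcv_total)

vals = ["#1 SMP Tue", "#1 Fri", "#7 SMP Fri"]
freqs = [0.3, 0.2, 0.1]                # MCVs cover 60% of the rows
sel = patternsel_estimate(lambda s: "SMP" in s, vals, freqs,
                          remainder_sel=0.05)
```

With 60% MCV coverage and the default threshold, the 0.4 match mass is extrapolated to the whole column; with a higher threshold, the 40% remainder falls back to the heuristic instead.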
Re: [PERFORM] NOT LIKE much faster than LIKE?
On Tue, 2006-01-10 at 12:49 -0500, Tom Lane wrote: Matteo Beccati [EMAIL PROTECTED] writes: I did just think of something we could improve though. The pattern selectivity code doesn't make any use of the statistics about most common values. For a constant pattern, we could actually apply the pattern test with each common value and derive answers that are exact for the portion of the population represented by the most-common-values list. This reminds me of what I did in a patch which is currently on hold for the next release: I've applied a patch to make patternsel() compute the exact result for the MCV list, and use its former heuristics only for the portion of the column population not included in the MCV list. I think it's OK to use the MCV, but I have a problem with the current heuristics: they only work for randomly generated strings, since the selectivity goes down geometrically with length. That doesn't match most languages, where one and two syllable words are extremely common and longer ones much less so. A syllable can be 1-2 chars long, so any search string of length 1-4 is probably equally likely, rather than reducing in selectivity based upon length. So I think part of the problem is the geometrically reducing selectivity itself. After finishing that work it occurred to me that we could go a step further: if the MCV list accounts for a substantial fraction of the population, we could assume that the MCV list is representative of the whole population, and extrapolate the pattern's selectivity over the MCV list to the whole population instead of using the existing heuristics at all. In a situation like Andrea's example this would win big, although you can certainly imagine cases where it would lose too. I don't think that can be inferred with any confidence, unless a large proportion of the MCV list were itself selected. Otherwise it might match only a single MCV that just happens to have a high proportion, then we assume all others have the same proportion. The calculation is related to Ndistinct, in some ways. Any thoughts about this? What would be a reasonable threshold for "substantial fraction"? It would probably make sense to have different thresholds depending on whether the pattern is left-anchored or not, since the range heuristic only works for left-anchored patterns. I don't think you can do this for a low enough substantial fraction to make this interesting. I would favour the idea of dynamic sampling using a block sampling approach; that was a natural extension of improving ANALYZE also. We can use that approach for things such as LIKE, but also use it for multi-column single-table and join selectivity. Best Regards, Simon Riggs
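The "geometric" behaviour Simon objects to can be seen in a two-line model of the current heuristic; the per-character factor here is illustrative, not the real planner constant:

```python
FIXED_CHAR_SEL = 0.20  # illustrative per-character selectivity factor

def geometric_like_sel(pattern_len):
    # Each extra non-wildcard character multiplies the estimated
    # selectivity by a fixed factor, so the estimate falls off
    # geometrically with pattern length.
    return FIXED_CHAR_SEL ** pattern_len

# Under this curve a 4-character search term is treated as 125x more
# selective than a 1-character one -- plausible for random strings, but
# natural-language words of 1-4 characters are all common, which is
# exactly Simon's complaint.
ratio = geometric_like_sel(1) / geometric_like_sel(4)
```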
Re: [PERFORM] NOT LIKE much faster than LIKE?
Simon Riggs [EMAIL PROTECTED] writes: I think it's OK to use the MCV, but I have a problem with the current heuristics: they only work for randomly generated strings, since the selectivity goes down geometrically with length. We could certainly use a less aggressive curve for that. You got a specific proposal? After finishing that work it occurred to me that we could go a step further: if the MCV list accounts for a substantial fraction of the population, we could assume that the MCV list is representative of the whole population, and extrapolate the pattern's selectivity over the MCV list to the whole population instead of using the existing heuristics at all. In a situation like Andrea's example this would win big, although you can certainly imagine cases where it would lose too. I don't think that can be inferred with any confidence, unless a large proportion of the MCV list were itself selected. Otherwise it might match only a single MCV that just happens to have a high proportion, then we assume all others have the same proportion. Well, of course it can't be inferred with confidence. Sometimes you'll win and sometimes you'll lose. The question is, is this a better heuristic than what we use otherwise? The current estimate for non-anchored patterns is really pretty crummy, and even with a less aggressive length-vs-selectivity curve it's not going to be great. Another possibility is to merge the two estimates somehow. I would favour the idea of dynamic sampling using a block sampling approach; that was a natural extension of improving ANALYZE also. One thing at a time please. Obtaining better statistics is one issue, but the one at hand here is what to do given particular statistics. regards, tom lane
Re: [PERFORM] NOT LIKE much faster than LIKE?
On Tue, 2006-01-10 at 17:21 -0500, Tom Lane wrote: Simon Riggs [EMAIL PROTECTED] writes: I think its OK to use the MCV, but I have a problem with the current heuristics: they only work for randomly generated strings, since the selectivity goes down geometrically with length. We could certainly use a less aggressive curve for that. You got a specific proposal? I read some research not too long ago that showed a frequency curve of words by syllable length. I'll dig that out tomorrow. I would favour the idea of dynamic sampling using a block sampling approach; that was a natural extension of improving ANALYZE also. One thing at a time please. Obtaining better statistics is one issue, but the one at hand here is what to do given particular statistics. I meant use the same sampling approach as I was proposing for ANALYZE, but do this at plan time for the query. That way we can apply the function directly to the sampled rows and estimate selectivity. I specifically didn't mention that in the Ndistinct discussion because I didn't want to confuse the subject further, but the underlying block sampling method would be identical, so the code is already almost there...we just need to eval the RestrictInfo against the sampled tuples. Best Regards, Simon Riggs
Re: [PERFORM] help tuning queries on large database
Ron, A few days back you mentioned: Upgrade your kernel to at least 2.6.12 There's a known issue with earlier versions of the 2.6.x kernel and 64b CPUs like the Opteron. See kernel.org for details. I did some searching and couldn't find any obvious mention of this issue (I gave up after searching through the first few hundred instances of 64 in the 2.6.12 changelog). Would you mind being a little more specific about which issue you're talking about? We're about to deploy some new 16GB RAM Opteron DB servers and I'd like to check and make sure RH backported whatever the fix was to their current RHEL4 kernel. Thanks, Mark Lewis
[PERFORM] Left Join Performance vs Inner Join Performance
Hello, I have an inner join query that runs fast, but when I change to a left join the query runs 96 times slower. I wish I could always do an inner join, but there are rare times when there isn't data in the right hand table. I could expect a small performance hit, but the difference is so large I figure I must be doing something wrong. What I find strangest is how similar the two query plans are. Query (inner join version, just replace inner with left for the other version):

select p.owner_trader_id, p.strategy_id, m.last, m.bid, m.ask
from om_position p
inner join om_instrument_mark m
  on m.instrument_id = p.instrument_id
  and m.data_source_id = 5
  and m.date = '2005-02-03'
where p.as_of_date = '2005-02-03'
  and p.fund_id = 'TRIDE'
  and p.owner_trader_id = 'tam4'
  and p.strategy_id = 'BASKET1'

Query plan for inner join:

Nested Loop (cost=0.00..176.99 rows=4 width=43) (actual time=0.234..14.182 rows=193 loops=1)
  -> Index Scan using as_of_date_om_position_index on om_position p (cost=0.00..68.26 rows=19 width=20) (actual time=0.171..5.210 rows=193 loops=1)
       Index Cond: (as_of_date = '2005-02-03'::date)
       Filter: (((fund_id)::text = 'TRIDE'::text) AND ((owner_trader_id)::text = 'tam4'::text) AND ((strategy_id)::text = 'BASKET1'::text))
  -> Index Scan using om_instrument_mark_pkey on om_instrument_mark m (cost=0.00..5.71 rows=1 width=31) (actual time=0.028..0.032 rows=1 loops=193)
       Index Cond: ((m.instrument_id = "outer".instrument_id) AND (m.data_source_id = 5) AND (m.date = '2005-02-03'::date))
Total runtime: 14.890 ms

Query plan for left join:

Nested Loop Left Join (cost=0.00..7763.36 rows=19 width=43) (actual time=3.005..1346.308 rows=193 loops=1)
  -> Index Scan using as_of_date_om_position_index on om_position p (cost=0.00..68.26 rows=19 width=20) (actual time=0.064..6.654 rows=193 loops=1)
       Index Cond: (as_of_date = '2005-02-03'::date)
       Filter: (((fund_id)::text = 'TRIDE'::text) AND ((owner_trader_id)::text = 'tam4'::text) AND ((strategy_id)::text = 'BASKET1'::text))
  -> Index Scan using om_instrument_mark_pkey on om_instrument_mark m (cost=0.00..404.99 rows=1 width=31) (actual time=3.589..6.919 rows=1 loops=193)
       Index Cond: (m.instrument_id = "outer".instrument_id)
       Filter: ((data_source_id = 5) AND (date = '2005-02-03'::date))
Total runtime: 1347.159 ms

Table Definitions:

CREATE TABLE om_position (
  fund_id varchar(10) NOT NULL DEFAULT ''::character varying,
  owner_trader_id varchar(10) NOT NULL DEFAULT ''::character varying,
  strategy_id varchar(30) NOT NULL DEFAULT ''::character varying,
  instrument_id int4 NOT NULL DEFAULT 0,
  as_of_date date NOT NULL DEFAULT '0001-01-01'::date,
  pos numeric(22,9) NOT NULL DEFAULT 0.0,
  cf_account_id int4 NOT NULL DEFAULT 0,
  cost numeric(22,9) NOT NULL DEFAULT 0.0,
  CONSTRAINT om_position_pkey PRIMARY KEY (fund_id, owner_trader_id, strategy_id, cf_account_id, instrument_id, as_of_date),
  CONSTRAINT $1 FOREIGN KEY (strategy_id) REFERENCES om_strategy (strategy_id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT $2 FOREIGN KEY (fund_id) REFERENCES om_fund (fund_id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT $3 FOREIGN KEY (cf_account_id) REFERENCES om_cf_account (id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT $4 FOREIGN KEY (owner_trader_id) REFERENCES om_trader (trader_id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION
) WITH OIDS;

CREATE INDEX as_of_date_om_position_index ON om_position USING btree (as_of_date);

CREATE TABLE om_instrument_mark (
  instrument_id int4 NOT NULL DEFAULT 0,
  data_source_id int4 NOT NULL DEFAULT 0,
  date date NOT NULL DEFAULT '0001-01-01'::date,
  last numeric(22,9) NOT NULL DEFAULT 0.0,
  bid numeric(22,9) NOT NULL DEFAULT 0.0,
  ask numeric(22,9) NOT NULL DEFAULT 0.0,
  comment varchar(150) NOT NULL DEFAULT ''::character varying,
  trader_id varchar(10) NOT NULL DEFAULT 'auto'::character varying,
  CONSTRAINT om_instrument_mark_pkey PRIMARY KEY (instrument_id, data_source_id, date),
  CONSTRAINT $1 FOREIGN KEY (instrument_id) REFERENCES om_instrument (id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT $2 FOREIGN KEY (data_source_id) REFERENCES om_data_source (id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION,
  CONSTRAINT om_instrument_mark_trader_id_fkey FOREIGN KEY (trader_id) REFERENCES om_trader (trader_id) MATCH SIMPLE ON UPDATE NO ACTION ON DELETE NO ACTION
) WITH OIDS;

Thanks for any help
Re: [PERFORM] NOT LIKE much faster than LIKE?
Simon Riggs [EMAIL PROTECTED] writes: I meant use the same sampling approach as I was proposing for ANALYZE, but do this at plan time for the query. That way we can apply the function directly to the sampled rows and estimate selectivity. I think this is so unlikely to be a win as to not even be worth spending any time discussing. The extra planning time across all queries will vastly outweigh the occasional improvement in plan choice for some queries. regards, tom lane
Re: [PERFORM] How to handle a large DB and simultaneous accesses?
On Tue, 10 Jan 2006, Charles A. Landemaine wrote: Hello, I have to develop a companies search engine (looks like the Yellow pages). We're using PostgreSQL at the company, and the initial DB is 2GB large, as it has companies from the entire world, with a fair amount of information. What reading do you suggest so that we can develop the search engine core, in order that the result pages show up instantly, no matter the heavy load and the DB size. The DB is 2GB but should grow to up to 10GB in 2 years, and there should be 250,000 unique visitors per month by the end of the year. Are there special techniques? Maybe there's a way to sort of cache search results? We're using PHP5 + phpAccelerator. Thanks, Frankly, that is a small enough chunk of data compared to available memory sizes that I think your best bet is to plan to have enough RAM that you only do disk I/O to write and on boot. A dual socket Opteron system can hold 16G with 2G memory modules (32G as 4G modules become readily available over the next couple of years). This should be enough to keep your data and indexes in RAM at all times. If you find that other system processes push the data out of RAM, consider loading the data from disk to a ramfs filesystem; just make sure you don't update the ram-only copy (or if you do, have replication set up to replicate from the ram copy to a copy on real disks somewhere). Depending on your load you could go with single core or dual core chips (and the CPUs are a small enough cost compared to this much RAM that you may as well go with the dual core CPUs). Now even with your data in RAM you can slow down if your queries, indexes, and other settings are wrong, but if performance is important you should be able to essentially eliminate disks for databases of this size. David Lang
Re: [PERFORM] Left Join Performance vs Inner Join Performance
Dave Dutcher [EMAIL PROTECTED] writes: I have an inner join query that runs fast, but I when I change to a left join the query runs 96 times slower. This looks like an issue that is fixed in the latest set of releases, namely that OUTER JOIN ON conditions that reference only the inner side of the join weren't getting pushed down into indexquals. See thread here: http://archives.postgresql.org/pgsql-performance/2005-12/msg00134.php and patches in this and the following messages: http://archives.postgresql.org/pgsql-committers/2005-12/msg00105.php regards, tom lane
Re: [PERFORM] Index isn't used during a join.
Ok, I'm back, and in a little better shape. The query is now correct, but still is slow because of lack of index usage. I don't know how to structure the query correctly to use the index. Taken individually:

weather=# explain analyze select * from doy_agg where doy = extract( doy from now() );

 HashAggregate (cost=13750.67..13750.71 rows=2 width=20) (actual time=123.134..123.135 rows=1 loops=1)
   -> Bitmap Heap Scan on readings (cost=25.87..13720.96 rows=3962 width=20) (actual time=6.384..116.559 rows=4175 loops=1)
        Recheck Cond: (date_part('doy'::text, "when") = date_part('doy'::text, now()))
        -> Bitmap Index Scan on readings_doy_index (cost=0.00..25.87 rows=3962 width=0) (actual time=5.282..5.282 rows=4215 loops=1)
             Index Cond: (date_part('doy'::text, "when") = date_part('doy'::text, now()))
 Total runtime: 123.366 ms

produces the data:

weather=# select * from doy_agg where doy = extract( doy from now() );
 doy |  avg_windspeed   | max_windspeed
-----+------------------+---------------
  10 | 8.53403056583666 |            59

and:

weather=# EXPLAIN ANALYZE
SELECT *, unmunge_time( time_group ) AS time
FROM minute.windspeed
WHERE unmunge_time( time_group ) > ( now() - '24 hour'::interval )
ORDER BY time_group;

 Sort (cost=595.33..595.77 rows=176 width=28) (actual time=4.762..4.828 rows=283 loops=1)
   Sort Key: time_group
   -> Bitmap Heap Scan on windspeed (cost=2.62..588.76 rows=176 width=28) (actual time=0.901..3.834 rows=283 loops=1)
        Recheck Cond: (unmunge_time(time_group) > (now() - '24:00:00'::interval))
        -> Bitmap Index Scan on minute_windspeed_unmunge_index (cost=0.00..2.62 rows=176 width=0) (actual time=0.745..0.745 rows=284 loops=1)
             Index Cond: (unmunge_time(time_group) > (now() - '24:00:00'::interval))
 Total runtime: 5.108 ms

produces:

 time_group | min_reading | max_reading |    avg_reading    |        time
------------+-------------+-------------+-------------------+---------------------
 1136869500 |         0.8 |           6 |  2.62193548387097 | 2006-01-09 22:05:00
 1136869800 |           0 |           3 | 0.406021505376343 | 2006-01-09 22:10:00
 1136870100 |           0 |           5 |              1.68 | 2006-01-09 22:15:00
 ...

But I want the composite of the two queries, and I'm stuck on:

weather=# EXPLAIN ANALYZE
SELECT *, unmunge_time( time_group ) AS time
FROM minute.windspeed
JOIN doy_agg ON( EXTRACT( doy FROM unmunge_time( time_group ) ) = doy )
WHERE unmunge_time( time_group ) > ( now() - '24 hour'::interval )
ORDER BY time_group;

 Sort (cost=153627.67..153628.48 rows=322 width=48) (actual time=10637.681..10637.748 rows=286 loops=1)
   Sort Key: windspeed.time_group
   -> Merge Join (cost=153604.82..153614.26 rows=322 width=48) (actual time=10633.375..10636.728 rows=286 loops=1)
        Merge Cond: ("outer"."?column5?" = "inner".doy)
        -> Sort (cost=594.89..595.33 rows=176 width=28) (actual time=5.539..5.612 rows=286 loops=1)
             Sort Key: date_part('doy'::text, unmunge_time(windspeed.time_group))
             -> Bitmap Heap Scan on windspeed (cost=2.62..588.32 rows=176 width=28) (actual time=0.918..4.637 rows=286 loops=1)
                  Recheck Cond: (unmunge_time(time_group) > (now() - '24:00:00'::interval))
                  -> Bitmap Index Scan on minute_windspeed_unmunge_index (cost=0.00..2.62 rows=176 width=0) (actual time=0.739..0.739 rows=287 loops=1)
                       Index Cond: (unmunge_time(time_group) > (now() - '24:00:00'::interval))
        -> Sort (cost=153009.93..153010.84 rows=366 width=20) (actual time=10627.699..10627.788 rows=295 loops=1)
             Sort Key: doy_agg.doy
             -> HashAggregate (cost=152984.28..152990.69 rows=366 width=20) (actual time=10625.649..10626.601 rows=366 loops=1)
                  -> Seq Scan on readings (cost=0.00..145364.93 rows=1015914 width=20) (actual
Re: [PERFORM] help tuning queries on large database
At 07:28 PM 1/10/2006, Mark Lewis wrote: Ron, A few days back you mentioned: Upgrade your kernel to at least 2.6.12 There's a known issue with earlier versions of the 2.6.x kernel and 64b CPUs like the Opteron. See kernel.org for details. I did some searching and couldn't find any obvious mention of this issue (I gave up after searching through the first few hundred instances of 64 in the 2.6.12 changelog). Would you mind being a little more specific about which issue you're talking about? We're about to deploy some new 16GB RAM Opteron DB servers and I'd like to check and make sure RH backported whatever the fix was to their current RHEL4 kernel. There are 3 issues I know about in general: 1= As Peter Royal noted on this list, pre-12 versions of 2.6.x have problems with RAM >= 4GB. 2= Pre-12 versions of 2.6.x when running A64 or Xeon 64b SMP seem to be susceptible to context switch storms. 3= Physical and memory IO is considerably improved in the later versions of 2.6.x compared to 2.6.11 or earlier. Talk to a real Linux kernel guru (I am not) for details and specifics. Ron
Re: [PERFORM] Index isn't used during a join.
On Tue, Jan 10, 2006 at 10:10:55PM -0700, Robert Creager wrote: The query is now correct, but still is slow because of lack of index usage. I don't know how to structure the query correctly to use the index. Have you tried adding restrictions on doy in the WHERE clause? Something like this, I think: WHERE ... AND doy >= EXTRACT(doy FROM now() - '24 hour'::interval) AND doy <= EXTRACT(doy FROM now()) Something else occurred to me: do you (or will you) have more than one year of data? If so then matching on doy could be problematic unless you also check for the year, or unless you want to match more than one year. -- Michael Fuhr
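Michael's caveats (the 24-hour window crossing December 31, and doy not identifying a calendar year) can be made concrete with a small sketch; doy_window is a hypothetical helper, not part of the poster's schema:

```python
from datetime import datetime, timedelta

def doy_window(now, hours=24):
    # Day-of-year values touched by the trailing `hours` window.
    # Returned as a set because near January 1 the window wraps
    # (e.g. {365, 1}), so a simple "doy BETWEEN lo AND hi" fails;
    # and with multiple years of data the same doy repeats yearly,
    # so doy alone cannot identify a calendar day.
    start = now - timedelta(hours=hours)
    days = set()
    d = start
    while d.date() <= now.date():
        days.add(int(d.strftime("%j")))
        d += timedelta(days=1)
    return days

mid_year = doy_window(datetime(2006, 1, 10, 12, 0))   # {9, 10}
wrapped = doy_window(datetime(2006, 1, 1, 6, 0))      # {365, 1}
```

The BETWEEN-style restriction Michael suggests works mid-year, but the wrapped case would need two ranges (or an additional year check).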