Re: [PERFORM] Random Page Cost and Planner

2010-05-27 Thread David Jarvis
Hi, Bryan. I was just about to reply to the thread, thanks for asking. Clustering was key. After rebooting the machine (just to make sure absolutely nothing was cached), I immediately ran a report on Toronto: 5.25 seconds! Here's what I did: 1. Created a new set of tables that matched the
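
What's being described is physically rewriting the table in index order. A minimal sketch, using names from elsewhere in the thread (the exact table/index pairing is an assumption):

    -- Rewrite one child table in the order of its index, then refresh stats.
    -- Table and index names follow the thread; the exact DDL is assumed.
    CLUSTER climate.measurement_001 USING measurement_001_stc_idx;
    ANALYZE climate.measurement_001;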

Re: [PERFORM] Random Page Cost and Planner

2010-05-27 Thread Cédric Villemain
2010/5/27 David Jarvis thanga...@gmail.com: Hi, Bryan. I was just about to reply to the thread, thanks for asking. Clustering was key. After rebooting the machine (just to make sure absolutely nothing was cached), I immediately ran a report on Toronto: 5.25 seconds! Here's what I did:

Re: [PERFORM] Random Page Cost and Planner

2010-05-27 Thread David Jarvis
Salut, Cédric. I wonder what the plan will be if you replace sc.taken_* in: m.taken BETWEEN sc.taken_start AND sc.taken_end by values. It might help the planner... That is a fairly important restriction. I will try making it (year1||'-01-01')::date, but I have no constant value for it --
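
A sketch of the rewrite being discussed, with literal years standing in for the report's year1/year2 parameters (the surrounding query is elided; only the predicate shape matters):

    -- Replace the correlated bounds with constants built from year parameters.
    -- '1963' and '2009' stand in for the hypothetical year1/year2 inputs.
    SELECT avg(m.amount)
    FROM climate.measurement m
    WHERE m.taken BETWEEN ('1963' || '-01-01')::date
                      AND ('2009' || '-12-31')::date;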

Re: [PERFORM] Random Page Cost and Planner

2010-05-27 Thread Cédric Villemain
2010/5/27 David Jarvis thanga...@gmail.com: Salut, Cédric. I wonder what the plan will be if you replace sc.taken_* in : m.taken BETWEEN sc.taken_start AND sc.taken_end by values. It might help the planner... That is a fairly important restriction. I will try making it

Re: [PERFORM] Random Page Cost and Planner

2010-05-27 Thread Bryan Hinton
Agree with Tom on his point about avoidance of cost param adjustments to fit specific test cases. A few suggestions...as I assume you own this database... - check out pg_statio_user_tables - optimize your cache hit ratio on blocks read...different time durations... pg_stat_bgwriter (read from a
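
pg_statio_user_tables is a standard statistics view; one common way to derive a per-table cache hit ratio from it (what ratio counts as "good" depends on the workload):

    -- Per-table buffer cache hit ratio from pg_statio_user_tables.
    SELECT relname,
           heap_blks_hit,
           heap_blks_read,
           round(heap_blks_hit::numeric
                 / nullif(heap_blks_hit + heap_blks_read, 0), 4) AS hit_ratio
    FROM pg_statio_user_tables
    ORDER BY heap_blks_read DESC;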

Re: [PERFORM] Random Page Cost and Planner

2010-05-27 Thread David Jarvis
Hi, Bryan. Thanks for the notes. I thought about using a prepared statement, but I cannot find any examples of using a PREPARE statement from within a function, and don't really feel like tinkering around to figure it out. Performance is at the point where the Java/PHP bridge and JasperReports
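
For what it's worth, PL/pgSQL functions cache their statement plans automatically, which may be why PREPARE-inside-a-function examples are scarce. At the SQL level a prepared statement looks like the sketch below; the statement name and the station_id column are illustrative assumptions:

    -- Sketch of a SQL-level prepared statement; names are illustrative.
    PREPARE city_report (int) AS
      SELECT extract(YEAR FROM m.taken) AS year, avg(m.amount) AS amount
      FROM climate.measurement m
      WHERE m.station_id = $1
      GROUP BY 1;

    EXECUTE city_report(1);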

Re: [PERFORM] Random Page Cost and Planner

2010-05-26 Thread David Jarvis
Hi, Rob. I tried bumping the effective_cache_size. It made no difference. My latest attempt at forcing PostgreSQL to use the indexes involved two loops: one to loop over the stations, the other to extract the station data from the measurement table. The outer loop executes in 1.5 seconds. The
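
A rough PL/pgSQL shape for the two-loop approach described above; table and column names are guesses based on the rest of the thread, not the real code:

    -- Outer loop over stations; inner per-station probe of measurement.
    -- (DO blocks require PostgreSQL 9.0+; wrap in a function on older versions.)
    DO $$
    DECLARE
      st record;
    BEGIN
      FOR st IN SELECT id FROM climate.station LOOP
        PERFORM m.amount
        FROM climate.measurement m
        WHERE m.station_id = st.id;
      END LOOP;
    END;
    $$;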

Re: [PERFORM] Random Page Cost and Planner

2010-05-26 Thread Alexey Klyukin
On May 26, 2010, at 6:50 AM, David Jarvis wrote: That said, when using the following condition, the query is fast (1 second): extract(YEAR FROM sc.taken_start) >= 1963 AND extract(YEAR FROM sc.taken_end) <= 2009 AND -> Index Scan using measurement_013_stc_idx
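
A predicate on extract(YEAR FROM ...) can only use an index built on that same expression; a hedged sketch (whether these fit the real schema is an assumption):

    -- Expression indexes matching the extract(YEAR FROM ...) predicates
    -- (the doubled parentheses mark an index on an expression).
    CREATE INDEX station_category_start_year_idx
      ON climate.station_category ((extract(YEAR FROM taken_start)));
    CREATE INDEX station_category_end_year_idx
      ON climate.station_category ((extract(YEAR FROM taken_end)));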

Re: [PERFORM] Random Page Cost and Planner

2010-05-26 Thread tv
Hi, Tom. Yes

Re: [PERFORM] Random Page Cost and Planner

2010-05-26 Thread Kevin Grittner
David Jarvis thanga...@gmail.com wrote: It sounds as though the active portion of your database is pretty much cached in RAM. True? I would not have thought so; there are seven tables, each with 39 to 43 million rows. The machine has 4GB of RAM. In that case, modifying seq_page_cost or

Re: [PERFORM] Random Page Cost and Planner

2010-05-26 Thread David Jarvis
Hi, Alexey. Is it necessary to get the data as far as 1900 all the time ? Maybe there is a possibility to aggregate results from the past years if they are constant. This I have done. I created another table (station_category) that associates stations with when they started to take
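
For readers following along, a hypothetical shape for that association table; the actual column list is not shown in the thread, so every detail below is an assumption:

    -- Hypothetical reconstruction of climate.station_category.
    CREATE TABLE climate.station_category (
      station_id  integer NOT NULL,  -- references climate.station
      category_id integer NOT NULL,  -- kind of measurement recorded
      taken_start date    NOT NULL,  -- first date with data
      taken_end   date    NOT NULL,  -- last date with data
      PRIMARY KEY (station_id, category_id)
    );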

Re: [PERFORM] Random Page Cost and Planner

2010-05-26 Thread tv
Hi, And this is what happens in the queries above - the first query covers years 1963-2009, while the second one covers 1900-2009. Given the fact this table contains ~40m rows, the first query returns about 0.01% (3k rows) while the second one returns almost 50% of the data (18m rows). So I
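
(Worked out: 3,000 of ~40 million rows is roughly 0.01%, while 18 million of 40 million is 45%; once a query needs more than a few percent of a table, a sequential scan generally beats millions of random index probes, so the plan change is expected.)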

Re: [PERFORM] Random Page Cost and Planner

2010-05-26 Thread David Jarvis
Hi, Kevin. below something in the range of 1.5 to 2 is probably not going to be a good choice for the mix as a whole. Good to know; thanks. This should probably be set to something on the order of 3GB. This will help the optimizer make more intelligent choices about when use of the index
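
The setting being recommended, as it would be applied; the 3GB figure is from the advice above, and a per-session SET is handy for testing the effect on plans:

    -- In postgresql.conf: effective_cache_size = 3GB
    -- Or per session, to test the effect on plan choices:
    SET effective_cache_size = '3GB';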

Re: [PERFORM] Random Page Cost and Planner

2010-05-26 Thread David Jarvis
Hi, sc.taken_end <= '1996-12-31'::date AND m.taken BETWEEN sc.taken_start AND sc.taken_end AND category of data at a certain time. But I'm afraid this makes the planning much more difficult, as the select from measurements depend on the data returned by other parts of the query (rows from

Re: [PERFORM] Random Page Cost and Planner

2010-05-26 Thread David Jarvis
I was told to try OVERLAPS instead of checking years. The query is now: SELECT extract(YEAR FROM m.taken) AS year, avg(m.amount) as amount FROM climate.city c, climate.station s, climate.station_category sc, climate.measurement m WHERE c.id = 5148 AND
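
For reference, OVERLAPS compares two (start, end) pairs; a minimal illustration with arbitrary dates:

    -- True when the two date ranges share any point in time.
    SELECT (DATE '1963-01-01', DATE '2009-12-31')
           OVERLAPS (DATE '1996-01-01', DATE '1996-12-31');  -- returns true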

Re: [PERFORM] Random Page Cost and Planner

2010-05-25 Thread David Jarvis
Hi, I changed the date comparison to be based on year alone: extract(YEAR FROM sc.taken_start) >= 1900 AND extract(YEAR FROM sc.taken_end) <= 2009 AND The indexes are now always used; if someone wants to explain why using the numbers works (a constant) but using a date (another constant?)

Re: [PERFORM] Random Page Cost and Planner

2010-05-25 Thread Kevin Grittner
David Jarvis thanga...@gmail.com wrote: The value for random_page_cost was at 2.0; reducing it to 1.1 had a massive performance improvement (nearly an order of magnitude). While the results now return in 5 seconds (down from ~85 seconds) ... It sounds as though the active portion of your
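
The change described, applied per session so it can be tested before committing it to postgresql.conf; 1.1 tells the planner random I/O costs barely more than sequential I/O, which is only realistic for well-cached data:

    -- Lower the planner's random I/O cost estimate for this session.
    -- (The stock default is 4.0; this database had it at 2.0.)
    SET random_page_cost = 1.1;
    -- then re-run EXPLAIN ANALYZE on the report query and compare plans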

Re: [PERFORM] Random Page Cost and Planner

2010-05-25 Thread David Jarvis
Hi, Kevin. Thanks for the response. It sounds as though the active portion of your database is pretty much cached in RAM. True? I would not have thought so; there are seven tables, each with 39 to 43 million rows as: CREATE TABLE climate.measurement ( id bigserial NOT NULL, taken date

Re: [PERFORM] Random Page Cost and Planner

2010-05-25 Thread Tom Lane
David Jarvis thanga...@gmail.com writes: It sounds as though the active portion of your database is pretty much cached in RAM. True? I would not have thought so; there are seven tables, each with 39 to 43 million rows as: [ perhaps 64 bytes per row ] The machine has 4GB of RAM, donated to
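
(Rough arithmetic behind the point: 7 tables x ~41 million rows x ~64 bytes is roughly 18 GB of raw row data, several times the 4 GB of RAM, so the full working set cannot be cached.)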

Re: [PERFORM] Random Page Cost and Planner

2010-05-25 Thread Rob Wultsch
On Tue, May 25, 2010 at 4:26 PM, David Jarvis thanga...@gmail.com wrote: shared_buffers = 1GB temp_buffers = 32MB work_mem = 32MB maintenance_work_mem = 64MB effective_cache_size = 256MB Shouldn't effective_cache_size be significantly larger? -- Rob Wultsch wult...@gmail.com

Re: [PERFORM] Random Page Cost and Planner

2010-05-25 Thread David Jarvis
Hi, Tom. Yes, that is what happened, making the tests rather meaningless and giving me the false impression that the indexes were being used. They were, but only because of cached results. When multiple users are making different queries, the performance will return to ~80s per query. I also tried

[PERFORM] Random Page Cost and Planner

2010-05-24 Thread David Jarvis
Hi, I wrote a query (see below) that extracts climate data from weather stations within a given radius of a city using the dates for which those weather stations actually have data. The query uses the measurement table's only index: CREATE UNIQUE INDEX measurement_001_stc_idx ON
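
The CREATE UNIQUE INDEX statement is cut off above; judging by the "stc" suffix it likely covers station, taken, and category columns. A hedged reconstruction, every detail of which is an assumption:

    -- Hypothetical reconstruction of the truncated index definition;
    -- the column list is inferred from the "stc" naming and may be wrong.
    CREATE UNIQUE INDEX measurement_001_stc_idx
      ON climate.measurement_001 (station_id, taken, category_id);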