Hi, Bryan.
I was just about to reply to the thread, thanks for asking. Clustering was
key. After rebooting the machine (just to make sure absolutely nothing was
cached), I immediately ran a report on Toronto: 5.25 seconds!
Here's what I did:
1. Created a new set of tables that matched the
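For anyone following along, clustering in PostgreSQL is done with the CLUSTER command. A sketch of the kind of step described (the table and index names are taken from elsewhere in the thread; the exact statements were not posted):

```sql
-- Physically rewrite the table in index order (takes an exclusive lock).
CLUSTER climate.measurement USING measurement_001_stc_idx;

-- Refresh planner statistics after the reorganization.
ANALYZE climate.measurement;
```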
Salut, Cédric.
I wonder what the plan will be if you replace sc.taken_* in:
m.taken BETWEEN sc.taken_start AND sc.taken_end
with constant values. It might help the planner...
That is a fairly important restriction. I will try making it *
(year1||'-01-01')::date*, but I have no constant value for it --
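A sketch of Cédric's suggestion, with the correlated columns replaced by literals (the dates here are illustrative, not from the thread):

```sql
-- Correlated form: the planner cannot estimate the range at plan time.
--   m.taken BETWEEN sc.taken_start AND sc.taken_end
-- Constant form the planner can estimate directly:
m.taken BETWEEN '1963-01-01'::date AND '2009-12-31'::date
```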
Agree with Tom on his point about avoidance of cost param adjustments to fit
specific test cases.
A few suggestions...as I assume you own this database...
- check out pg_statio_user_tables - optimize your cache hit ratio on blocks
read...different time durations... pg_stat_bgwriter (read from a
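A common way to check the block-level cache hit ratio from pg_statio_user_tables (standard statistics-view columns; this query is a generic sketch, not from the thread):

```sql
SELECT sum(heap_blks_hit)::numeric
       / nullif(sum(heap_blks_hit) + sum(heap_blks_read), 0) AS cache_hit_ratio
FROM pg_statio_user_tables;
```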
Hi, Bryan.
Thanks for the notes. I thought about using a prepared statement, but I
cannot find any examples of using a PREPARE statement from within a
function, and don't really feel like tinkering around to figure it out.
Performance is at the point where the Java/PHP bridge and JasperReports
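For reference, session-level PREPARE/EXECUTE looks like the following (a generic sketch, not the function David was writing; note that PL/pgSQL caches plans for its statements on its own, which is one reason examples of PREPARE inside functions are scarce):

```sql
-- Hypothetical prepared report over the measurement table.
PREPARE station_avg (int, int) AS
  SELECT extract(YEAR FROM m.taken) AS year,
         avg(m.amount) AS amount
  FROM climate.measurement m
  WHERE extract(YEAR FROM m.taken) BETWEEN $1 AND $2
  GROUP BY 1;

EXECUTE station_avg(1963, 2009);
```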
Hi, Rob.
I tried bumping the effective_cache_size. It made no difference.
My latest attempt at forcing PostgreSQL to use the indexes involved two
loops: one to loop over the stations, the other to extract the station data
from the measurement table. The outer loop executes in 1.5 seconds. The
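The two-loop approach could be sketched in PL/pgSQL roughly as follows (function and column names are guessed from the rest of the thread; the city/radius filter on the stations is elided, and the actual function was not posted):

```sql
CREATE OR REPLACE FUNCTION climate.station_report()
RETURNS SETOF climate.measurement AS $$
DECLARE
  s record;
BEGIN
  -- Outer loop: iterate over the candidate stations.
  FOR s IN SELECT sc.station_id, sc.taken_start, sc.taken_end
           FROM climate.station_category sc LOOP
    -- Inner loop: extract that station's rows from the measurement table.
    RETURN QUERY
      SELECT m.* FROM climate.measurement m
      WHERE m.station_id = s.station_id
        AND m.taken BETWEEN s.taken_start AND s.taken_end;
  END LOOP;
END;
$$ LANGUAGE plpgsql;
```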
On May 26, 2010, at 6:50 AM, David Jarvis wrote:
That said, when using the following condition, the query is fast (1 second):
extract(YEAR FROM sc.taken_start) = 1963 AND
extract(YEAR FROM sc.taken_end) = 2009 AND
- Index Scan using measurement_013_stc_idx
Subject: Random Page Cost and Planner
From: t...@fuzzy.cz
Date: Wed, May 26, 2010 12:01 pm
To: David Jarvis thanga...@gmail.com
Hi, Tom.
Yes
David Jarvis thanga...@gmail.com wrote:
It sounds as though the active portion of your database is pretty
much cached in RAM. True?
I would not have thought so; there are seven tables, each with 39
to 43 million rows
The machine has 4GB of RAM
In that case, modifying seq_page_cost or
Hi, Alexey.
Is it necessary to get the data as far back as 1900 every time? Maybe there is
a possibility to pre-aggregate results from past years, since they are constant.
This I have done. I created another table (station_category) that associates
stations with when they started to take
Hi,
And this is what happens in the queries above - the first query covers
years 1963-2009, while the second one covers 1900-2009. Given the fact
this table contains ~40m rows, the first query returns about 0.01% (3k
rows) while the second one returns almost 50% of the data (18m rows). So
I
Hi, Kevin.
below something in the range of 1.5 to 2 is probably not going to be
a good choice for the mix as a whole.
Good to know; thanks.
This should probably be set to something on the order of 3GB. This
will help the optimizer make more intelligent choices about when use
of the index
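In postgresql.conf terms, Kevin's suggestion amounts to the following (assuming the 4GB machine mentioned earlier in the thread):

```
# postgresql.conf -- how much OS page cache the planner may assume is available
effective_cache_size = 3GB
```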
Hi,
sc.taken_end = '1996-12-31'::date AND
m.taken BETWEEN sc.taken_start AND sc.taken_end AND
category of data at a certain time. But I'm afraid this makes the planning
much more difficult, as the select from measurements depends on the data
returned by other parts of the query (rows from
I was told to try OVERLAPS instead of checking years. The query is now:
SELECT
extract(YEAR FROM m.taken) AS year,
avg(m.amount) as amount
FROM
climate.city c,
climate.station s,
climate.station_category sc,
climate.measurement m
WHERE
c.id = 5148 AND
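For reference, the OVERLAPS form of a period comparison looks like this (a sketch with illustrative dates; the remainder of the WHERE clause above is cut off in the archive):

```sql
(sc.taken_start, sc.taken_end) OVERLAPS (date '1963-01-01', date '2009-12-31')
```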
Hi,
I changed the date comparison to be based on year alone:
extract(YEAR FROM sc.taken_start) = 1900 AND
extract(YEAR FROM sc.taken_end) = 2009 AND
The indexes are now always used; if someone wants to explain why using the
numbers works (a constant) but using a date (another constant?)
David Jarvis thanga...@gmail.com wrote:
The value for *random_page_cost* was at 2.0; reducing it to 1.1
had a massive performance improvement (nearly an order of
magnitude). While the results now return in 5 seconds (down from
~85 seconds)
It sounds as though the active portion of your
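For anyone reproducing this, the cost-parameter change can be trialled per session before committing it to postgresql.conf (the 1.1 value is from David's message; whether it suits other workloads is exactly Tom's caveat about fitting cost parameters to specific test cases):

```sql
SET random_page_cost = 1.1;  -- was 2.0; biases the planner toward index scans
-- then re-run the query under EXPLAIN ANALYZE and compare plans
```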
Hi, Kevin.
Thanks for the response.
It sounds as though the active portion of your database is pretty
much cached in RAM. True?
I would not have thought so; there are seven tables, each with 39 to 43
million rows as:
CREATE TABLE climate.measurement (
id bigserial NOT NULL,
taken date
David Jarvis thanga...@gmail.com writes:
It sounds as though the active portion of your database is pretty
much cached in RAM. True?
I would not have thought so; there are seven tables, each with 39 to 43
million rows as: [ perhaps 64 bytes per row ]
The machine has 4GB of RAM, donated to
On Tue, May 25, 2010 at 4:26 PM, David Jarvis thanga...@gmail.com wrote:
shared_buffers = 1GB
temp_buffers = 32MB
work_mem = 32MB
maintenance_work_mem = 64MB
effective_cache_size = 256MB
Shouldn't effective_cache_size be significantly larger?
--
Rob Wultsch
wult...@gmail.com
Hi, Tom.
Yes, that is what happened, making the tests rather meaningless and giving
me the false impression that the indexes were being used. They were, but only
because of cached results. Once multiple users are making different queries,
performance will return to ~80s per query.
I also tried
Hi,
I wrote a query (see below) that extracts climate data from weather stations
within a given radius of a city using the dates for which those weather
stations actually have data. The query uses the measurement table's only
index:
CREATE UNIQUE INDEX measurement_001_stc_idx
ON
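The CREATE UNIQUE INDEX statement is cut off above. A multi-column index of the kind the thread implies, with column names guessed from the "stc" suffix and the aliases used elsewhere (station, taken, category), would look something like:

```sql
-- Hypothetical reconstruction; the real column list was not captured.
CREATE UNIQUE INDEX measurement_001_stc_idx
  ON climate.measurement (station_id, taken, category_id);
```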