Fair enough. How about something like make_timestamp? It's at least
shorter and easier than construct :-)
Agreed.
Dave
Does it make sense to allow minutes when hours isn't specified? Or
For time, 00 seems a reasonable default for all values; clearly document the
defaults. Also, having a default makes the code simpler than level n plus
level n-1. (Not to mention explaining it.) ;-)
SELECT to_timestamp( minutes
Hi,
We had a little chat about this with Magnus. It's pretty surprising that
there's no built-in function to do this, we should consider adding one.
I agree; you should be able to create a timestamp or a date from integer
values. Others, apparently, have written code. The implementation I did
Hi,
It's not immediately obvious what the default value of timezone
will be.
The system's locale, like now(); documentation can clarify.
By named parameter, I meant default value. You could construct a timestamp
variable using:
construct_timestamp( year := 1900, hour := 1 )
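The defaulting rule being proposed (omitted date parts default to 1, omitted time parts to 0, so only the year is required) can be sketched in Python; `construct_timestamp` is the name under discussion in this thread, not an existing function:

```python
from datetime import datetime

# Hypothetical defaults from the thread: date parts default to 1,
# time parts default to 0; only the year must be given.
def construct_timestamp(year, month=1, day=1, hour=0, minute=0, second=0):
    return datetime(year, month, day, hour, minute, second)

# Mirrors the SQL call: construct_timestamp( year := 1900, hour := 1 )
construct_timestamp(year=1900, hour=1)  # datetime(1900, 1, 1, 1, 0)
```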
When I read
Hi,
To avoid string concatenation using dates, I figured I could write a C
function:
#include "postgres.h"
#include "fmgr.h"
#include "utils/date.h"
#include "utils/nabstime.h"
#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif
Datum dateserial (PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1 (dateserial);
Datum
Hi, Tom.
extract(YEAR FROM m.taken)
I thought that returned a double precision?
Dave
Hi,
I added an explicit cast in the SQL:
dateserial(extract(YEAR FROM m.taken)::int,'||p_month1||','||p_day1||') d1,
dateserial(extract(YEAR FROM m.taken)::int,'||p_month2||','||p_day2||') d2
The function now takes three integer parameters; there was no performance
loss.
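What dateserial does maps directly onto building a date from three integers instead of concatenating and re-parsing strings; a minimal Python sketch of the same idea:

```python
from datetime import date

# Equivalent of the C dateserial(year, month, day) discussed above:
# construct the date value directly from three integers rather than
# formatting a 'Y-M-D' string and parsing it back.
def dateserial(year: int, month: int, day: int) -> date:
    return date(year, month, day)

dateserial(1960, 1, 1)  # date(1960, 1, 1)
```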
Thank
Hi,
I found a slow part of the query:
SELECT
date(extract(YEAR FROM m.taken)||'-1-1') d1,
date(extract(YEAR FROM m.taken)||'-1-31') d2
FROM
climate.city c,
climate.station s,
climate.station_category sc,
climate.measurement m
WHERE
c.id = 5148 AND ...
Date extraction is 3.2
Hi, Andy.
I assume you are doing this in a loop? Many Many Many times? cuz:
Yes. Here are the variations I have benchmarked (times are best of three):
Variation #0
-no date field-
Explain: http://explain.depesz.com/s/Y9R
Time: 2.2s
Variation #1
date('1960-1-1')
Explain:
Sorry, Alvaro.
I was contemplating using a GIN or GiST index as a way of optimizing the
query.
Instead, I found that re-inserting the data in order of station ID (the
primary look-up column) and then CLUSTER'ing on the station ID, taken date,
and category index increased the speed by an order of
Hi,
Hmm, that's nice, though I cannot but wonder whether the exclusive lock
required by CLUSTER is going to be a problem in the long run.
Not an issue; the inserts are one-time (or very rare; at most: once a year).
Hm, keep in mind that if the station clause alone is not selective
enough,
Hi, Bryan.
I was just about to reply to the thread, thanks for asking. Clustering was
key. After rebooting the machine (just to make sure absolutely nothing was
cached), I immediately ran a report on Toronto: 5.25 seconds!
Here's what I did:
1. Created a new set of tables that matched the
Salut, Cédric.
I wonder what the plan will be if you replace sc.taken_* in :
m.taken BETWEEN sc.taken_start AND sc.taken_end
by values. It might help the planner...
That is a fairly important restriction. I will try making it
(year1||'-01-01')::date, but I have no constant value for it --
Hi, Bryan.
Thanks for the notes. I thought about using a prepared statement, but I
cannot find any examples of using a PREPARE statement from within a
function, and don't really feel like tinkering around to figure it out.
Performance is at the point where the Java/PHP bridge and JasperReports
Hi, Rob.
I tried bumping the effective_cache_size. It made no difference.
My latest attempt at forcing PostgreSQL to use the indexes involved two
loops: one to loop over the stations, the other to extract the station data
from the measurement table. The outer loop executes in 1.5 seconds. The
Hi, Alexey.
Is it necessary to get the data as far back as 1900 all the time? Maybe there
is a possibility to aggregate results from past years if they are constant.
This I have done. I created another table (station_category) that associates
stations with when they started to take
Hi, Kevin.
below something in the range of 1.5 to 2 is probably not going to be
a good choice for the mix as a whole.
Good to know; thanks.
This should probably be set to something on the order of 3GB. This
will help the optimizer make more intelligent choices about when use
of the index
Hi,
sc.taken_end = '1996-12-31'::date AND
m.taken BETWEEN sc.taken_start AND sc.taken_end AND
category of data at a certain time. But I'm afraid this makes the planning
much more difficult, as the select from measurements depend on the data
returned by other parts of the query (rows from
I was told to try OVERLAPS instead of checking years. The query is now:
SELECT
extract(YEAR FROM m.taken) AS year,
avg(m.amount) as amount
FROM
climate.city c,
climate.station s,
climate.station_category sc,
climate.measurement m
WHERE
c.id = 5148 AND
Hi,
I changed the date comparison to be based on year alone:
extract(YEAR FROM sc.taken_start) = 1900 AND
extract(YEAR FROM sc.taken_end) = 2009 AND
The indexes are now always used; if someone wants to explain why using the
numbers works (a constant) but using a date (another constant?)
Hi, Kevin.
Thanks for the response.
It sounds as though the active portion of your database is pretty
much cached in RAM. True?
I would not have thought so; there are seven tables, each with 39 to 43
million rows as:
CREATE TABLE climate.measurement (
id bigserial NOT NULL,
taken date
Hi, Tom.
Yes, that is what happened, making the tests rather meaningless, and giving
me the false impression that the indexes were being used. They were but only
because of cached results. When multiple users make different queries, the
performance will return to ~80s per query.
I also tried
Hi,
I wrote a query (see below) that extracts climate data from weather stations
within a given radius of a city using the dates for which those weather
stations actually have data. The query uses the measurement table's only
index:
CREATE UNIQUE INDEX measurement_001_stc_idx
ON
Hi,
The problem is now solved (in theory).
Well, it's not the functions per se that's the problem, it's the lack of
a useful index on the expression.
The measurement table indexes (on date and weather station) were not being
used because the only given date ranges (e.g., 1900 - 2009) were
Hi,
CREATE INDEX measurement_01_001_y_idx
ON climate.measurement_01_001
USING btree
(date_part('year'::text, taken));
Is that equivalent to what you suggest?
No. It is not the same function, so Postgres has no way to know it produces
the same results (if it does).
This is what I
Hi,
certainly understand that you wouldn't want to partition by year. It
Definitely not.
does strike me that perhaps you could partition by day ranges, but you'd
I don't think that will work; users can choose any day range, with the most
common as Jan 1 - Dec 31, followed by seasonal
Hi, Yeb.
This is starting to go back to the design I used with MySQL:
- YEAR_REF - Has year and station
- MONTH_REF - Has month, category, and year reference
- MEASUREMENT - Has month reference, amount, and day
Normalizing by date parts was fast. Partitioning the tables by year won't
Hi,
I have posted an image of the user inputs here:
http://i.imgur.com/MUkuZ.png
The problem is that I am given a range of days (Dec 22 - Mar 22) over a
range of years (1900 - 2009) and the range of days can span from one year to
the next. This is not the same as saying Dec 22, 1900 to Mar 22,
When using MySQL, the performance was okay (~5 seconds per query) using:
date( concat_ws( '-', y.year, m.month, d.day ) ) between
-- Start date.
date( concat_ws( '-', y.year, $P{Month1}, $P{Day1} ) ) AND
-- End date. Calculated by checking to see if the end date wraps
-- into
What if I were to have the application pass in two sets of date ranges?
For the condition of Dec 22 to Mar 22:
Dec 22 would become:
- Dec 22 - Dec 31
Mar 22 would become:
- Jan 1 - Mar 22
The first range would always be for the current year; the second range would
always be for the
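The two-range split described above can be sketched in Python (a hedged illustration; (month, day) tuples stand in for the user's inputs):

```python
# Split a day range that may wrap past Dec 31 into at most two
# within-year (month, day) ranges, per the scheme above:
# Dec 22 - Mar 22 becomes Dec 22 - Dec 31 plus Jan 1 - Mar 22.
def split_day_range(start, end):
    # start and end are (month, day) tuples, e.g. (12, 22) and (3, 22).
    if start <= end:
        return [(start, end)]              # no wrap: a single range
    return [(start, (12, 31)),             # current year: start .. Dec 31
            ((1, 1), end)]                 # following year: Jan 1 .. end

split_day_range((12, 22), (3, 22))
# [((12, 22), (12, 31)), ((1, 1), (3, 22))]
```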
I was hoping to eliminate this part of the query:
(cast(extract( YEAR FROM m.taken ) + greatest( -1 *
sign(
(extract( YEAR FROM m.taken )||'-12-31')::date -
(extract( YEAR FROM m.taken )||'-01-01')::date ), 0
) AS text)||'-12-31')::date
That uses
Hi,
I was still referring to the measurement table. You have an index on
stationid, but still seem to be getting a sequential scan. Maybe the planner
does not realise that you are selecting a small number of stations. Posting
an EXPLAIN ANALYSE would really help here.
Here is the result from
The greatest() expression reduces to either the current year (year + 0) or
the next year (year + 1) by taking the sign of the difference in start/end
days. This allows me to derive an end date, such as:
Dec 22, 1900 to Mar 22, 1901
Then I check if the measured date falls between those two dates.
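In Python terms, the greatest()/sign() trick above reduces to a single conditional: bump the end year by one exactly when the end day falls earlier in the calendar than the start day (a sketch of the logic, not the SQL itself):

```python
from datetime import date

# End year is start year + 1 when the user's day range wraps past
# Dec 31 -- i.e. the end (month, day) sorts before the start (month, day).
def end_date(year, start_md, end_md):
    wrap = 1 if end_md < start_md else 0   # the sign()/greatest() trick
    return date(year + wrap, *end_md)

end_date(1900, (12, 22), (3, 22))  # date(1901, 3, 22)
```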
Hi,
~300 million measurements
~12000 stations (not 7 as I mentioned before)
~5500 cities
some serious data tho, at least. Basically, PG is sequentially scanning
through all of the tables in your partitioning setup. What is
constraint_exclusion set to? What version of PG is this? Do the
Hi,
check (taken >= '1913-12-01' and taken <= '1913-12-31')
I don't think I want to constrain by year, for a few reasons:
1. There are a lot of years -- over 110.
2. There will be more years added (both in the future for 2010 and in the
past as I get data from other sources).
Currently I
Hi,
Something in here really smells fishy to me. Those extract's above are
working on values which are from the table.. Why aren't you using these
functions to figure out how to construct the actual dates based on the
values provided by the *user*..?
Because I've only been using PostgreSQL
I took out the date conditions:
SELECT
m.*
FROM
climate.measurement m
WHERE
m.category_id = 1 and
m.station_id = 2043
This uses the station indexes:
Result (cost=0.00..21781.18 rows=8090 width=28)
->  Append (cost=0.00..21781.18 rows=8090 width=28)
->  Seq Scan on measurement
Hi,
(An EXPLAIN ANALYSE would be better here). Look at the expected number of
stations
Nested Loop (cost=0.00..994.94 rows=4046 width=4) (actual
time=0.053..41.173 rows=78 loops=1)
Join Filter: ((6371.009::double precision *
sqrt((pow(radians(((c.latitude_decimal -
Hi,
I recently switched to PostgreSQL from MySQL so that I can use PL/R for data
analysis. The query in MySQL form (against a more complex table structure)
takes ~5 seconds to run. The query in PostgreSQL I have yet to let finish,
as it takes over a minute. I think I have the correct table