> Fair enough. How about something like make_timestamp? It's at least
> shorter and easier than construct :-)
>
Agreed.
Dave
>
> Does it make sense to allow minutes when hours isn't specified? Or
>
For time, 00 seems a reasonable default for all values; clearly document the
defaults. Also, having a default makes the code simpler. (Not to mention
explaining it.) ;-)
SELECT to_timestamp( minutes := 19 ) -- er
Hi,
It's not immediately obvious what the default value of "timezone"
> will be?
>
The system's locale, like now(); documentation can clarify.
By named parameter, I meant default value. You could construct a timestamp
variable using:
construct_timestamp( year := 1900, hour := 1 )
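For context, a minimal SQL-level sketch of how parameter defaults plus named
notation behave; the function name, signature, and defaults below are assumptions
for illustration (and referencing parameters by name in a SQL function body needs
PostgreSQL 9.2 or later), not the API discussed above:
CREATE FUNCTION make_timestamp_sketch(
    year int, month int DEFAULT 1, day int DEFAULT 1,
    hour int DEFAULT 0, mins int DEFAULT 0, secs float8 DEFAULT 0)
RETURNS timestamp AS $$
    SELECT (year || '-' || month || '-' || day)::date
           + hour * interval '1 hour'
           + mins * interval '1 minute'
           + secs * interval '1 second';
$$ LANGUAGE sql;
-- Unspecified parameters fall back to their defaults:
SELECT make_timestamp_sketch(year := 1900, hour := 1);  -- 1900-01-01 01:00:00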
When I read
Hi,
We had a little chat about this with Magnus. It's pretty surprising that
> there's no built-in function to do this, we should consider adding one.
>
I agree; you should be able to create a timestamp or a date from integer
values. Others, apparently, have written code. The implementation I did
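For reference, PostgreSQL 9.4 and later ship built-in constructors along these
lines (make_date and make_timestamp), so on current versions this works out of
the box:
SELECT make_date(1900, 12, 22);                  -- date '1900-12-22'
SELECT make_timestamp(1900, 12, 22, 1, 0, 0.0);  -- timestamp '1900-12-22 01:00:00'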
Hi,
I added an explicit cast in the SQL:
dateserial(extract(YEAR FROM m.taken)::int,'||p_month1||','||p_day1||') d1,
dateserial(extract(YEAR FROM m.taken)::int,'||p_month2||','||p_day2||') d2
The function now takes three integer parameters; there was no performance
loss.
Thank y
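For comparison, a plain-SQL function with the same three-integer signature could
look like the sketch below (dateserial_sql is a hypothetical name; the thread's
actual dateserial is the C function shown elsewhere):
CREATE OR REPLACE FUNCTION dateserial_sql(int, int, int)
RETURNS date AS $$
    SELECT ($1 || '-' || $2 || '-' || $3)::date;
$$ LANGUAGE sql STABLE;
SELECT dateserial_sql(extract(YEAR FROM DATE '2009-06-15')::int, 1, 31);  -- 2009-01-31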
Hi, Tom.
extract(YEAR FROM m.taken)
I thought that returned a double precision?
Dave
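A quick way to confirm the type (extract() does return double precision on the
versions discussed here; PostgreSQL 14 changed it to numeric):
SELECT pg_typeof(extract(YEAR FROM DATE '2009-06-15'));  -- double precision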
Hi,
Here is code to convert dates from integers without string concatenation:
Edit dateserial.c:
#include "postgres.h"
#include "utils/date.h"
#include "utils/nabstime.h"
#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif
Datum dateserial(PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1 (dateserial);
Datu
Hi,
The C function returns a DateADT, which is a typedef for int32, but the
> CREATE FUNCTION statement claims that it returns 'text'.
>
That'll do it. Thank you!
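The matching SQL-side declaration would then be along these lines (the object
file name is an assumption; the point is declaring RETURNS date to match the
DateADT the C code returns):
CREATE OR REPLACE FUNCTION dateserial(year int, month int, day int)
RETURNS date
AS 'dateserial', 'dateserial'
LANGUAGE C IMMUTABLE STRICT;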
but whether that is faster or slower I don't know. But I
> don't see why this query needs to be fast in the first
> place. It seems t
Hi, Tim.
Have you tested DATE_TRUNC()?
>
Not really; it returns a full timestamp and I would still have to
concatenate strings. My goal is to speed up the following code (where
the p_* parameters are user inputs):
date(extract(YEAR FROM m.taken)||''-'||p_month1||'-'||p_day1||''') d1,
Hi,
To avoid string concatenation using dates, I figured I could write a C
function:
#include "postgres.h"
#include "fmgr.h"
#include "utils/date.h"
#include "utils/nabstime.h"
#ifdef PG_MODULE_MAGIC
PG_MODULE_MAGIC;
#endif
Datum dateserial (PG_FUNCTION_ARGS);
PG_FUNCTION_INFO_V1 (dateserial);
Hi, Andy.
I assume you are doing this in a loop? Many Many Many times? cuz:
>
Yes. Here are the variations I have benchmarked (times are best of three):
Variation #0
-no date field-
Explain: http://explain.depesz.com/s/Y9R
Time: 2.2s
Variation #1
date('1960-1-1')
Explain: http://explain.depes
Hi,
I found a slow part of the query:
SELECT
    date(extract(YEAR FROM m.taken)||'-1-1') d1,
    date(extract(YEAR FROM m.taken)||'-1-31') d2
FROM
climate.city c,
climate.station s,
climate.station_category sc,
climate.measurement m
WHERE
c.id = 5148 AND ...
Date extraction is 3.2 se
Hi,
Hmm, that's nice, though I cannot but wonder whether the exclusive lock
> required by CLUSTER is going to be a problem in the long run.
>
Not an issue; the inserts are one-time (or very rare; at most: once a year).
> Hm, keep in mind that if the station clause alone is not selective
> enou
Sorry, Alvaro.
I was contemplating using a GIN or GiST index as a way of optimizing the
query.
Instead, I found that re-inserting the data in order of station ID (the
primary look-up column) and then CLUSTER'ing on the station ID, taken date,
and category index increased the speed by an order of
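A sketch of the clustering step described above, using the index named elsewhere
in the thread (run ANALYZE afterwards so the planner sees the new ordering):
CLUSTER climate.measurement USING measurement_001_stc_idx;
ANALYZE climate.measurement;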
Hi, Bryan.
Thanks for the notes. I thought about using a prepared statement, but I
cannot find any examples of using a PREPARE statement from within a
function, and don't really feel like tinkering around to figure it out.
Performance is at the point where the Java/PHP bridge and JasperReports ar
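For what it's worth, PREPARE/EXECUTE at the SQL level looks like the sketch below
(statement name and parameter values are made up); inside a PL/pgSQL function,
static SQL statements are already planned once and cached, so an explicit PREPARE
is usually unnecessary there:
PREPARE station_measurements (int, int) AS
    SELECT m.* FROM climate.measurement m
    WHERE m.category_id = $1 AND m.station_id = $2;
EXECUTE station_measurements(1, 2043);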
Salut, Cédric.
I wonder what the plan will be if you replace sc.taken_* in :
> m.taken BETWEEN sc.taken_start AND sc.taken_end
> by values. It might help the planner...
>
That is a fairly important restriction. I will try making it
(year1||'-01-01')::date, but I have no constant value for it -
Hi, Bryan.
I was just about to reply to the thread, thanks for asking. Clustering was
key. After rebooting the machine (just to make sure absolutely nothing was
cached), I immediately ran a report on Toronto: 5.25 seconds!
Here's what I did:
1. Created a new set of tables that matched the old
I was told to try OVERLAPS instead of checking years. The query is now:
SELECT
extract(YEAR FROM m.taken) AS year,
avg(m.amount) as amount
FROM
climate.city c,
climate.station s,
climate.station_category sc,
climate.measurement m
WHERE
c.id = 5148 AND
earth_di
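For reference, OVERLAPS tests whether two ranges share any point in time; a
minimal illustration using the Dec 22 - Mar 22 example dates:
SELECT (DATE '1900-12-22', DATE '1901-03-22') OVERLAPS
       (DATE '1901-01-01', DATE '1901-12-31');  -- true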
Hi,
sc.taken_end <= '1996-12-31'::date AND
> m.taken BETWEEN sc.taken_start AND sc.taken_end AND
>
> category of data at a certain time. But I'm afraid this makes the planning
> much more difficult, as the select from measurements depend on the data
> returned by other parts of the query (rows fro
Hi, Kevin.
below something in the range of 1.5 to 2 is probably not going to be
> a good choice for the mix as a whole.
>
Good to know; thanks.
> This should probably be set to something on the order of 3GB. This
> will help the optimizer make more intelligent choices about when use
> of the i
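The setting being referred to is presumably effective_cache_size (it is the one
bumped later in the thread); it only affects planner cost estimates, so it can
be tried per session:
SET effective_cache_size = '3GB';
SHOW effective_cache_size;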
Hi, Alexey.
Is it necessary to get the data as far back as 1900 all the time? Maybe there is
> a possibility to aggregate results from the past years if they are constant.
>
This I have done. I created another table (station_category) that associates
stations with when they started to take measuremen
Hi, Rob.
I tried bumping the effective_cache_size. It made no difference.
My latest attempt at forcing PostgreSQL to use the indexes involved two
loops: one to loop over the stations, the other to extract the station data
from the measurement table. The outer loop executes in 1.5 seconds. The
inn
Hi, Tom.
Yes, that is what happened, making the tests rather meaningless, and giving
me the false impression that the indexes were being used. They were but only
because of cached results. When multiple users make different queries, the
performance will return to ~80s per query.
I also tried Ke
Hi, Kevin.
Thanks for the response.
It sounds as though the active portion of your database is pretty
> much cached in RAM. True?
>
I would not have thought so; there are seven tables, each with 39 to 43
million rows, defined as:
CREATE TABLE climate.measurement (
id bigserial NOT NULL,
taken date
Hi,
I changed the date comparison to be based on year alone:
extract(YEAR FROM sc.taken_start) >= 1900 AND
extract(YEAR FROM sc.taken_end) <= 2009 AND
The indexes are now always used; if someone wants to explain why using the
numbers works (a constant) but using a date (another constant?
Hi,
I wrote a query (see below) that extracts climate data from weather stations
within a given radius of a city using the dates for which those weather
stations actually have data. The query uses the measurement table's only
index:
CREATE UNIQUE INDEX measurement_001_stc_idx
ON climate.measure
Hi,
The problem is now solved (in theory).
Well, it's not the functions per se that's the problem, it's the lack of
> a useful index on the expression.
>
The measurement table indexes (on date and weather station) were not being
used because the only given date ranges (e.g., 1900 - 2009) were ca
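A sketch of the kind of expression index the quoted advice points at; the indexed
expression has to match the query's expression exactly for the planner to use it
(the index name here is assumed):
CREATE INDEX measurement_taken_year_idx
    ON climate.measurement ((extract(YEAR FROM taken)));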
Hi,
certainly understand that you wouldn't want to partition by year. It
>
Definitely not.
> does strike me that perhaps you could partition by day ranges, but you'd
>
I don't think that will work; users can choose any day range, with the most
common as Jan 1 - Dec 31, followed by seasonal ra
Hi,
CREATE INDEX measurement_01_001_y_idx
>> ON climate.measurement_01_001
>> USING btree
>> (date_part('year'::text, taken));
>>
>> Is that equivalent to what you suggest?
>>
>
> No. It is not the same function, so Postgres has no way to know it produces
> the same results (if it does).
>
Thi
Hi, Yeb.
This is starting to go back to the design I used with MySQL:
- YEAR_REF - Has year and station
- MONTH_REF - Has month, category, and year reference
- MEASUREMENT - Has month reference, amount, and day
Normalizing by date parts was fast. Partitioning the tables by year won't do
Hi,
(An EXPLAIN ANALYSE would be better here). Look at the expected number of
> stations
"Nested Loop (cost=0.00..994.94 rows=4046 width=4) (actual
time=0.053..41.173 rows=78 loops=1)"
" Join Filter: ((6371.009::double precision *
sqrt((pow(radians(((c.latitude_decimal - s.latitude_decimal))::
I took out the date conditions:
SELECT
m.*
FROM
climate.measurement m
WHERE
m.category_id = 1 and
m.station_id = 2043
This uses the station indexes:
"Result (cost=0.00..21781.18 rows=8090 width=28)"
" -> Append (cost=0.00..21781.18 rows=8090 width=28)"
"-> Seq Scan on measur
Hi,
Something in here really smells fishy to me. Those extract's above are
> working on values which are from the table.. Why aren't you using these
> functions to figure out how to construct the actual dates based on the
> values provided by the *user*..?
>
Because I've only been using Postgre
Hi,
check (taken >= '1913-12-01' and taken <= '1913-12-31')
>
I don't think I want to constrain by year, for a few reasons:
1. There are a lot of years -- over 110.
2. There will be more years added (both in the future for 2010 and in the
past as I get data from other sources).
Currently
Hi,
~300 million measurements
~12000 stations (not 7 as I mentioned before)
~5500 cities
some serious data tho, at least. Basically, PG is sequentially scanning
> through all of the tables in your partitioning setup. What is
> constraint_exclusion set to? What version of PG is this? Do th
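Both questions can be answered directly from psql; constraint_exclusion should be
'partition' (8.4 and later) or 'on' for the partition CHECK constraints to be used:
SHOW constraint_exclusion;
SELECT version();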
The greatest() expression reduces to either the current year (year + 0) or
the next year (year + 1) by taking the sign of the difference in start/end
days. This allows me to derive an end date, such as:
Dec 22, 1900 to Mar 22, 1901
Then I check if the measured date falls between those two dates.
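A worked sketch of that logic with the Dec 22 / Mar 22 example: the day range
wraps, so the end year becomes the start year plus one:
SELECT extract(YEAR FROM DATE '1900-12-22')
       + greatest(-1 * sign(DATE '1900-03-22' - DATE '1900-12-22'), 0) AS end_year;
-- the date difference is negative, sign() returns -1, greatest(1, 0) = 1,
-- so end_year = 1901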
Hi,
I was still referring to the measurement table. You have an index on
> stationid, but still seem to be getting a sequential scan. Maybe the planner
> does not realise that you are selecting a small number of stations. Posting
> an EXPLAIN ANALYSE would really help here.
>
Here is the result f
I was hoping to eliminate this part of the query:
(cast(extract( YEAR FROM m.taken ) + greatest( -1 *
sign(
(extract( YEAR FROM m.taken )||'-12-31')::date -
(extract( YEAR FROM m.taken )||'-01-01')::date ), 0
) AS text)||'-12-31')::date
That uses
What if I were to have the application pass in two sets of date ranges?
For the condition of Dec 22 to Mar 22:
Dec 22 would become:
- Dec 22 - Dec 31
Mar 22 would become:
- Jan 1 - Mar 22
The first range would always be for the current year; the second range would
always be for the year
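One way to express that wrap-around split directly in SQL, with the Dec 22 and
Mar 22 values written in as literals (a sketch, not the query actually used):
SELECT m.*
FROM climate.measurement m
WHERE (extract(MONTH FROM m.taken) = 12 AND extract(DAY FROM m.taken) >= 22)
   OR  extract(MONTH FROM m.taken) < 3
   OR (extract(MONTH FROM m.taken) = 3 AND extract(DAY FROM m.taken) <= 22);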
When using MySQL, the performance was okay (~5 seconds per query) using:
date( concat_ws( '-', y.year, m.month, d.day ) ) between
-- Start date.
date( concat_ws( '-', y.year, $P{Month1}, $P{Day1} ) ) AND
-- End date. Calculated by checking to see if the end date wraps
-- into the
Hi,
I have posted an image of the user inputs here:
http://i.imgur.com/MUkuZ.png
The problem is that I am given a range of days (Dec 22 - Mar 22) over a
range of years (1900 - 2009) and the range of days can span from one year to
the next. This is not the same as saying Dec 22, 1900 to Mar 22, 2
Hi,
I recently switched to PostgreSQL from MySQL so that I can use PL/R for data
analysis. The query in MySQL form (against a more complex table structure)
takes ~5 seconds to run. The query in PostgreSQL I have yet to let finish,
as it takes over a minute. I think I have the correct table structu