Re: [PERFORM] Analysis Function
Tom Lane wrote: David Jarvis thanga...@gmail.com writes: Fair enough. How about something like make_timestamp? It's at least shorter and easier than construct :-) Agreed. No objection here either. Added to TODO: Add function to allow the creation of timestamps using parameters * http://archives.postgresql.org/pgsql-performance/2010-06/msg00232.php -- Bruce Momjian br...@momjian.ushttp://momjian.us EnterpriseDB http://enterprisedb.com + None of us is going to be here forever. + -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Analysis Function
On Mon, Jun 14, 2010 at 15:59, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: On Sun, Jun 13, 2010 at 21:19, David Jarvis thanga...@gmail.com wrote: I prefer to_timestamp and to_date over the more verbose construct_timestamp. Yeah, I agree with that. Those names are already taken. It will cause confusion (of both people and machines) if you try to overload them with this. Fair enough. How about something like make_timestamp? It's at least shorter and easier than construct :-) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Analysis Function
Fair enough. How about something like make_timestamp? It's at least shorter and easier than construct :-) Agreed. Dave
Re: [PERFORM] Analysis Function
David Jarvis thanga...@gmail.com writes: Fair enough. How about something like make_timestamp? It's at least shorter and easier than construct :-) Agreed. No objection here either. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Analysis Function
On Sun, Jun 13, 2010 at 17:58, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: On Sun, Jun 13, 2010 at 17:42, Tom Lane t...@sss.pgh.pa.us wrote: ... (We presumably want timezone to default to the system timezone setting, but I wonder how we should make that work --- should an empty string be treated as meaning that?) Umm. NULL could be made to mean that, or we could provicde two different versions - one that takes TZ and one that doesn't. Using NULL like that seems a bit awkward: for one thing it'd mean the function couldn't be STRICT, and also it'd be bizarre that only this one argument could be null without leading to a null result. Hmm, yeah. And two separate functions isn't good either. Basically, I think it's important that there be a way to specify an explicit parameter value that behaves identically to the default. In that case, empty string seems fairly reasonable - if you look at the text based parsing, that's what we do if the timezone is an empty string (meaning not specified). And a third, construct_time(), no? Yeah, maybe ... do you think there's any demand for it? Yes, I think there is. Plus, it's for completeness :-) -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Analysis Function
On Sun, Jun 13, 2010 at 21:19, David Jarvis thanga...@gmail.com wrote: Hi, It's not immediately obvious what the default value of timezone will be? The system's locale, like now(); documentation can clarify. By named parameter, I meant default value. You could construct a timestamp variable using: construct_timestamp( year := 1900, hour := 1 ) When I read that code, the first thing I think it should return is: 1900-01-01 01:00:00.-07 I agree construct_timestamp( hour := 1 ) and construct_date() are errors: year is required. Does it make sense to allow minutes when hours isn't specified? Or should we simply say that for each of the date and the time part, to specify at level n you need to have everything from the top up to level n-1 specified? E.g. month requires year to be specified, day requires both year and month etc? I prefer to_timestamp and to_date over the more verbose construct_timestamp. Yeah, I agree with that. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Analysis Function
Does it make sense to allow minutes when hours isn't specified? Or For time, 00 seems a reasonable default for all values; clearly document the defaults. Also, having a default makes the code simpler than level n plus level n-1. (Not to mention explaining it.) ;-) SELECT to_timestamp( minutes := 19 ) -- error (year not specified) SELECT to_timestamp( year := 2000, minutes := 19 ) -- 2000-01-01 00:19:00.-07 Dave
Re: [PERFORM] Analysis Function
Magnus Hagander mag...@hagander.net writes: On Sun, Jun 13, 2010 at 21:19, David Jarvis thanga...@gmail.com wrote: I prefer to_timestamp and to_date over the more verbose construct_timestamp. Yeah, I agree with that. Those names are already taken. It will cause confusion (of both people and machines) if you try to overload them with this. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Analysis Function
On 11/06/10 23:38, David Jarvis wrote: I added an explicit cast in the SQL: dateserial(extract(YEAR FROM m.taken)::int,'||p_month1||','||p_day1||') d1, dateserial(extract(YEAR FROM m.taken)::int,'||p_month2||','||p_day2||') d2 The function now takes three integer parameters; there was no performance loss. We had a little chat about this with Magnus. It's pretty surprising that there's no built-in function to do this, we should consider adding one. We could have a function like: construct_timestamp(year int4, month int4, date int4, hour int4, minute int4, second int4, milliseconds int4, timezone text) Now that we have named parameter notation, callers can use it to conveniently fill in only the fields needed: SELECT construct_timestamp(year := 1999, month := 10, date := 22); -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Analysis Function
Hi, We had a little chat about this with Magnus. It's pretty surprising that there's no built-in function to do this, we should consider adding one. I agree; you should be able to create a timestamp or a date from integer values. Others, apparently, have written code. The implementation I did was pretty rudimentary, but I was going for speed. If you could overload to_date and to_timestamp, that would be great. For example: to_date( year ) = year-01-01 to_date( year, month ) = year-month-01 to_date( year, month, day ) = year-month-day to_timestamp( year, month, day, hour ) = year-month-day hour:00:00. GMT etc. construct_timestamp(year int4, month int4, date int4, hour int4, minute int4, second int4, milliseconds int4, timezone text) Also, date int4 should be day int4, to avoid confusion with the date type. Does it makes sense to use named parameter notation for the first value (the year)? This could be potentially confusing: to_date() - What would this return? now()? Jan 1st, 1970? 2000? Similarly, to_timestamp() ...? Seems meaningless without at least a full date and an hour. Dave
Re: [PERFORM] Analysis Function
On Sun, Jun 13, 2010 at 09:38, David Jarvis thanga...@gmail.com wrote: Hi, We had a little chat about this with Magnus. It's pretty surprising that there's no built-in function to do this, we should consider adding one. I agree; you should be able to create a timestamp or a date from integer values. Others, apparently, have written code. The implementation I did was pretty rudimentary, but I was going for speed. If you could overload to_date and to_timestamp, that would be great. For example: to_date( year ) = year-01-01 to_date( year, month ) = year-month-01 to_date( year, month, day ) = year-month-day to_timestamp( year, month, day, hour ) = year-month-day hour:00:00. GMT etc. Not that it would make a huge difference over having to specify 1's and 0's there, but I agree that could be useful. construct_timestamp(year int4, month int4, date int4, hour int4, minute int4, second int4, milliseconds int4, timezone text) Also, date int4 should be day int4, to avoid confusion with the date type. Yes, absolutely. Does it makes sense to use named parameter notation for the first value (the year)? This could be potentially confusing: How so? If it does named parameters, why not all? to_date() - What would this return? now()? Jan 1st, 1970? 2000? ERROR, IMHO. We have a function for now() already, and the others are so arbitrary there is no way to explain such a choice. Similarly, to_timestamp() ...? Seems meaningless without at least a full date and an hour. Agreed. -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Analysis Function
Heikki Linnakangas heikki.linnakan...@enterprisedb.com writes: We could have a function like: construct_timestamp(year int4, month int4, date int4, hour int4, minute int4, second int4, milliseconds int4, timezone text) This fails to allow specification to the microsecond level (and note that with float timestamps even smaller fractions have potential use). I would suggest dropping the milliseconds argument and instead letting the seconds arg be float8. That seems a closer match to the way people think about the textual representation. Now that we have named parameter notation, callers can use it to conveniently fill in only the fields needed: It's not immediately obvious what the default value of timezone will be? regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Analysis Function
Magnus Hagander mag...@hagander.net writes: On Sun, Jun 13, 2010 at 09:38, David Jarvis thanga...@gmail.com wrote: Does it makes sense to use named parameter notation for the first value (the year)? This could be potentially confusing: How so? If it does named parameters, why not all? There's no reason not to allow the year parameter to be named. What I think it shouldn't have is a default. OTOH I see no good reason not to allow the other ones to have defaults. (We presumably want timezone to default to the system timezone setting, but I wonder how we should make that work --- should an empty string be treated as meaning that?) Similarly, to_timestamp() ...? Seems meaningless without at least a full date and an hour. Agreed. No, I think it's perfectly sane to allow month/day to default to 1 and h/m/s to zeroes. I do think it might be a good idea to have two functions, construct_timestamp yielding timestamptz and construct_date yielding date (and needing only 3 args). When you only want a date, having to use construct_timestamp and cast will be awkward and much more expensive than is needed (timezone rotations aren't real cheap). regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Analysis Function
On Sun, Jun 13, 2010 at 17:42, Tom Lane t...@sss.pgh.pa.us wrote: Magnus Hagander mag...@hagander.net writes: On Sun, Jun 13, 2010 at 09:38, David Jarvis thanga...@gmail.com wrote: Does it makes sense to use named parameter notation for the first value (the year)? This could be potentially confusing: How so? If it does named parameters, why not all? There's no reason not to allow the year parameter to be named. What I think it shouldn't have is a default. OTOH I see no good reason not to allow the other ones to have defaults. (We presumably want timezone to default to the system timezone setting, but I wonder how we should make that work --- should an empty string be treated as meaning that?) Umm. NULL could be made to mean that, or we could provicde two different versions - one that takes TZ and one that doesn't. Similarly, to_timestamp() ...? Seems meaningless without at least a full date and an hour. Agreed. No, I think it's perfectly sane to allow month/day to default to 1 and h/m/s to zeroes. I do think it might be a good idea to have two functions, construct_timestamp yielding timestamptz and construct_date yielding date (and needing only 3 args). When you only want a date, having to use construct_timestamp and cast will be awkward and much more expensive than is needed (timezone rotations aren't real cheap). And a third, construct_time(), no? -- Magnus Hagander Me: http://www.hagander.net/ Work: http://www.redpill-linpro.com/ -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Analysis Function
Magnus Hagander mag...@hagander.net writes: On Sun, Jun 13, 2010 at 17:42, Tom Lane t...@sss.pgh.pa.us wrote: ... (We presumably want timezone to default to the system timezone setting, but I wonder how we should make that work --- should an empty string be treated as meaning that?) Umm. NULL could be made to mean that, or we could provicde two different versions - one that takes TZ and one that doesn't. Using NULL like that seems a bit awkward: for one thing it'd mean the function couldn't be STRICT, and also it'd be bizarre that only this one argument could be null without leading to a null result. And two separate functions isn't good either. Basically, I think it's important that there be a way to specify an explicit parameter value that behaves identically to the default. And a third, construct_time(), no? Yeah, maybe ... do you think there's any demand for it? regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Analysis Function
Hi, It's not immediately obvious what the default value of timezone will be? The system's locale, like now(); documentation can clarify. By named parameter, I meant default value. You could construct a timestamp variable using: construct_timestamp( year := 1900, hour := 1 ) When I read that code, the first thing I think it should return is: 1900-01-01 01:00:00.-07 I agree construct_timestamp( hour := 1 ) and construct_date() are errors: year is required. Dave P.S. I prefer to_timestamp and to_date over the more verbose construct_timestamp.
Re: [PERFORM] Analysis Function
Hi, To avoid string concatenation using dates, I figured I could write a C function: #include postgres.h #include fmgr.h #include utils/date.h #include utils/nabstime.h #ifdef PG_MODULE_MAGIC PG_MODULE_MAGIC; #endif Datum dateserial (PG_FUNCTION_ARGS); PG_FUNCTION_INFO_V1 (dateserial); Datum dateserial (PG_FUNCTION_ARGS) { int32 p_year = PG_GETARG_INT32(0); int32 p_month = PG_GETARG_INT32(1); int32 p_day = PG_GETARG_INT32(2); DateADT d = date2j (p_year, p_month, p_day) - POSTGRES_EPOCH_JDATE; PG_RETURN_DATEADT(d); } Compiles without errors or warnings. The function is integrated as follows: CREATE OR REPLACE FUNCTION dateserial(integer, integer, integer) RETURNS text AS 'ymd.so', 'dateserial' LANGUAGE 'c' IMMUTABLE STRICT COST 1; However, when I try to use it, the database segfaults: select dateserial( 2007, 1, 3 ) Any ideas why? Thank you! Dave P.S. I have successfully written a function that creates a mmDD formatted string (using *sprintf*) when given three integers. It returns as expected; I ran it as follows: dateserial( extract(YEAR FROM m.taken)::int, 1, 1 )::date This had a best-of-three time of 3.7s compared with 4.3s using string concatenation. If I can eliminate all the typecasts, and pass in m.taken directly (rather than calling *extract*), I think the speed will be closer to 2.5s. Any hints would be greatly appreciated.
Re: [PERFORM] Analysis Function
On 11/06/10 11:25, David Jarvis wrote: Datum dateserial (PG_FUNCTION_ARGS) { int32 p_year = PG_GETARG_INT32(0); int32 p_month = PG_GETARG_INT32(1); int32 p_day = PG_GETARG_INT32(2); DateADT d = date2j (p_year, p_month, p_day) - POSTGRES_EPOCH_JDATE; PG_RETURN_DATEADT(d); } Compiles without errors or warnings. The function is integrated as follows: CREATE OR REPLACE FUNCTION dateserial(integer, integer, integer) RETURNS text AS 'ymd.so', 'dateserial' LANGUAGE 'c' IMMUTABLE STRICT COST 1; However, when I try to use it, the database segfaults: select dateserial( 2007, 1, 3 ) Any ideas why? The C function returns a DateADT, which is a typedef for int32, but the CREATE FUNCTION statement claims that it returns 'text'. -- Heikki Linnakangas EnterpriseDB http://www.enterprisedb.com -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Analysis Function
David Jarvis thanga...@gmail.com wrote: Have you tested DATE_TRUNC()? Not really; it returns a full timestamp and I would still have to concatenate strings. My goal is to speed up the following code (where *p_*parameters are user inputs): *date(extract(YEAR FROM m.taken)||''-'||p_month1||'-'||p_day1||''') d1, date(extract(YEAR FROM m.taken)||''-'||p_month2||'-'||p_day2||''') d2* Using DATE_TRUNC() won't help here, as far as I can tell. Removing the concatenation will halve the query's time. Such as: dateserial( m.taken, p_month1, p_day1 ) d1, dateserial( m.taken, p_month2, p_day2 ) d2 My testing so far has shown a modest improvement by using a C function (to avoid concatenation). You could use: | (DATE_TRUNC('year', m.taken) + p_month1 * '1 month'::INTERVAL + p_day1 * '1 day'::INTERVAL)::DATE but whether that is faster or slower I don't know. But I don't see why this query needs to be fast in the first place. It seems to be interactive, and therefore I wouldn't invest too much time to have the user wait not 4.4, but 2.2 seconds. You could also do the concatenation in the ap- plication if that is faster than PostgreSQL's date arithme- tics. Tim -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Analysis Function
David Jarvis thanga...@gmail.com writes: dateserial(PG_FUNCTION_ARGS) { int32 p_year = (int32)PG_GETARG_FLOAT8(0); int32 p_month = PG_GETARG_INT32(1); int32 p_day = PG_GETARG_INT32(2); Er ... why float? Integer is plenty for the range of years supported by the PG datetime infrastructure. The above coding is pretty lousy in terms of its roundoff and overflow behavior, too. regards, tom lane -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Analysis Function
Hi, Tom. extract(YEAR FROM m.taken) I thought that returned a double precision? Dave
Re: [PERFORM] Analysis Function
Hi, I added an explicit cast in the SQL: dateserial(extract(YEAR FROM m.taken)::int,'||p_month1||','||p_day1||') d1, dateserial(extract(YEAR FROM m.taken)::int,'||p_month2||','||p_day2||') d2 The function now takes three integer parameters; there was no performance loss. Thank you. Dave
Re: [PERFORM] Analysis Function
Hi, I found a slow part of the query: SELECT * date(extract(YEAR FROM m.taken)||'-1-1') d1,* * date(extract(YEAR FROM m.taken)||'-1-31') d2* FROM climate.city c, climate.station s, climate.station_category sc, climate.measurement m WHERE c.id = 5148 AND ... Date extraction is 3.2 seconds, but without is 1.5 seconds. The PL/pgSQL code that actually runs (where p_month1, p_day1, and p_month2, p_day2 are integers): *date(extract(YEAR FROM m.taken)||''-'||p_month1||'-'||p_day1||''') d1, date(extract(YEAR FROM m.taken)||''-'||p_month2||'-'||p_day2||''') d2 * What is a better way to create those dates (without string concatenation, I presume)? Dave
Re: [PERFORM] Analysis Function
On 06/10/2010 07:41 PM, David Jarvis wrote: Hi, I found a slow part of the query: SELECT * date(extract(YEAR FROM m.taken)||'-1-1') d1,* * date(extract(YEAR FROM m.taken)||'-1-31') d2* FROM climate.city c, climate.station s, climate.station_category sc, climate.measurement m WHERE c.id http://c.id = 5148 AND ... Date extraction is 3.2 seconds, but without is 1.5 seconds. The PL/pgSQL code that actually runs (where p_month1, p_day1, and p_month2, p_day2 are integers): *date(extract(YEAR FROM m.taken)||''-'||p_month1||'-'||p_day1||''') d1, date(extract(YEAR FROM m.taken)||''-'||p_month2||'-'||p_day2||''') d2 * What is a better way to create those dates (without string concatenation, I presume)? Dave I assume you are doing this in a loop? Many Many Many times? cuz: andy=# select date(extract(year from current_date) || '-1-1'); date 2010-01-01 (1 row) Time: 0.528 ms Its pretty quick. You say without its 1.5 seconds? Thats all you change? Can we see the sql and 'explain analyze' for both? -Andy -- Sent via pgsql-performance mailing list (pgsql-performance@postgresql.org) To make changes to your subscription: http://www.postgresql.org/mailpref/pgsql-performance
Re: [PERFORM] Analysis Function
Hi, Andy. I assume you are doing this in a loop? Many Many Many times? cuz: Yes. Here are the variations I have benchmarked (times are best of three): Variation #0 -no date field- Explain: http://explain.depesz.com/s/Y9R Time: 2.2s Variation #1 date('1960-1-1') Explain: http://explain.depesz.com/s/DW2 Time: 2.6s Variation #2 date('1960'||'-1-1') Explain: http://explain.depesz.com/s/YuX Time: 3.1s Variation #3 date(extract(YEAR FROM m.taken)||'-1-1') Explain: http://explain.depesz.com/s/1I Time: 4.3s Variation #4 to_date( date_part('YEAR', m.taken)::text, '' ) + interval '0 months' + interval '0 days' Explain: http://explain.depesz.com/s/fIT Time: 4.4s What I would like is along Variation #5: *PGTYPESdate_mdyjul(taken_year, p_month1, p_day1)* Time: 2.3s I find it interesting that variation #2 is half a second slower than variation #1. The other question I have is: why does PG seem to discard the results? In pgAdmin3, I can keep pressing F5 and (before 8.4.4?) the results came back in 4s for the first response then 1s in subsequent responses. Dave