Re: [GENERAL] query by partial timestamp

2013-01-10 Thread Gavan Schneider

On Wednesday, January 9, 2013 at 04:42, Michael Nolan wrote:


On 1/8/13, Gavan Schneider wrote:


2.  SELECT ... WHERE
'2011-01-01'::TIMESTAMP <= col_of_type_timestamp
ANDcol_of_type_timestamp <=
'2011-12-31'::TIMESTAMP;


This won't  quite work, because '2011-12-31'::TIMESTAMP
is the same as 2011-12-31 00:00:00.0
so records timestamped later in the day on the 31st would not get selected

SELECT ... WHERE
'2011-01-01'::TIMESTAMP <= col_of_type_timestamp
AND col_of_type_timestamp < '2012-01:01'::TIMESTAMP;

would get all records with a 2011 timestamp.

Thank you. I was wondering where Tom and Depesz were coming from 
when they both said less than or equal to the 
'2011-12-31'::TIMESTAMP would miss data. I was giving it a rest 
before re-reading, testing, and/or asking 'the right question'. 
You have supplied the missing part to my puzzle.


Mostly I use DATE so have not had much practice wrestling the 
TIMESTAMP edge cases. I also prefer the closed-open equality 
tests as you suggest especially as they are the 'only way to go' 
when grouping data on a monthly basis. My only 'defense' is that 
I tried to craft my examples as close as possible to the OP 
statement and not introduce the 'next year' unless forced... 
lame I know. :)


Regards
Gavan Schneider



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query by partial timestamp

2013-01-09 Thread hubert depesz lubaczewski
On Tue, Jan 08, 2013 at 04:19:59PM -0600, Kirk Wythers wrote:
> I have a column of type TIMESTAMP, I'd like to query all records from 2011. 
> If it were text I could use a partial such as:
> 
> WHERE
>   text ~ '2011'
> 
> There must be a simple way to pull the year part out of a timestamp format. 
> Thanks in advance.

using partial checks (like extract, date_part, or even casting field to
date) will have problem with index usage.
the best way to handle it, is to write the parameters using date
arithmetic.
like:
where column >= '2011-01-01' and column < '2012-01-01'

do not be tempted to do:
where column >= '2011-01-01' and column <='2011-12-31'
which is very bad idea, and will cause data loss.

More on index usage:
http://www.depesz.com/2010/09/09/why-is-my-index-not-being-used/

Best regards,

depesz

-- 
The best thing about modern society is how easy it is to avoid contact with it.
 http://depesz.com/


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query by partial timestamp

2013-01-09 Thread Michael Nolan
It is probably not the most efficient, but I often use this syntax,
which reads better.

Select . where col_type_timestamp::date between '2011-01-01' and
'2011-12-31'

This will use a timestamp index.
--
Mike Nolan


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query by partial timestamp

2013-01-09 Thread Michael Nolan
On 1/8/13, Gavan Schneider  wrote:

> 2.  SELECT ... WHERE
>  '2011-01-01'::TIMESTAMP <= col_of_type_timestamp
>  ANDcol_of_type_timestamp <=
> '2011-12-31'::TIMESTAMP;

This won't  quite work, because '2011-12-31'::TIMESTAMP
is the same as 2011-12-31 00:00:00.0
so records timestamped later in the day on the 31st would not get selected

  SELECT ... WHERE
 '2011-01-01'::TIMESTAMP <= col_of_type_timestamp
 AND col_of_type_timestamp < '2012-01:01'::TIMESTAMP;

would get all records with a 2011 timestamp.
--
Mike Nolan


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query by partial timestamp

2013-01-09 Thread Steve Crawford

On 01/08/2013 06:15 PM, Kirk Wythers wrote:


On Jan 8, 2013, at 6:48 PM, Tom Lane > wrote:



The OP didn't
suggest how many years his data covers, but it's quite possible that
pulling a full year's worth of data will read enough of the table that
there's no point in worrying about whether an index could be used
anyway.


There are only a few years worth of data, 2008 - 2012. However, the 
data consists of 15 min measurements and when renormalized 
(un-pivoted) is several hundred million records. It is conceivable 
that someone will want to query by month, or even hour of the day.


As another poster mentioned, you may want to consider partitioning the 
table not only for performance but also for eventual archiving/purging 
of the data.


As long as we are looking at a variety of alternatives, appropriate 
construction of partial indexes and the query *might* be of value but at 
Tom and I mentioned previously, indexes become more of a hindrance than 
a help once you start writing queries that access too much of the table 
so the planner won't use them in those cases.


Cheers,
Steve


Re: [GENERAL] query by partial timestamp

2013-01-09 Thread Nathan Clayton
On Jan 8, 2013 6:15 PM, "Kirk Wythers"  wrote:
>
>
> On Jan 8, 2013, at 6:48 PM, Tom Lane  wrote:
>
>> The OP didn't
>> suggest how many years his data covers, but it's quite possible that
>> pulling a full year's worth of data will read enough of the table that
>> there's no point in worrying about whether an index could be used
>> anyway.
>
>
> There are only a few years worth of data, 2008 - 2012. However, the data
consists of 15 min measurements and when renormalized (un-pivoted) is
several hundred million records. It is conceivable that someone will want
to query by month, or even hour of the day.

If that's the case, you may want to look at creating a date dimension and
possibly a time dimension for your data analysis (there's a good one to
start with on the PostgreSQL wiki). I would highly recommend that you take
a look at some dimensional modeling concepts (Kimball is a good place to
start).

Also, you may want to look at partitioning the data if it's several hundred
million rows.


Re: [GENERAL] query by partial timestamp

2013-01-08 Thread Kirk Wythers

On Jan 8, 2013, at 6:48 PM, Tom Lane  wrote:

> The OP didn't
> suggest how many years his data covers, but it's quite possible that
> pulling a full year's worth of data will read enough of the table that
> there's no point in worrying about whether an index could be used
> anyway.

There are only a few years worth of data, 2008 - 2012. However, the data 
consists of 15 min measurements and when renormalized (un-pivoted) is several 
hundred million records. It is conceivable that someone will want to query by 
month, or even hour of the day. 

Re: [GENERAL] query by partial timestamp

2013-01-08 Thread Tom Lane
Gavan Schneider  writes:
> From my perspective there are at least three ways to attack 
> this problem:

> (I have not tested these, so apologies for the stupid syntax errors.)

> 1.  SELECT ... WHERE 2011 = extract(YEAR FROM col_of_type_timestamp);

> 2.  SELECT ... WHERE
>  '2011-01-01'::TIMESTAMP <= col_of_type_timestamp
>  ANDcol_of_type_timestamp <= 
> '2011-12-31'::TIMESTAMP;

> 3.  SELECT ... WHERE
>  (col_of_type_timestamp, col_of_type_timestamp) OVERLAPS
>  (DATE '2011-01-01', DATE '2012-01-01');

> Is this the full list?

Another possibility is date_trunc, viz

4. SELECT ... WHERE date_trunc(col_of_type_timestamp, 'year') = '2011-01-01'

You could also use BETWEEN, but that's just syntactic sugar for method 2.

Note that in both methods 2 and 3 it's easy to get the edge cases wrong;
in particular I think your version of method 2 gives the wrong answer
for later-than-midnight times on 2011-12-31, while #3 might (not sure)
give the wrong answer for exactly midnight on 2012-01-01.  These things
are fixable of course with a bit of care.  Personally I'd go with

col >= '2011-01-01' AND col < '2012-01-01'

> So... generalizing the original question: which approach would 
> yield the best performance and/or compliance with SQL standards?

> I note Steve Crawford has (strongly) hinted that direct date 
> comparison is more likely to use an index (when available) so I 
> suspect this is the way to go, but would an index based on 
> extract(YEAR...) negate this difference?

Method 3 is not indexable at all and is unlikely to become so --- the
SQL standard's definition of OVERLAPS is squirrely enough that people
haven't bothered to think about optimizing it.  Method 2 works well with
a plain btree index on the timestamp column.  You can get method 1 to be
indexed if you create a functional index on "extract(year from col)";
but since the index would have pretty much no other use than answering
this exact type of query, that's not a very attractive alternative.
Method 4 is like method 1 --- you'd need a specialized index.

Note that in any case an index is not going to be helpful if the query
would need to fetch more than a few percent of the table.  The OP didn't
suggest how many years his data covers, but it's quite possible that
pulling a full year's worth of data will read enough of the table that
there's no point in worrying about whether an index could be used
anyway.

Another thing to think about is whether you'll have related sorts of
queries that aren't about full years --- maybe sometimes you need a
month's worth of data, for example.  The BETWEEN-style query and a btree
index will adapt easily to non-year intervals, while the EXTRACT
approach will not, and date_trunc is rather limited as well.

regards, tom lane


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query by partial timestamp

2013-01-08 Thread Gavan Schneider

On Tuesday, January 8, 2013 at 09:26, Raymond O'Donnell wrote:


On 08/01/2013 22:19, Kirk Wythers wrote:

I have a column of type TIMESTAMP, I'd like to query all records from
2011. If it were text I could use a partial such as:

WHERE text ~ '2011'

There must be a simple way to pull the year part out of a timestamp
format. Thanks in advance.


You want the extract() function.

From my perspective there are at least three ways to attack 
this problem:


(I have not tested these, so apologies for the stupid syntax errors.)

1.  SELECT ... WHERE 2011 = extract(YEAR FROM col_of_type_timestamp);

2.  SELECT ... WHERE
'2011-01-01'::TIMESTAMP <= col_of_type_timestamp
ANDcol_of_type_timestamp <= 
'2011-12-31'::TIMESTAMP;

3.  SELECT ... WHERE
(col_of_type_timestamp, col_of_type_timestamp) OVERLAPS
(DATE '2011-01-01', DATE '2012-01-01');

Is this the full list?

So... generalizing the original question: which approach would 
yield the best performance and/or compliance with SQL standards?


I note Steve Crawford has (strongly) hinted that direct date 
comparison is more likely to use an index (when available) so I 
suspect this is the way to go, but would an index based on 
extract(YEAR...) negate this difference?


Regards
Gavan Schneider



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query by partial timestamp

2013-01-08 Thread Steve Crawford

On 01/08/2013 02:19 PM, Kirk Wythers wrote:

I have a column of type TIMESTAMP, I'd like to query all records from 2011. If 
it were text I could use a partial such as:

WHERE
text ~ '2011'

There must be a simple way to pull the year part out of a timestamp format. 
Thanks in advance.





As others pointed out, you can do this with "extract(...).

BUT, if your timestamp column is indexed (and if the index will 
constrain your records to a sufficiently small subset of the table that 
use of indexes is warranted) you may be better off using date 
comparisons. I doubt the planner will use the indexes otherwise.


Cheers,
Steve



--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query by partial timestamp

2013-01-08 Thread Rob Sargent

On 01/08/2013 03:39 PM, Raymond O'Donnell wrote:

On 08/01/2013 22:26, Raymond O'Donnell wrote:

On 08/01/2013 22:19, Kirk Wythers wrote:

I have a column of type TIMESTAMP, I'd like to query all records from
2011. If it were text I could use a partial such as:

WHERE text ~ '2011'

There must be a simple way to pull the year part out of a timestamp
format. Thanks in advance.


You want the extract() function.


Sorry, meant to include the reference:

http://www.postgresql.org/docs/9.2/static/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT

Ray.


Ray, I thought you were simply encouraging the OP to learn to fish :)


--
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query by partial timestamp

2013-01-08 Thread Raymond O'Donnell
On 08/01/2013 22:26, Raymond O'Donnell wrote:
> On 08/01/2013 22:19, Kirk Wythers wrote:
>> I have a column of type TIMESTAMP, I'd like to query all records from
>> 2011. If it were text I could use a partial such as:
>>
>> WHERE text ~ '2011'
>>
>> There must be a simple way to pull the year part out of a timestamp
>> format. Thanks in advance.
> 
> You want the extract() function.

Sorry, meant to include the reference:

http://www.postgresql.org/docs/9.2/static/functions-datetime.html#FUNCTIONS-DATETIME-EXTRACT

Ray.

-- 
Raymond O'Donnell :: Galway :: Ireland
r...@iol.ie


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


Re: [GENERAL] query by partial timestamp

2013-01-08 Thread Raymond O'Donnell
On 08/01/2013 22:19, Kirk Wythers wrote:
> I have a column of type TIMESTAMP, I'd like to query all records from
> 2011. If it were text I could use a partial such as:
> 
> WHERE text ~ '2011'
> 
> There must be a simple way to pull the year part out of a timestamp
> format. Thanks in advance.

You want the extract() function.

Ray.


-- 
Raymond O'Donnell :: Galway :: Ireland
r...@iol.ie


-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general


[GENERAL] query by partial timestamp

2013-01-08 Thread Kirk Wythers
I have a column of type TIMESTAMP, I'd like to query all records from 2011. If 
it were text I could use a partial such as:

WHERE
text ~ '2011'

There must be a simple way to pull the year part out of a timestamp format. 
Thanks in advance.




-- 
Sent via pgsql-general mailing list (pgsql-general@postgresql.org)
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-general