Re: [SQL] Aggregates (last/first) not behaving

2010-07-22 Thread Richard Huxton

On 22/07/10 07:37, Wes Devauld wrote:

I have PostgreSQL 8.3.9 [PostgreSQL 8.3.9 on i386-apple-darwin10.3.0,
compiled by GCC i686-apple-darwin10-gcc-4.2.1 (GCC) 4.2.1 (Apple Inc.
build 5646) (dot 1)]

and the custom first and last aggregates from:
http://wiki.postgresql.org/wiki/First_(aggregate)
http://wiki.postgresql.org/wiki/Last_(aggregate)

I have a simple table, of two columns.  The first is a timestamp and is
the primary key, the second is an integer.  I've loaded the table up
with values, one for every minute, for a whole year.  Some SQL to
recreate the table and the aggregates can be retrieved from:

http://blog.devauld.ca/aggregate_test.zip (File is approximately 180KB)

Now when I try to make use of the first and last aggregates, I get:

# select first(t), last(t) from test group by extract(day from t);
 first|last
-+-
  2009-01-01 00:00:00 | 2009-01-01 17:02:00
  2009-01-02 10:07:00 | 2009-01-02 10:06:00
  2009-01-03 20:15:00 | 2009-01-03 20:14:00

[snip]

For some reason the aggregates are not falling into the proper group.  I
can't blame timezones as the results are all over the map, and
first/last relationship is broken as in some cases 'last' is
chronologically before 'first'


They all seem grouped properly (by day) to me. Unless I've missed something.

The first/last aggregates aren't ordered in any way. They are "first 
value I happened to find" and "last value I happened to find".


If you want the earliest/latest timestamp from each day, use min() and 
max().


--
  Richard Huxton
  Archonet Ltd

--
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


[SQL] grouping subsets

2010-07-22 Thread Rainer Stengele
Hi,

having a table similar to

| 1 | B | [2010-07-15 Do] |
| 1 | B | [2010-07-16 Fr] |
|---+---+-|
| 2 | C | [2010-07-17 Sa] |
| 2 | C | [2010-07-18 So] |
|---+---+-|
| 1 | B | [2010-07-19 Mo] |
| 1 | B | [2010-07-20 Di] |
| 1 | B | [2010-07-21 Mi] |
| 1 | B | [2010-07-22 Do] |
|---+---+-|
| 3 | D | [2010-07-23 Fr] |

a simple group by gives me:

| 6 | B |
| 4 | C |
| 3 | D |


What I want to get is the values grouped by "subset", where a subset is a set 
of rows with identical column until the colum changes.
Is there a way to get

| 2 | B |
| 4 | C |
| 4 | B |
| 3 | D |

by SQL only?

- Rainer





-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] grouping subsets

2010-07-22 Thread A. Kretschmer
In response to Rainer Stengele :
> Hi,
> 
> having a table similar to
> 
> | 1 | B | [2010-07-15 Do] |
> | 1 | B | [2010-07-16 Fr] |
> |---+---+-|
> | 2 | C | [2010-07-17 Sa] |
> | 2 | C | [2010-07-18 So] |
> |---+---+-|
> | 1 | B | [2010-07-19 Mo] |
> | 1 | B | [2010-07-20 Di] |
> | 1 | B | [2010-07-21 Mi] |
> | 1 | B | [2010-07-22 Do] |
> |---+---+-|
> | 3 | D | [2010-07-23 Fr] |
> 
> a simple group by gives me:
> 
> | 6 | B |
> | 4 | C |
> | 3 | D |
> 
> 
> What I want to get is the values grouped by "subset", where a subset is a set 
> of rows with identical column until the colum changes.
> Is there a way to get
> 
> | 2 | B |
> | 4 | C |
> | 4 | B |
> | 3 | D |
> 
> by SQL only?

I think, the problem is that there are 2 identical groups. I think, you
can write a pl/pgsql-proc, selecting all ordered by the date-field and
walking through the result to do the grouping, checking if the 2nd
column is different from the previous. 

With plain SQL it's maybe possible too, but i don't know how ...

Regards, Andreas
-- 
Andreas Kretschmer
Kontakt:  Heynitz: 035242/47150,   D1: 0160/7141639 (mehr: -> Header)
GnuPG: 0x31720C99, 1006 CCB4 A326 1D42 6431  2EB0 389D 1DC2 3172 0C99

-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] grouping subsets

2010-07-22 Thread Richard Huxton

On 22/07/10 11:02, A. Kretschmer wrote:

In response to Rainer Stengele :

What I want to get is the values grouped by "subset", where a subset is a set 
of rows with identical column until the colum changes.
Is there a way to get

| 2 | B |
| 4 | C |
| 4 | B |
| 3 | D |

by SQL only?


I think, the problem is that there are 2 identical groups. I think, you
can write a pl/pgsql-proc, selecting all ordered by the date-field and
walking through the result to do the grouping, checking if the 2nd
column is different from the previous.

With plain SQL it's maybe possible too, but i don't know how ...


It should be do-able in 8.4 onwards, look into windowing functions. In 
particular the lag() function:


SELECT
mycode,
mydate,
lag(mycode) OVER (ORDER BY mydate) AS prev_code
FROM
mytable
ORDER BY mydate;

It should be possible to use that as a subquery with an outer query that 
compares mycode=prev_code to get a run length.


--
  Richard Huxton
  Archonet Ltd

--
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] grouping subsets

2010-07-22 Thread Tim Landscheidt
Richard Huxton  wrote:

>>> What I want to get is the values grouped by "subset", where a subset is a 
>>> set of rows with identical column until the colum changes.
>>> Is there a way to get

>>> | 2 | B |
>>> | 4 | C |
>>> | 4 | B |
>>> | 3 | D |

>>> by SQL only?

>> I think, the problem is that there are 2 identical groups. I think, you
>> can write a pl/pgsql-proc, selecting all ordered by the date-field and
>> walking through the result to do the grouping, checking if the 2nd
>> column is different from the previous.

>> With plain SQL it's maybe possible too, but i don't know how ...

> It should be do-able in 8.4 onwards, look into windowing
> functions. In particular the lag() function:

> SELECT
> mycode,
> mydate,
> lag(mycode) OVER (ORDER BY mydate) AS prev_code
> FROM
> mytable
> ORDER BY mydate;

> It should be possible to use that as a subquery with an
> outer query that compares mycode=prev_code to get a run
> length.

Hmmm. Can the outer query be done without using "WITH
RECURSIVE"?

Tim


-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] grouping subsets

2010-07-22 Thread Oliveiros d'Azevedo Cristina

Howdy, Rainer.

Please advice me,

The dates always follow that sequential pattern?

Or can be holes on the dates sequence?

Best,
Oliveiros

- Original Message - 
From: "Rainer Stengele" 

To: 
Sent: Thursday, July 22, 2010 9:09 AM
Subject: [SQL] grouping subsets



Hi,

having a table similar to

| 1 | B | [2010-07-15 Do] |
| 1 | B | [2010-07-16 Fr] |
|---+---+-|
| 2 | C | [2010-07-17 Sa] |
| 2 | C | [2010-07-18 So] |
|---+---+-|
| 1 | B | [2010-07-19 Mo] |
| 1 | B | [2010-07-20 Di] |
| 1 | B | [2010-07-21 Mi] |
| 1 | B | [2010-07-22 Do] |
|---+---+-|
| 3 | D | [2010-07-23 Fr] |

a simple group by gives me:

| 6 | B |
| 4 | C |
| 3 | D |


What I want to get is the values grouped by "subset", where a subset is a 
set of rows with identical column until the colum changes.

Is there a way to get

| 2 | B |
| 4 | C |
| 4 | B |
| 3 | D |

by SQL only?

- Rainer





--
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql 



--
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] grouping subsets

2010-07-22 Thread Rainer Stengele
 Hi Oliveiros,

yes, the date is always incremented - but anyway the date column is not really 
the point!
Actually the first tow columns are relevant.
I want them gouped together as indicated, adding up column 1 in the blocks with 
identical second column, but not adding up over all the rows.

Hope I could express it!

Mit freundlichen Grüßen / Best Regards
Rainer Stengele 

__|___ 
  | Dipl. Inf. (Univ.) Rainer Stengele   
  | Technical Control - System Administration  
  |
  | email: [email protected] 
  | voice/fax: ++49-9131-7778-85/88
  | WWW  : http://www.diplan.de 
  |
  | diplan GmbH
  | Wetterkreuz 27
  | 91058 Erlangen, Germany  
 
Diese E-Mail kann vertrauliche und/oder rechtlich geschützte Informationen 
enthalten. Wenn Sie nicht der richtige Adressat sind oder diese E-Mail 
irrtümlich erhalten haben, informieren Sie bitte den Absender und vernichten 
Sie diese Mail. Das unerlaubte Kopieren sowie die unbefugte Weitergabe dieser 
Mail ist nicht gestattet.

This e-mail may contain confidential and/or privileged information. If you are 
not the intended recipient (or have received this e-mail in error) please 
notify the sender immediately and destroy this e-mail. Any unauthorized 
copying, disclosure or distribution of the material in this e-mail is strictly 
forbidden.



Am 22.07.2010 15:18, schrieb Oliveiros d'Azevedo Cristina:
> Howdy, Rainer.
>
> Please advice me,
>
> The dates always follow that sequential pattern?
>
> Or can be holes on the dates sequence?
>
> Best,
> Oliveiros
>
> - Original Message - From: "Rainer Stengele" 
> 
> To: 
> Sent: Thursday, July 22, 2010 9:09 AM
> Subject: [SQL] grouping subsets
>
>
>> Hi,
>>
>> having a table similar to
>>
>> | 1 | B | [2010-07-15 Do] |
>> | 1 | B | [2010-07-16 Fr] |
>> |---+---+-|
>> | 2 | C | [2010-07-17 Sa] |
>> | 2 | C | [2010-07-18 So] |
>> |---+---+-|
>> | 1 | B | [2010-07-19 Mo] |
>> | 1 | B | [2010-07-20 Di] |
>> | 1 | B | [2010-07-21 Mi] |
>> | 1 | B | [2010-07-22 Do] |
>> |---+---+-|
>> | 3 | D | [2010-07-23 Fr] |
>>
>> a simple group by gives me:
>>
>> | 6 | B |
>> | 4 | C |
>> | 3 | D |
>>
>>
>> What I want to get is the values grouped by "subset", where a subset is a 
>> set of rows with identical column until the colum changes.
>> Is there a way to get
>>
>> | 2 | B |
>> | 4 | C |
>> | 4 | B |
>> | 3 | D |
>>
>> by SQL only?
>>
>> - Rainer
>>
>>
>>
>>
>>
>> -- 
>> Sent via pgsql-sql mailing list ([email protected])
>> To make changes to your subscription:
>> http://www.postgresql.org/mailpref/pgsql-sql 
>
>

-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] grouping subsets

2010-07-22 Thread Tim Landscheidt
Rainer Stengele  wrote:

> yes, the date is always incremented - but anyway the date
> column is not really the point! Actually the first tow
> columns are relevant. I want them gouped together as
> indicated, adding up column 1 in the blocks with identical
> second column, but not adding up over all the rows.
> [...]

If the date column wasn't relevant, how would you group the
first two columns?

Tim


-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Aggregates (last/first) not behaving

2010-07-22 Thread Wes Devauld
I believe I lost the flavour of what I'm doing when I constructed this
example.  I'm not interested in the timepoint as much as the value that is
attached to it.  I need to be able to find the last chronological record for
a given day.

I can get the value for which I am looking in two steps:

select max(t) as t into table last_of_day from test group by extract(day
from t);
select last_of_day.t, test.v from last_of_day, test where last_of_day.t =
test.t;

I was fairly happy when first() and last() were discovered, as these two
steps could be merged, until the table grows too large and the query planner
decides to sort the results before they are aggregated.

I was searching for a way to keep using last() and keeping the extraction to
a single step, although the more I fight with it, the less I think that it
is worth it.  If you have any further suggestions, I would appreciate
hearing them.

-W

On Thu, Jul 22, 2010 at 2:44 AM, Richard Huxton  wrote:
>
>
> They all seem grouped properly (by day) to me. Unless I've missed
> something.
>
> The first/last aggregates aren't ordered in any way. They are "first value
> I happened to find" and "last value I happened to find".
>
> If you want the earliest/latest timestamp from each day, use min() and
> max().
>
> --
>  Richard Huxton
>  Archonet Ltd
>


Re: [SQL] Aggregates (last/first) not behaving

2010-07-22 Thread Tim Landscheidt
Wes Devauld  wrote:

> I believe I lost the flavour of what I'm doing when I constructed this
> example.  I'm not interested in the timepoint as much as the value that is
> attached to it.  I need to be able to find the last chronological record for
> a given day.

> I can get the value for which I am looking in two steps:

> select max(t) as t into table last_of_day from test group by extract(day
> from t);
> select last_of_day.t, test.v from last_of_day, test where last_of_day.t =
> test.t;

> I was fairly happy when first() and last() were discovered, as these two
> steps could be merged, until the table grows too large and the query planner
> decides to sort the results before they are aggregated.

> I was searching for a way to keep using last() and keeping the extraction to
> a single step, although the more I fight with it, the less I think that it
> is worth it.  If you have any further suggestions, I would appreciate
> hearing them.
> [...]

Perhaps you could /concisely/ rephrase your problem. Finding
the first/last value per group with/without window functions
is a common problem, and there are lots of solutions to it.
But few people will wade through lots of text to find out
what's bothering you.

  For example, you can query the "last" values per day along
the lines of (untested):

| SELECT EXTRACT(day FROM t), v
|   FROM test
|   WHERE t IN (SELECT MAX(t) FROM test
| GROUP BY EXTRACT(day FROM t));

Obviously, this doesn't "keep using last()", so I don't know
whether it's good or bad for you.

Tim


-- 
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql


Re: [SQL] Aggregates (last/first) not behaving

2010-07-22 Thread Richard Huxton

On 22/07/10 16:50, Wes Devauld wrote:


I was searching for a way to keep using last() and keeping the extraction to
a single step, although the more I fight with it, the less I think that it
is worth it.  If you have any further suggestions, I would appreciate
hearing them.


You can certainly do it in a single query. I've commented out the 
event_date_idx below because it's far from guaranteed it'll be useful to 
you.


BEGIN;

DROP TABLE IF EXISTS events;

CREATE TABLE events (
e_id   SERIAL,
e_ts   timestamp(0) without time zone,
PRIMARY KEY (e_id)
);

INSERT INTO events (e_ts)
SELECT '2010-01-01 01:01:01'::timestamp without time zone
+ i * '1 minute'::interval
FROM generate_series(0,99) i;

-- CREATE INDEX event_date_idx ON events ((e_ts::date),e_ts);
CREATE INDEX events_ts_idx ON events (e_ts);

-- EXPLAIN ANALYSE
SELECT
e.e_id,
e.e_ts,
minmax.tgt_day
FROM (
SELECT
(e_ts::date) AS tgt_day,
 min(e_ts) as first_ts,
 max(e_ts) as last_ts
FROM
events
GROUP BY 1
) AS minmax
JOIN events e
ON (e.e_ts = minmax.first_ts) OR (e.e_ts = minmax.last_ts)
ORDER BY e_ts
;

COMMIT;

Using the real, windowing versions of first/last in 8.4+ will still 
require sorting the whole table (AFAICT) so isn't likely to be much 
improvement over a self-join here.


--
  Richard Huxton
  Archonet Ltd

--
Sent via pgsql-sql mailing list ([email protected])
To make changes to your subscription:
http://www.postgresql.org/mailpref/pgsql-sql