Re: [SQL] grouping subsets
the table may include up to maybe 30 entries per day, average maybe 10-15 After a year this makes about 10.000 entries - maximum, average about 5000 entries. For the problem described I have to use a Microsoft SQL database and would like to use pure SQL. As I use postgres on my Linux servers I found this newsgroup and thought I ask here. Thanks! Rainer Am 29.07.2010 15:31, schrieb Oliveiros d'Azevedo Cristina: > Fine. > > Please advice me, > > How long can > your table be? Thousands? Millions of records? > > Do you really need it in pure SQL > ? > > It seems to me that it might be possible, I'm just affraid that the query > would become too complex and thus slow... > > Best, > Oliveiros > > - Original Message - From: "Rainer Stengele" > > To: "Oliveiros d'Azevedo Cristina" > Cc: > Sent: Thursday, July 29, 2010 1:10 PM > Subject: Re: grouping subsets > > >> No. This is by accident. >> We have to assume that the combinations do change anytime, and many times >> per day. >> >> So >> >> "Or is it possible to have the same combination on one day with several >> sets?" >> >> YES! >> >> Rainer >> >> >> >> Am 29.07.2010 13:47, schrieb Oliveiros d'Azevedo Cristina: >>> Yes. This is somewhat more complicated because it has more constraints. >>> I've noticed that a given combination doesn't appear with holes on a >>> certain day. >>> >>> For ex, on a daily basis, we have every three key combinations together. >>> >>> We dont have things like >>> 2010-7-01 1726 3212 1428 >>> 2010-7-01 1726 3212 1428 >>> ...318 1846 1012 >>> 2010-7-01 1726 3212 1428 >>> >>> Can I assume that, for a certain day , the records for the same three >>> combination are all together? There is just one set per day for a given >>> combination? >>> >>> Or is it possible to have the same combination on one day with several sets? >>> >>> Best, >>> Oliveiros >>> >>> >>> - Original Message - From: "Rainer Stengele" >>> >>> Newsgroups: gmane.comp.db.postgresql.sql >>> To: "Oliveiros d'Azevedo Cristina" >>> Cc: <> >>> Sent: Thursday, July 29, 2010 10:41 AM >>> Subject: Re: grouping subsets >>> >>> Howdy Cristina, unfortunately things are more complicated. I have inserted an excerpt of the real data here: TableID MasterID dtBegin dtEnd idR idL idB consumption 4057312 295530 2010-07-01 00:59:21.077 2010-07-01 01:32:59.670 1726 3212 1428 279 4061043 295574 2010-07-01 01:59:31.137 2010-07-01 02:32:09.373 1726 3212 1428 183 4083397 295838 2010-07-01 07:57:51.327 2010-07-01 08:28:28.117 318 1846 1012 30 4090858 295920 2010-07-01 09:52:33.777 2010-07-01 10:31:34.393 318 1846 1012 487 4094589 295961 2010-07-01 10:47:59.370 2010-07-01 11:32:20.903 318 1846 1012 472 4098330 296013 2010-07-01 11:58:53.890 2010-07-01 12:31:35.730 318 1846 1012 195 4102069 296058 2010-07-01 12:36:19.170 2010-07-01 13:32:13.950 318 1846 1012 338 4105809 296102 2010-07-01 13:58:53.170 2010-07-01 14:02:57.710 318 1846 1012 105 4109555 296150 2010-07-01 14:59:11.663 2010-07-01 15:32:33.810 318 1846 1012 187 4113305 296194 2010-07-01 15:59:01.797 2010-07-01 16:02:27.260 318 1846 1012 108 4117048 296238 2010-07-01 16:20:47.997 2010-07-01 17:32:49.367 318 1846 1012 179 4120791 296282 2010-07-01 17:58:27.657 2010-07-01 18:29:01.733 318 1846 1012 256 4128291 296370 2010-07-01 19:54:17.687 2010-07-01 20:32:53.850 318 1846 1012 239 4132044 296413 2010-07-01 20:31:37.653 2010-07-01 21:29:13.497 318 1846 1012 39 4135797 296458 2010-07-01 21:59:13.983 2010-07-01 22:32:46.503 318 1846 1012 157 4139572 296506 2010-07-01 22:58:49.530 2010-07-01 23:32:22.543 318 1846 1012 218 4142941 296554 2010-07-01 23:59:13.857 2010-07-02 00:32:30.390 318 1846 1012 248 4146289 296598 2010-07-02 00:58:55.763 2010-07-02 01:32:41.983 318 1846 1012 204 4149616 296642 2010-07-02 01:46:57.357 2010-07-02 02:32:56
Re: [SQL] grouping subsets
I See. And the analysis you need to do, the sum of the rows with the same keys (until they change) will have to be done over all table? Or just over some predefined interval ? Best, Oliveiros - Original Message - From: "Rainer Stengele" Newsgroups: gmane.comp.db.postgresql.sql To: "Oliveiros d'Azevedo Cristina" Cc: Sent: Friday, July 30, 2010 10:35 AM Subject: Re: grouping subsets the table may include up to maybe 30 entries per day, average maybe 10-15 After a year this makes about 10.000 entries - maximum, average about 5000 entries. For the problem described I have to use a Microsoft SQL database and would like to use pure SQL. As I use postgres on my Linux servers I found this newsgroup and thought I ask here. Thanks! Rainer Am 29.07.2010 15:31, schrieb Oliveiros d'Azevedo Cristina: Fine. Please advice me, How long can your table be? Thousands? Millions of records? Do you really need it in pure SQL ? It seems to me that it might be possible, I'm just affraid that the query would become too complex and thus slow... Best, Oliveiros - Original Message - From: "Rainer Stengele" To: "Oliveiros d'Azevedo Cristina" Cc: Sent: Thursday, July 29, 2010 1:10 PM Subject: Re: grouping subsets No. This is by accident. We have to assume that the combinations do change anytime, and many times per day. So "Or is it possible to have the same combination on one day with several sets?" YES! Rainer Am 29.07.2010 13:47, schrieb Oliveiros d'Azevedo Cristina: Yes. This is somewhat more complicated because it has more constraints. I've noticed that a given combination doesn't appear with holes on a certain day. For ex, on a daily basis, we have every three key combinations together. We dont have things like 2010-7-01 1726 3212 1428 2010-7-01 1726 3212 1428 ...318 1846 1012 2010-7-01 1726 3212 1428 Can I assume that, for a certain day , the records for the same three combination are all together? There is just one set per day for a given combination? Or is it possible to have the same combination on one day with several sets? Best, Oliveiros - Original Message - From: "Rainer Stengele" Newsgroups: gmane.comp.db.postgresql.sql To: "Oliveiros d'Azevedo Cristina" Cc: <> Sent: Thursday, July 29, 2010 10:41 AM Subject: Re: grouping subsets Howdy Cristina, unfortunately things are more complicated. I have inserted an excerpt of the real data here: TableID MasterID dtBegin dtEnd idR idL idB consumption 4057312 295530 2010-07-01 00:59:21.077 2010-07-01 01:32:59.670 1726 3212 1428 279 4061043 295574 2010-07-01 01:59:31.137 2010-07-01 02:32:09.373 1726 3212 1428 183 4083397 295838 2010-07-01 07:57:51.327 2010-07-01 08:28:28.117 318 1846 1012 30 4090858 295920 2010-07-01 09:52:33.777 2010-07-01 10:31:34.393 318 1846 1012 487 4094589 295961 2010-07-01 10:47:59.370 2010-07-01 11:32:20.903 318 1846 1012 472 4098330 296013 2010-07-01 11:58:53.890 2010-07-01 12:31:35.730 318 1846 1012 195 4102069 296058 2010-07-01 12:36:19.170 2010-07-01 13:32:13.950 318 1846 1012 338 4105809 296102 2010-07-01 13:58:53.170 2010-07-01 14:02:57.710 318 1846 1012 105 4109555 296150 2010-07-01 14:59:11.663 2010-07-01 15:32:33.810 318 1846 1012 187 4113305 296194 2010-07-01 15:59:01.797 2010-07-01 16:02:27.260 318 1846 1012 108 4117048 296238 2010-07-01 16:20:47.997 2010-07-01 17:32:49.367 318 1846 1012 179 4120791 296282 2010-07-01 17:58:27.657 2010-07-01 18:29:01.733 318 1846 1012 256 4128291 296370 2010-07-01 19:54:17.687 2010-07-01 20:32:53.850 318 1846 1012 239 4132044 296413 2010-07-01 20:31:37.653 2010-07-01 21:29:13.497 318 1846 1012 39 4135797 296458 2010-07-01 21:59:13.983 2010-07-01 22:32:46.503 318 1846 1012 157 4139572 296506 2010-07-01 22:58:49.530 2010-07-01 23:32:22.543 318 1846 1012 218 4142941 296554 2010-07-01 23:59:13.857 2010-07-02 00:32:30.390 318 1846 1012 248 4146289 296598 2010-07-02 00:58:55.763 2010-07-02 01:32:41.983 318 1846 1012 204 4149616 296642 2010-07-02 01:46:57.357 2010-07-02 02:32:56.983
Re: [SQL] grouping subsets
On Thu, Jul 22, 2010 at 11:31:23AM +, Tim Landscheidt wrote:
> Richard Huxton wrote:
>
> >>> What I want to get is the values grouped by "subset", where a subset is a
> >>> set of rows with identical column until the colum changes.
> >>> Is there a way to get
>
> >>> | 2 | B |
> >>> | 4 | C |
> >>> | 4 | B |
> >>> | 3 | D |
>
> >>> by SQL only?
>
> >> I think, the problem is that there are 2 identical groups. I think, you
> >> can write a pl/pgsql-proc, selecting all ordered by the date-field and
> >> walking through the result to do the grouping, checking if the 2nd
> >> column is different from the previous.
>
> >> With plain SQL it's maybe possible too, but i don't know how ...
>
> > It should be do-able in 8.4 onwards, look into windowing
> > functions. In particular the lag() function:
>
> > SELECT
> > mycode,
> > mydate,
> > lag(mycode) OVER (ORDER BY mydate) AS prev_code
> > FROM
> > mytable
> > ORDER BY mydate;
>
> > It should be possible to use that as a subquery with an
> > outer query that compares mycode=prev_code to get a run
> > length.
>
> Hmmm. Can the outer query be done without using "WITH
> RECURSIVE"?
How about this:
select
a, b, c, d, sum
from (
select
a, b, c, d, new_partition,
sum(e) over (partition by partition_num)
from (
select
a, b, c, d, e,
case when
lag(a, 1, null) over (order by d) is null or
lag(a, 1, null) over (order by d) != a or
lag(b, 1, null) over (order by d) is null or
lag(b, 1, null) over (order by d) != b or
lag(c, 1, null) over (order by d) is null or
lag(c, 1, null) over (order by d) != c
then nextval('a')
else currval('a')
end as partition_num,
case when
lag(a, 1, null) over (order by d) is null or
lag(a, 1, null) over (order by d) != a or
lag(b, 1, null) over (order by d) is null or
lag(b, 1, null) over (order by d) != b or
lag(c, 1, null) over (order by d) is null or
lag(c, 1, null) over (order by d) != c
then 'T'::boolean
else 'f'::boolean
end as new_partition
from foo
) bar
) baz
where
new_partition;
Here's my test table:
5432 j...@postgres# select * from foo;
a | b | c | d| e
---+---+---++---
1 | 9 | 1 | 2007-01-01 00:00:05-07 | 8
9 | 2 | 1 | 2007-01-01 00:00:10-07 | 4
9 | 2 | 1 | 2007-01-01 00:00:15-07 | 6
9 | 2 | 1 | 2007-01-01 00:00:20-07 | 2
6 | 5 | 7 | 2007-01-01 00:00:25-07 | 3
4 | 9 | 0 | 2007-01-01 00:00:30-07 | 0
4 | 9 | 0 | 2007-01-01 00:00:35-07 | 7
5 | 2 | 7 | 2007-01-01 00:01:25-07 | 7
5 | 2 | 7 | 2007-01-01 00:01:30-07 | 7
5 | 2 | 7 | 2007-01-01 00:01:35-07 | 9
5 | 2 | 7 | 2007-01-01 00:01:40-07 | 2
5 | 2 | 7 | 2007-01-01 00:01:45-07 | 5
5 | 2 | 7 | 2007-01-01 00:01:50-07 | 8
5 | 2 | 7 | 2007-01-01 00:01:55-07 | 5
5 | 2 | 7 | 2007-01-01 00:02:00-07 | 9
7 | 8 | 8 | 2007-01-01 00:02:05-07 | 7
7 | 8 | 8 | 2007-01-01 00:02:10-07 | 8
9 | 3 | 0 | 2007-01-01 00:02:15-07 | 0
9 | 3 | 0 | 2007-01-01 00:02:20-07 | 8
9 | 2 | 1 | 2007-01-01 00:02:25-07 | 3
9 | 2 | 1 | 2007-01-01 00:02:30-07 | 3
(21 rows)
...and these results...
a | b | c | d| sum
---+---+---++-
1 | 9 | 1 | 2007-01-01 00:00:05-07 | 8
9 | 2 | 1 | 2007-01-01 00:00:10-07 | 12
6 | 5 | 7 | 2007-01-01 00:00:25-07 | 3
4 | 9 | 0 | 2007-01-01 00:00:30-07 | 7
5 | 2 | 7 | 2007-01-01 00:01:25-07 | 52
7 | 8 | 8 | 2007-01-01 00:02:05-07 | 15
9 | 3 | 0 | 2007-01-01 00:02:15-07 | 8
9 | 2 | 1 | 2007-01-01 00:02:25-07 | 6
(8 rows)
--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com
signature.asc
Description: Digital signature
