Re: [SQL] grouping subsets

2010-07-30 Thread Rainer Stengele
the table may include up to maybe 30 entries per day, average maybe 10-15
After a year this makes about 10.000 entries - maximum, average about 5000 
entries.

For the problem described I have to use a Microsoft SQL database and would like 
to use pure SQL.
As I use postgres on my Linux servers I found this newsgroup and thought I ask 
here.

Thanks!
Rainer

Am 29.07.2010 15:31, schrieb Oliveiros d'Azevedo Cristina:
> Fine.
> 
> Please advice me,
> 
> How long can
> your table be? Thousands? Millions of records?
> 
> Do you really need it in pure SQL
> ?
> 
> It seems to me that it might be possible, I'm just affraid that the query 
> would become too complex and thus slow...
> 
> Best,
> Oliveiros
> 
> - Original Message - From: "Rainer Stengele" 
> 
> To: "Oliveiros d'Azevedo Cristina" 
> Cc: 
> Sent: Thursday, July 29, 2010 1:10 PM
> Subject: Re: grouping subsets
> 
> 
>> No. This is by accident.
>> We have to assume that the combinations do change anytime, and many times 
>> per day.
>>
>> So
>>
>> "Or is it possible to have the same combination on one day with several 
>> sets?"
>>
>> YES!
>>
>> Rainer
>>
>>
>>
>> Am 29.07.2010 13:47, schrieb Oliveiros d'Azevedo Cristina:
>>> Yes. This is somewhat more complicated because it has more constraints.
>>> I've noticed that a given combination doesn't appear with holes on a 
>>> certain day.
>>>
>>> For ex, on a daily basis, we have every three key combinations together.
>>>
>>> We dont have things like
>>> 2010-7-01  1726 3212 1428
>>> 2010-7-01  1726 3212 1428
>>> ...318   1846 1012
>>> 2010-7-01  1726 3212 1428
>>>
>>> Can I assume that, for a certain day , the records for the same three 
>>> combination are all together? There is just one set per day for a given 
>>> combination?
>>>
>>> Or is it possible to have the same combination on one day with several sets?
>>>
>>> Best,
>>> Oliveiros
>>>
>>>
>>> - Original Message - From: "Rainer Stengele" 
>>> 
>>> Newsgroups: gmane.comp.db.postgresql.sql
>>> To: "Oliveiros d'Azevedo Cristina" 
>>> Cc: <>
>>> Sent: Thursday, July 29, 2010 10:41 AM
>>> Subject: Re: grouping subsets
>>>
>>>
 Howdy Cristina,

 unfortunately things are more complicated. I have inserted an excerpt of 
 the real data here:

 
 TableID MasterID dtBegin   dtEnd idR idL   
idB consumption
 4057312   295530 2010-07-01 00:59:21.077   2010-07-01 
 01:32:59.670   1726 3212 1428 279
 4061043   295574 2010-07-01 01:59:31.137   2010-07-01 
 02:32:09.373   1726 3212 1428 183
 4083397   295838 2010-07-01 07:57:51.327   2010-07-01 
 08:28:28.117   318  1846 1012 30
 4090858   295920 2010-07-01 09:52:33.777   2010-07-01 
 10:31:34.393   318  1846 1012 487
 4094589   295961 2010-07-01 10:47:59.370   2010-07-01 
 11:32:20.903   318  1846 1012 472
 4098330   296013 2010-07-01 11:58:53.890   2010-07-01 
 12:31:35.730   318  1846 1012 195
 4102069   296058 2010-07-01 12:36:19.170   2010-07-01 
 13:32:13.950   318  1846 1012 338
 4105809   296102 2010-07-01 13:58:53.170   2010-07-01 
 14:02:57.710   318  1846 1012 105
 4109555   296150 2010-07-01 14:59:11.663   2010-07-01 
 15:32:33.810   318  1846 1012 187
 4113305   296194 2010-07-01 15:59:01.797   2010-07-01 
 16:02:27.260   318  1846 1012 108
 4117048   296238 2010-07-01 16:20:47.997   2010-07-01 
 17:32:49.367   318  1846 1012 179
 4120791   296282 2010-07-01 17:58:27.657   2010-07-01 
 18:29:01.733   318  1846 1012 256
 4128291   296370 2010-07-01 19:54:17.687   2010-07-01 
 20:32:53.850   318  1846 1012 239
 4132044   296413 2010-07-01 20:31:37.653   2010-07-01 
 21:29:13.497   318  1846 1012 39
 4135797   296458 2010-07-01 21:59:13.983   2010-07-01 
 22:32:46.503   318  1846 1012 157
 4139572   296506 2010-07-01 22:58:49.530   2010-07-01 
 23:32:22.543   318  1846 1012 218
 4142941   296554 2010-07-01 23:59:13.857   2010-07-02 
 00:32:30.390   318  1846 1012 248
 4146289   296598 2010-07-02 00:58:55.763   2010-07-02 
 01:32:41.983   318  1846 1012 204
 4149616   296642 2010-07-02 01:46:57.357   2010-07-02 
 02:32:56

Re: [SQL] grouping subsets

2010-07-30 Thread Oliveiros d'Azevedo Cristina

I See.

And the analysis you need to do, the sum of the rows with the same keys 
(until they change) will have to be done over all

table?
Or just over some predefined interval ?

Best,
Oliveiros
- Original Message - 
From: "Rainer Stengele" 

Newsgroups: gmane.comp.db.postgresql.sql
To: "Oliveiros d'Azevedo Cristina" 
Cc: 
Sent: Friday, July 30, 2010 10:35 AM
Subject: Re: grouping subsets



the table may include up to maybe 30 entries per day, average maybe 10-15
After a year this makes about 10.000 entries - maximum, average about 5000 
entries.


For the problem described I have to use a Microsoft SQL database and would 
like to use pure SQL.
As I use postgres on my Linux servers I found this newsgroup and thought I 
ask here.


Thanks!
Rainer

Am 29.07.2010 15:31, schrieb Oliveiros d'Azevedo Cristina:

Fine.

Please advice me,

How long can
your table be? Thousands? Millions of records?

Do you really need it in pure SQL
?

It seems to me that it might be possible, I'm just affraid that the query 
would become too complex and thus slow...


Best,
Oliveiros

- Original Message - From: "Rainer Stengele" 


To: "Oliveiros d'Azevedo Cristina" 
Cc: 
Sent: Thursday, July 29, 2010 1:10 PM
Subject: Re: grouping subsets



No. This is by accident.
We have to assume that the combinations do change anytime, and many 
times per day.


So

"Or is it possible to have the same combination on one day with several 
sets?"


YES!

Rainer



Am 29.07.2010 13:47, schrieb Oliveiros d'Azevedo Cristina:

Yes. This is somewhat more complicated because it has more constraints.
I've noticed that a given combination doesn't appear with holes on a 
certain day.


For ex, on a daily basis, we have every three key combinations 
together.


We dont have things like
2010-7-01  1726 3212 1428
2010-7-01  1726 3212 1428
...318   1846 1012
2010-7-01  1726 3212 1428

Can I assume that, for a certain day , the records for the same three 
combination are all together? There is just one set per day for a given 
combination?


Or is it possible to have the same combination on one day with several 
sets?


Best,
Oliveiros


- Original Message - From: "Rainer Stengele" 


Newsgroups: gmane.comp.db.postgresql.sql
To: "Oliveiros d'Azevedo Cristina" 
Cc: <>
Sent: Thursday, July 29, 2010 10:41 AM
Subject: Re: grouping subsets



Howdy Cristina,

unfortunately things are more complicated. I have inserted an excerpt 
of the real data here:



TableID MasterID dtBegin   dtEnd idR 
idL  idB consumption
4057312   295530 2010-07-01 00:59:21.077   2010-07-01 
01:32:59.670   1726 3212 1428 279
4061043   295574 2010-07-01 01:59:31.137   2010-07-01 
02:32:09.373   1726 3212 1428 183
4083397   295838 2010-07-01 07:57:51.327   2010-07-01 
08:28:28.117   318  1846 1012 30
4090858   295920 2010-07-01 09:52:33.777   2010-07-01 
10:31:34.393   318  1846 1012 487
4094589   295961 2010-07-01 10:47:59.370   2010-07-01 
11:32:20.903   318  1846 1012 472
4098330   296013 2010-07-01 11:58:53.890   2010-07-01 
12:31:35.730   318  1846 1012 195
4102069   296058 2010-07-01 12:36:19.170   2010-07-01 
13:32:13.950   318  1846 1012 338
4105809   296102 2010-07-01 13:58:53.170   2010-07-01 
14:02:57.710   318  1846 1012 105
4109555   296150 2010-07-01 14:59:11.663   2010-07-01 
15:32:33.810   318  1846 1012 187
4113305   296194 2010-07-01 15:59:01.797   2010-07-01 
16:02:27.260   318  1846 1012 108
4117048   296238 2010-07-01 16:20:47.997   2010-07-01 
17:32:49.367   318  1846 1012 179
4120791   296282 2010-07-01 17:58:27.657   2010-07-01 
18:29:01.733   318  1846 1012 256
4128291   296370 2010-07-01 19:54:17.687   2010-07-01 
20:32:53.850   318  1846 1012 239
4132044   296413 2010-07-01 20:31:37.653   2010-07-01 
21:29:13.497   318  1846 1012 39
4135797   296458 2010-07-01 21:59:13.983   2010-07-01 
22:32:46.503   318  1846 1012 157
4139572   296506 2010-07-01 22:58:49.530   2010-07-01 
23:32:22.543   318  1846 1012 218
4142941   296554 2010-07-01 23:59:13.857   2010-07-02 
00:32:30.390   318  1846 1012 248
4146289   296598 2010-07-02 00:58:55.763   2010-07-02 
01:32:41.983   318  1846 1012 204
4149616   296642 2010-07-02 01:46:57.357   2010-07-02 
02:32:56.983 

Re: [SQL] grouping subsets

2010-07-30 Thread Joshua Tolley
On Thu, Jul 22, 2010 at 11:31:23AM +, Tim Landscheidt wrote:
> Richard Huxton  wrote:
> 
> >>> What I want to get is the values grouped by "subset", where a subset is a 
> >>> set of rows with identical column until the colum changes.
> >>> Is there a way to get
> 
> >>> | 2 | B |
> >>> | 4 | C |
> >>> | 4 | B |
> >>> | 3 | D |
> 
> >>> by SQL only?
> 
> >> I think, the problem is that there are 2 identical groups. I think, you
> >> can write a pl/pgsql-proc, selecting all ordered by the date-field and
> >> walking through the result to do the grouping, checking if the 2nd
> >> column is different from the previous.
> 
> >> With plain SQL it's maybe possible too, but i don't know how ...
> 
> > It should be do-able in 8.4 onwards, look into windowing
> > functions. In particular the lag() function:
> 
> > SELECT
> > mycode,
> > mydate,
> > lag(mycode) OVER (ORDER BY mydate) AS prev_code
> > FROM
> > mytable
> > ORDER BY mydate;
> 
> > It should be possible to use that as a subquery with an
> > outer query that compares mycode=prev_code to get a run
> > length.
> 
> Hmmm. Can the outer query be done without using "WITH
> RECURSIVE"?

How about this:

select
a, b, c, d, sum
from (
select
a, b, c, d, new_partition,
sum(e) over (partition by partition_num) 
from (
select
a, b, c, d, e,
case when
lag(a, 1, null) over (order by d) is null or
lag(a, 1, null) over (order by d) != a or
lag(b, 1, null) over (order by d) is null or
lag(b, 1, null) over (order by d) != b or
lag(c, 1, null) over (order by d) is null or
lag(c, 1, null) over (order by d) != c
then nextval('a')
else currval('a')
end as partition_num,
case when
lag(a, 1, null) over (order by d) is null or
lag(a, 1, null) over (order by d) != a or
lag(b, 1, null) over (order by d) is null or
lag(b, 1, null) over (order by d) != b or
lag(c, 1, null) over (order by d) is null or
lag(c, 1, null) over (order by d) != c
then 'T'::boolean
else 'f'::boolean
end as new_partition
from foo
) bar
) baz
where
new_partition;

Here's my test table:
5432 j...@postgres# select * from foo;
 a | b | c |   d| e 
---+---+---++---
 1 | 9 | 1 | 2007-01-01 00:00:05-07 | 8
 9 | 2 | 1 | 2007-01-01 00:00:10-07 | 4
 9 | 2 | 1 | 2007-01-01 00:00:15-07 | 6
 9 | 2 | 1 | 2007-01-01 00:00:20-07 | 2
 6 | 5 | 7 | 2007-01-01 00:00:25-07 | 3
 4 | 9 | 0 | 2007-01-01 00:00:30-07 | 0
 4 | 9 | 0 | 2007-01-01 00:00:35-07 | 7
 5 | 2 | 7 | 2007-01-01 00:01:25-07 | 7
 5 | 2 | 7 | 2007-01-01 00:01:30-07 | 7
 5 | 2 | 7 | 2007-01-01 00:01:35-07 | 9
 5 | 2 | 7 | 2007-01-01 00:01:40-07 | 2
 5 | 2 | 7 | 2007-01-01 00:01:45-07 | 5
 5 | 2 | 7 | 2007-01-01 00:01:50-07 | 8
 5 | 2 | 7 | 2007-01-01 00:01:55-07 | 5
 5 | 2 | 7 | 2007-01-01 00:02:00-07 | 9
 7 | 8 | 8 | 2007-01-01 00:02:05-07 | 7
 7 | 8 | 8 | 2007-01-01 00:02:10-07 | 8
 9 | 3 | 0 | 2007-01-01 00:02:15-07 | 0
 9 | 3 | 0 | 2007-01-01 00:02:20-07 | 8
 9 | 2 | 1 | 2007-01-01 00:02:25-07 | 3
 9 | 2 | 1 | 2007-01-01 00:02:30-07 | 3
(21 rows)

...and these results...
 a | b | c |   d| sum 
---+---+---++-
 1 | 9 | 1 | 2007-01-01 00:00:05-07 |   8
 9 | 2 | 1 | 2007-01-01 00:00:10-07 |  12
 6 | 5 | 7 | 2007-01-01 00:00:25-07 |   3
 4 | 9 | 0 | 2007-01-01 00:00:30-07 |   7
 5 | 2 | 7 | 2007-01-01 00:01:25-07 |  52
 7 | 8 | 8 | 2007-01-01 00:02:05-07 |  15
 9 | 3 | 0 | 2007-01-01 00:02:15-07 |   8
 9 | 2 | 1 | 2007-01-01 00:02:25-07 |   6
(8 rows)

--
Joshua Tolley / eggyknap
End Point Corporation
http://www.endpoint.com


signature.asc
Description: Digital signature