I have a SQL problem I’m trying to solve and realise its now gone waaaayyy 
beyond my meagre SQL talents.

A bit of context

The dataset I’m looking at looks at traffic problems. Every five mins I get a 
new dataset of traffic updates. Some of the new dataset is a continuation of 
the old data, e.g. there may be an ongoing collision, some of the data is new, 
e.g. a new collision has just happened and some of the data is saying the 
collision is now over. This can be likened to a simple state table. There is no 
state of New, just the fact that something is now active and has appeared.

The simplified data table (Table_1) where this is recorded looks like this. 
I’ve split the Time Events to highlight each file import.

Disruption_id | Time_Event | State

1             | 1          | Active
2             | 1          | Active
3             | 1          | Active

1             | 2          | Active
3             | 2          | Closed
4             | 2          | Active

1             | 3          | Active
2             | 3          | Active
3             | 3          | Closed
4             | 3          | Active

1             | 4          | Closed
2             | 4          | Active
3             | 4          | Closed
4             | 4          | Closed
5             | 4          | Active

1             | 5          | Closed
2             | 5          | Closed
3             | 5          | Closed
4             | 5          | Closed
5             | 5          | Active

1. Disruption_Id is always correct, a Disruption_Id only appears once in any 
file import and is never reused. 

2. The Time Event is the file in which the data comes in. In the above example 
there are five input files spread across 25 mins, I have simplified the 
Time_Event to be a sequential set of numbers.

3. The omission of the Disruption_id=2 in Time_Event=2 is deliberate as the 
data is not always complete.

4. A disruption_id may continue to appear as Closed in subsequent file imports, 
e.g. DisruptionId=3. I am only really interested in the first Close.

What I’m trying to do is to find out how long each disruption occurred for.

I’d like to produce an output table based on the above data set that looks a 
bit like this.

Disruption_id | Start Time | End Time
1             |  1         |   4
2             |  1         |   5
3             |  1         |   2
4             |  2         |   4

1. Disruption_id would only appear in one row.

2. Any events that are still Active, e.g. Disruption_Id=5 in the above example 
would not be shown.

3. The first Closed event is the one that is chosen as the end_time

This would tell me that Disruption_id = 1 ran between Time_Event 1 and 4, 
Disruption_id = 3 ran between Time_Event 1 and 2.

Even I can easily create a simple piece of SQL to find the Max or Min of each 
disruption_id for a specific id, e.g. select max(Time_Event) from Table_1 where 
Disruption_id = 1.

I struggle when I try to pull it all together so that I have one piece of SQL 
that does all the work. I’ve tried searching for this on Google and in this 
mail group, but I can’t even describe what I want to do without giving an 
example which does limit my searches down. I’ve spent a day looking and writing 
junk SQL code and come to the conclusion I haven’t a clue.

Any suggestions gratefully welcomed along with pointers to where to look. Is 
this even possible? 

I’m happy to sit down and work through some examples but I can’t even find them 
to learn from.

Thanks for taking the time to read this far.

Rob.
_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to