Simon, Ivar, Clemens, Aleksey,

Thank you for taking the time to reply to a drowning man. I have been offered 
four lifebelts. I've tested three of them out and they all work slightly 
differently but I'm very happy to work through them and to see where the 
advantages and disadvantages of each of them are. 

As an aside the table it works on currently has 2,090,518 records in it, with 
1,667 unique Disruption_ids.  The time to execute the three different proof of 
concept queries were approx each an order of 0.7 of a magnitude greater than 
each other, one took 700 secs, one took around 93 seconds and one took around 3 
secs.  I cannot say that the faster is the better answer or the slowest is the 
worse as further work is needed to refine the query and it would be churlish, 
unfair and inappropriate to say which was whose as people would jump to the 
wrong conclusion.

I really appreciate the excellent and helpful answers provided. I have improved 
my SQL and learnt new stuff today which is always nice.

Best wishes, 

Rob

On 1 Nov 2014, at 10:19, Rob Willett <rob.sql...@robertwillett.com> wrote:

> I have a SQL problem I’m trying to solve and realise its now gone waaaayyy 
> beyond my meagre SQL talents.
> 
> A bit of context
> 
> The dataset I’m looking at looks at traffic problems. Every five mins I get a 
> new dataset of traffic updates. Some of the new dataset is a continuation of 
> the old data, e.g. there may be an ongoing collision, some of the data is 
> new, e.g. a new collision has just happened and some of the data is saying 
> the collision is now over. This can be likened to a simple state table. There 
> is no state of New, just the fact that something is now active and has 
> appeared.
> 
> The simplified data table (Table_1) where this is recorded looks like this. 
> I’ve split the Time Events to highlight each file import.
> 
> Disruption_id | Time_Event | State
> 
> 1             | 1          | Active
> 2             | 1          | Active
> 3             | 1          | Active
> 
> 1             | 2          | Active
> 3             | 2          | Closed
> 4             | 2          | Active
> 
> 1             | 3          | Active
> 2             | 3          | Active
> 3             | 3          | Closed
> 4             | 3          | Active
> 
> 1             | 4          | Closed
> 2             | 4          | Active
> 3             | 4          | Closed
> 4             | 4          | Closed
> 5             | 4          | Active
> 
> 1             | 5          | Closed
> 2             | 5          | Closed
> 3             | 5          | Closed
> 4             | 5          | Closed
> 5             | 5          | Active
> 
> 1. Disruption_Id is always correct, a Disruption_Id only appears once in any 
> file import and is never reused. 
> 
> 2. The Time Event is the file in which the data comes in. In the above 
> example there are five input files spread across 25 mins, I have simplified 
> the Time_Event to be a sequential set of numbers.
> 
> 3. The omission of the Disruption_id=2 in Time_Event=2 is deliberate as the 
> data is not always complete.
> 
> 4. A disruption_id may continue to appear as Closed in subsequent file 
> imports, e.g. DisruptionId=3. I am only really interested in the first Close.
> 
> What I’m trying to do is to find out how long each disruption occurred for.
> 
> I’d like to produce an output table based on the above data set that looks a 
> bit like this.
> 
> Disruption_id | Start Time | End Time
> 1             |  1         |   4
> 2             |  1         |   5
> 3             |  1         |   2
> 4             |  2         |   4
> 
> 1. Disruption_id would only appear in one row.
> 
> 2. Any events that are still Active, e.g. Disruption_Id=5 in the above 
> example would not be shown.
> 
> 3. The first Closed event is the one that is chosen as the end_time
> 
> This would tell me that Disruption_id = 1 ran between Time_Event 1 and 4, 
> Disruption_id = 3 ran between Time_Event 1 and 2.
> 
> Even I can easily create a simple piece of SQL to find the Max or Min of each 
> disruption_id for a specific id, e.g. select max(Time_Event) from Table_1 
> where Disruption_id = 1.
> 
> I struggle when I try to pull it all together so that I have one piece of SQL 
> that does all the work. I’ve tried searching for this on Google and in this 
> mail group, but I can’t even describe what I want to do without giving an 
> example which does limit my searches down. I’ve spent a day looking and 
> writing junk SQL code and come to the conclusion I haven’t a clue.
> 
> Any suggestions gratefully welcomed along with pointers to where to look. Is 
> this even possible? 
> 
> I’m happy to sit down and work through some examples but I can’t even find 
> them to learn from.
> 
> Thanks for taking the time to read this far.
> 
> Rob.
> _______________________________________________
> sqlite-users mailing list
> sqlite-users@sqlite.org
> http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

_______________________________________________
sqlite-users mailing list
sqlite-users@sqlite.org
http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users

Reply via email to