Pavel, Thanks for the response. I have not yet had the opportunity to use cursors, but am now curious. Could you perhaps provide a bit more detail as to what the implementation of your suggested approach would look like?
On Mon, Mar 11, 2013 at 11:03 AM, Pavel Stehule <pavel.steh...@gmail.com>wrote: > Hello > > you can try procedural solution - use a cursor over ordered data in > plpgsql and returns table > > Regards > > Pavel Stehule > > 2013/3/11 Jeff Adams - NOAA Affiliate <jeff.ad...@noaa.gov>: > > Greetings, > > > > > > > > I have a large table (~90 million rows) containing vessel positions. In > > addition to a column that contains the location information (the_geom), > the > > table also contains two columns that are used to uniquely identify the > > vessel (mmsi and name) and a column containing the Unix time (epoch) at > > which the position information was logged. I frequently need to assign > > records to vessel transits. To do this, I currently create a CTE that > uses a > > Window function (partitioning the data by mmsi and name ordered by > epoch) to > > examine the time that has elapsed between successive position reports for > > individual vessels. For every position record for a vessel (as identified > > using mmsi and name), if the time elapsed between the current position > > record and the previous record (using the lag function) is less than or > > equal to 2 hours, I assign the record a value of 0 to a CTE column named > > tr_index. If the time elapsed is greater than 2 hours, I assign the > record a > > value of 1 to the tr_index column. I then use the CTE to generate transit > > numbers by summing the values in the tr_index field across a Window that > > also partitions the data by mmsi and name and is ordered by epoch. This > > works, but is very slow (hours). The table is indexed (multi-column > index on > > mmsi, name and index on epoch). Does anyone see a way to get what I am > after > > in a more efficient manner. What I am after is an assignment of transit > > number to vessels' position records based on whether the records were > within > > two hours of each other. The SQL that I used is provided below. Any > advice > > would be greatly appreciated... > > > > > > > > WITH > > > > cte_01 AS > > > > ( > > > > SELECT > > > > a.id, > > > > a.mmsi, > > > > a.name, > > > > a.epoch, > > > > a.the_geom > > > > CASE > > > > WHEN ((a.epoch - lag(a.epoch) OVER w) / 60) > 120 THEN 1 > > > > ELSE 0 > > > > END AS tr_index > > > > FROM table a > > > > WINDOW w AS (PARTITION BY a.mmsi, a.name ORDER BY a.epoch) > > > > ) > > > > > > > > > > > > SELECT > > > > a.id, > > > > a.mmsi, > > > > a.name, > > > > a.epoch, > > > > a.the_geom, > > > > 1 + sum(a.tr_index) OVER w AS transit, > > > > a.active > > > > FROM cte_01 a > > > > WINDOW w AS (PARTITION BY a.mmsi, a.name ORDER BY a.epoch) > > > > > > > > -- > > Jeff > -- Jeffrey D. Adams Contractor OAI, Inc. In support of: National Marine Fisheries Service Office of Protected Resources 1315 East West Hwy, Building SSMC3 Silver Spring, MD 20910-3282 phone: (301) 427-8434 fax: (301) 713-0376