Thanks again. The sorting does appear to be the issue. I will test out your cursor idea...
On Mon, Mar 11, 2013 at 11:34 AM, Pavel Stehule <pavel.steh...@gmail.com>wrote: > 2013/3/11 Jeff Adams - NOAA Affiliate <jeff.ad...@noaa.gov>: > > Pavel, > > > > Thanks for the response. I have not yet had the opportunity to use > cursors, > > but am now curious. Could you perhaps provide a bit more detail as to > what > > the implementation of your suggested approach would look like? > > an example: > > $$ > DECLARE > r record; > prev_r record; > > BEGIN > FOR r IN SELECT * FROM a ORDER BY epoch, mmsi > LOOP > IF prev_r IS NOT NULL THEN > /* do some counting */ > prev_r contains previous row, r contains current row > do some > RETURN NEXT .. /* return data in defined order */ > END IF; > prev_r = r; > END LOOP; > > > Probably slow part of your query is sorting - first can be accelerated > by index, but second (as CTE result cannot) - you can try increase > work_mem ?? > > Regards > > Pavel > > > > > > > On Mon, Mar 11, 2013 at 11:03 AM, Pavel Stehule <pavel.steh...@gmail.com > > > > wrote: > >> > >> Hello > >> > >> you can try procedural solution - use a cursor over ordered data in > >> plpgsql and returns table > >> > >> Regards > >> > >> Pavel Stehule > >> > >> 2013/3/11 Jeff Adams - NOAA Affiliate <jeff.ad...@noaa.gov>: > >> > Greetings, > >> > > >> > > >> > > >> > I have a large table (~90 million rows) containing vessel positions. > In > >> > addition to a column that contains the location information > (the_geom), > >> > the > >> > table also contains two columns that are used to uniquely identify the > >> > vessel (mmsi and name) and a column containing the Unix time (epoch) > at > >> > which the position information was logged. I frequently need to assign > >> > records to vessel transits. To do this, I currently create a CTE that > >> > uses a > >> > Window function (partitioning the data by mmsi and name ordered by > >> > epoch) to > >> > examine the time that has elapsed between successive position reports > >> > for > >> > individual vessels. For every position record for a vessel (as > >> > identified > >> > using mmsi and name), if the time elapsed between the current position > >> > record and the previous record (using the lag function) is less than > or > >> > equal to 2 hours, I assign the record a value of 0 to a CTE column > named > >> > tr_index. If the time elapsed is greater than 2 hours, I assign the > >> > record a > >> > value of 1 to the tr_index column. I then use the CTE to generate > >> > transit > >> > numbers by summing the values in the tr_index field across a Window > that > >> > also partitions the data by mmsi and name and is ordered by epoch. > This > >> > works, but is very slow (hours). The table is indexed (multi-column > >> > index on > >> > mmsi, name and index on epoch). Does anyone see a way to get what I am > >> > after > >> > in a more efficient manner. What I am after is an assignment of > transit > >> > number to vessels' position records based on whether the records were > >> > within > >> > two hours of each other. The SQL that I used is provided below. Any > >> > advice > >> > would be greatly appreciated... > >> > > >> > > >> > > >> > WITH > >> > > >> > cte_01 AS > >> > > >> > ( > >> > > >> > SELECT > >> > > >> > a.id, > >> > > >> > a.mmsi, > >> > > >> > a.name, > >> > > >> > a.epoch, > >> > > >> > a.the_geom > >> > > >> > CASE > >> > > >> > WHEN ((a.epoch - lag(a.epoch) OVER w) / 60) > 120 THEN 1 > >> > > >> > ELSE 0 > >> > > >> > END AS tr_index > >> > > >> > FROM table a > >> > > >> > WINDOW w AS (PARTITION BY a.mmsi, a.name ORDER BY a.epoch) > >> > > >> > ) > >> > > >> > > >> > > >> > > >> > > >> > SELECT > >> > > >> > a.id, > >> > > >> > a.mmsi, > >> > > >> > a.name, > >> > > >> > a.epoch, > >> > > >> > a.the_geom, > >> > > >> > 1 + sum(a.tr_index) OVER w AS transit, > >> > > >> > a.active > >> > > >> > FROM cte_01 a > >> > > >> > WINDOW w AS (PARTITION BY a.mmsi, a.name ORDER BY a.epoch) > >> > > >> > > >> > > >> > -- > >> > Jeff >