Hi Juergen,

I wrote a quick blog post about how I grouped up AIS points into lines, but 
broken up on steps in time, distance, and calculated speed.

It's very short but contains some thinking and some code.

https://openais.xyz/post/2022-10-28-ais-traj/

You could add an additional spatial filter in the first part of the blog query 
to only fetch AIS points that are within the geometry that defines your port 
area.

Rory
________________________________
From: postgis-users <[email protected]> on behalf of 
[email protected] <[email protected]>
Sent: 24 October 2022 19:40
To: [email protected] <[email protected]>
Subject: Re: [postgis-users] Issue with ST_Intersection seems related to lines 
crossing themself


Hello Rory,
Hello Simon,

thank you both for the feedback.

Sorry for the delay. I'm now back on this topic after my crazy last week.


First some additional explanation:
==================================
I use the LineString also to draw the track on a map (for each vessel / MMSI).
The track LineString from 00:00 of the current date to NOW() is generated 
"on-the-fly" from the stored GPS points (basically the RAW data you you get 
from the "AIS position report").
For today's data + 1-2 days older, the "on-the-fly" generated information is 
combined with a UNION from the information retrieved from table 
'ais_trajectory_mmsi_daily'.
Currently I've got ~2 million new GPS position entries each day.

In general my aggregation of points per day produces lines that go "over land" 
if there is no data for some time until the next Antenna picks up something 
again (or jumping points).
As you Rory pointed out I might have to go the Multi-LineString way so if I 
don't get new GPS points for about ~10+ Min (vessel out of antenna range; by 
AIS definition the max. interval is 180 seconds) I start a new LineString (plus 
some other checks like distance).
More on this further down below as this is an extra task.


To answer both of your questions:
=================================
Yes I can also use MultiPoint data.
I have now added a new column to store the MultiPoint (type: PointM) to the 
table 'ais_trajectory_mmsi_daily'.
To test it I updated the missing data from my stored individual point data "AIS 
position report" (which is only a short time storage due to the large amount of 
data coming in).
I also fixed the old LineString column definition which was missing the 
explicit SRID before.

--
-- Add extra column to also store the PointM (Point + Timestamp) data as 
MultiPointM geometry (WGS84/GPS).
--
ALTER TABLE ais_trajectory_mmsi_daily  ADD COLUMN geom_multipointm 
GEOMETRY(MULTIPOINTM, 4326);

--
-- Change column definition to geom type of LineStringM with WGS84 (SRID: 4326).
--
ALTER TABLE ais_trajectory_mmsi_daily ALTER COLUMN geo_lat_long_line TYPE 
GEOMETRY(LINESTRINGM, 4326) USING ST_SetSRID(geo_lat_long_line, 4326);


The complete table looks like this now:
=======================================
CREATE TABLE IF NOT EXISTS aisstream.ais_trajectory_mmsi_daily
(
    mmsi              integer,
    min_created_on    timestamp with time zone,
    max_created_on    timestamp with time zone,
    geom_linestring   geometry(LineStringM,4326),     -- Fixed: Added exact 
type + SRID 4326!
    geom_multipointm  geometry(MultiPointM,4326)
)


Index (so far no index on the geometry columns):
================================================
CREATE INDEX IF NOT EXISTS ais_trajectory_mmsi_daily_mmsi  ON 
aisstream.ais_trajectory_mmsi_daily USING btree (mmsi ASC NULLS LAST);


Additional indexes didn't really speed things up so far.
I've checked an GIST index on the 'geom_linestring' used in the ST_Intersects 
query as well as the 'min_created_on' column. I did an ANALYZE on the table 
afterwards to update the statistics. ;)

So far I still use the "... WHERE ST_Intersects(geom_linestring, geom_polygon) 
AND ..." with the LineString on my Polygon which is so far fast enough (0.5 
seconds). Once I run "ST_Dump(ST_Intersection(geom_linestring, geom_polygon)) 
in the SELECT part the query times completely explode (30+ minutes). With 
MultiPoints the ST_DUMP is much faster.
With the suggested MultiPoint approach (the newly added column) the query times 
are fine if I run "ST_Intersection(geom_multipointm, geom_polygon) on the 
already 'pre-filtered' data.
So, I now have my points inside my polygon still in ~0.5 seconds (if I check 
for a range of ~8 days).


Here is my current query:
=====================
--
-- Find lines intersecting the given polygon (port/berth area).
-- From those get the actual points (at least 2 point) that are inside the 
given polygon (port/berth area).
-- What we want for now is the timestamp of the first and last point inside the 
polygon.
-- ==> This information is lost by calling ST_Intersection(...)
--     in the subquery (see alias: geom_intersection)!
--
-- The query also dumps some additional details (now many points do we have,..).
--
--EXPLAIN ANALYZE
WITH my_port AS (
    SELECT ST_GeometryFromText('POLYGON((6.972692012786865 
50.929290943993465,6.971780061721802 50.930535164787784,6.971426010131836 
50.93042697299645,6.972337961196899 50.92918951148323,6.972692012786865 
50.929290943993465))', 4326) AS geom
)
,my_traj AS (
    SELECT ais.mmsi
          ,ais.min_created_on        AS mintime
          ,ais.max_created_on        AS maxtime
          ,ais.geom_linestring
          ,ais.geom_multipointm
    FROM ais_trajectory_mmsi_daily AS ais
)
SELECT geom_intersection.mmsi
      ,geom_intersection.created_on_date
      ,ST_NPoints(geom_intersection.port_mpoint_intersect) AS point_count
      ,ST_GeometryN(geom_intersection.port_mpoint_intersect, 1) AS first_pointm
      --
      -- The following does not exist any more
      --
      ,ST_M(ST_GeometryN(geom_intersection.port_mpoint_intersect, 1)) AS 
first_pointm_m_value
      --
      ,ST_GeometryType(ST_GeometryN(geom_intersection.port_mpoint_intersect, 
1)) AS first_pointm_geom_type
      ,geom_intersection.port_mpoint_intersect
      ,ST_GeometryType(geom_intersection.port_mpoint_intersect) AS geom_type
FROM (
    SELECT my_traj.mmsi
          ,CAST(my_traj.mintime AS date) AS created_on_date
          ,ST_Intersection(geom_multipointm, my_port.geom)    AS 
port_mpoint_intersect
          --,ST_3DIntersection(geom_multipointm, my_port.geom)  AS 
port_mpoint_3dintersect  -- function does not exist error
          ,my_traj.geom_linestring
          --,(ST_Dump(ST_Intersection(my_traj.geom_linestring, my_port.geom) 
)).geom     AS geom_dump    -- extremely slow (linestring)
          --,(ST_Dump(ST_Intersection(my_traj.geom_multipointm, my_port.geom) 
)).geom    AS geom_dump    -- fast (multipoint)
    FROM my_port, my_traj
    WHERE ST_Intersects(my_traj.geom_linestring, my_port.geom)
      AND my_traj.mintime >= '2022-10-16'
      AND my_traj.maxtime <  '2022-10-24'
    GROUP BY mmsi, created_on_date , my_port.geom, my_traj.geom_multipointm 
,my_traj.geom_linestring
) AS geom_intersection
WHERE ST_NPoints(geom_intersection.port_mpoint_intersect) > 1
GROUP BY mmsi, created_on_date, port_mpoint_intersect
;


1 - Issue - How do I get my timestamp for the points inside my polygon?:
========================================================================
I now have the issue that I lost the timestamp stored within my Point ("M" 
value) because ST_Intersection(..) removes this 
(https://postgis.net/docs/ST_Intersection.html).

The documentation kind of suggests 'ST_3DIntersection' (at least the Z is not 
removed...so I assume "M" would be available too after calling this function). 
This function should be available with PostGIS 2.1.0+.
I'm running PostgreSQL 13.7 with PostGIS 3.2.1 here (Ubuntu Linux).
The problem is that I get the error "function st_3dintersection(geometry, 
geometry) does not exist". I don't get why this is the case.

So how can I get back the timestamp for my identified points?
Some kind of LATERAL JOIN on my original "geom_multipointm" column data?
Sorry, I'm lost here.


2 - How to creating a Multi-LineString instead of a single LineString?
I assume some kind of window function LAG/LEAD between the points.
=================================================================
The other (new) task is break down my single LineString to Multi-LineString 
while aggregating the geom point data.
Currently the data is aggreated per day and MMSI. The 'Point' is converted to a 
'PointM' (with the created_on timestamp) to be added to ST_MakeLine(geom_pointm 
ORDER BY created_on).
It seems I need some help on how to create this "multi-linestring".
I assume I need to use a window function to compare the current with the next 
and if >= 10 Min this following point is in the next linestring (minimum of two 
points required in each group) + check distance (to remove total nonsense).
For starters the >= 10 Min gap to generate Multi-LineStrings would be ok. The 
rest is for additional fine tuning later on. I started experimenting with the 
duration between two points using a window function.


Many thanks in advance!


Juergen

Gesendet: Montag, 17. Oktober 2022 um 15:09 Uhr
Von: "Rory Meyer" <[email protected]>
An: "PostGIS Users Discussion" <[email protected]>
Betreff: Re: [postgis-users] Issue with ST_Intersection seems related to lines 
crossing themself
Hi Juergen,

I've been helping on an open source project for processing AIS data in postgis 
with a focus on scientific applications instead of commercial ones: 
http://openais.xyz/ It's still a little immature and the database portion of 
the project is still closed while I rewrite some of the containers.

Somethings that have helped me:

  *   Since AIS is a timeseries, geospatial dataset it often helps to have a 
timeseries plugin to work with. The TimeScaleDB HA version contains both 
PostGIS 3+ and TimescaleDB plugins. Timescaledb lets you automatically 
partition a table based on time and to automatically create binned data from 
live data. You could have a first/last position per ship MMSI per hour/day/week.
  *   The GPS on AIS is generally pretty accurate but can suffer from poor GPS 
lock when in ports that have high buildings, cranes etc near the harbour. This 
causes the vessels to "jump" around much more than expected. There are also 
instances of the vessels jumping several hundred km's in either the latitude or 
longitude direction. This might be due to bit-flip errors in the AIS VHF 
message.
  *   I would say that having a line intersect with your port geometry, even if 
no points are within it, is the kind of behaviour that you want. If this is not 
the case, then I would just check for points within the port area instead of 
the trajectory.
  *   For handling a trajectory that is on the edge of the port area and 
causing multiple, close in time, entrance and exit signals it might be worth 
looking into some of the following:
     *   Some kind hysteresis function by having a combination of the port 
boundaries and a buffer of the port boundaries.
     *   Filtering the entrance/exit signals through time. TimeScaleDB gives 
aggregate functions like "first(<column>, timestamp)" that would select the 
first item in the column by timestamp. First grouping the data by some sensible 
time window (daily) and then selecting the first/last entrance/exit signal per 
vessel
     *   Ignoring entrance/exit signals where the reverse signal occurred 
within X seconds.
     *   You might need more complex Trajectory generation code, something that 
uses window functions to break trajectories on obviously bad points (gaps 
between messages longer than 1 hour, sudden jumps of 100km+, points where 
calculated speed > 2* reported speed)
     *   It might be worth dumping the Z-value of the trajectory to get the 
first/last time in the intersection segment.
  *   To make your SQL a little easier to read you can use Common Table 
Expressions:

```
WITH my_port AS
(SELECT ST_GeometryFromText('POLYGON((7.1084729427821 
50.7352828020046,7.1089771980769 50.7353778664669,7.10848259899649 
50.7365542732199,7.10801482200623 50.7364673592064,7.1084729427821 
50.7352828020046))', 4326) AS geom ),

my_traj AS
(SELECT
      ais.mmsi,
      ais.created_on_date as date,
      ais.min_created_on as mintime,
      ais.max_created_on as maxtime,
      ais.geo_lat_long_line as geom --renaming these to just help me out here
 FROM ais_trajectory_mmsi_daily AS ais
)
SELECT
      my_traj.mmsi,
      my_traj.date,
      ST_Intersection(my_traj.geom, my_port.geom) as port_traj_intersect,
      ST_ZMin(ST_Intersection(my_traj.geom, my_port.geom)) as 
segment_start_time,
      ST_ZMax(ST_Intersection(my_traj.geom, my_port.geom)) as segment_end_time
FROM my_geom, my_traj
WHERE ST_Intersects(my_traj.geom, my_port.geom)
AND my_traj.mintime > '2022-10-01'
AND my_traj.maxtime < '2022-10-02'

```
Hope that helps some,
Rory



________________________________
From: postgis-users <[email protected]> on behalf of Simon 
SPDBA Greener <[email protected]>
Sent: 17 October 2022 01:56
To: [email protected] <[email protected]>
Subject: Re: [postgis-users] Issue with ST_Intersection seems related to lines 
crossing themself


Would using a MultiPoint (or GeometryCollection with Points) for the GPS 
locations rather than a Linestring be possible?

Simon

On 17/10/2022 10:10 am, [email protected]<mailto:[email protected]> wrote:
Hi,

I've got an issue with an SQL PostGIS query and hope to get a solution here.

First some details.
I'm storing GPS locations of vessels by their MMSI identification number (AIS) 
that are received in 3 Sek to 3 Minutes ranges in a table.
As this becomes quite unhandy in a very short time, each day the basic details 
mmsi, received_at (Timestamp), lat, long are converted into a Points with the 
Timestamp integrated and this is then converted into a LineString for each 
MMSI, per day, with the first and last timestamp of the day for which a point 
exists as direct columns.

Now I want to know when some vessel is/was inside a specific area (harbor, 
berth) so I have a Polygon which the LineString has to intersect.
So far so good.
This works but can create false positives (two points outside the polygon are 
connected and this created line goes through the Polygon but there is no actual 
Point of that line in the Polygon)...so I need to re-check again by pulling out 
the points from my LineString.


Here the basic search query that basically works:
================================
--
-- Return all vessels (MMSI) found in the selected timeframe where the
-- LineString Intersects the Polygon given as parameter to search.
--
SELECT ais_trajectory_mmsi_daily.mmsi
      ,ais_trajectory_mmsi_daily.min_created_on
      ,ais_trajectory_mmsi_daily.max_created_on
      ,ST_SetSRID(ais_trajectory_mmsi_daily.geo_lat_long_line, 4326) AS 
geo_lat_long_line
FROM ais_trajectory_mmsi_daily
WHERE ST_Intersects(ST_SetSRID(ais_trajectory_mmsi_daily.geo_lat_long_line, 
4326),
                    ST_GeometryFromText('POLYGON((7.1084729427821 
50.7352828020046,7.1089771980769 50.7353778664669,7.10848259899649 
50.7365542732199,7.10801482200623 50.7364673592064,7.1084729427821 
50.7352828020046))', 4326) )
  AND min_created_on >= CAST('2022-10-01' AS TIMESTAMPTZ)
  AND max_created_on <  CAST('2022-10-15' AS TIMESTAMPTZ)
;


Here is now the query I use with ST_Intersection and where it starts to go 
wrong.
When a vessel stays a the same place the GPS coordinates are the same + drift.

As those points are all added to the LineString the line sections are running 
over themself and it looks like a "woolen noodle".
This seems to cause issues with ST_Intersection as I use it because instead of 
getting only the entering and exiting details as normally 1 row where the 
LineString hits my Polygon with times I now get hundreds of rows as a result 
and the visual result shown on the map is weird and sometimes wrong (the line 
goes out of the polygon in a way the original LineString did never exist).

Here is the query in question (basically the part above) with ST_Dump and 
ST_Intersection in the SELECT part + plus an outer query to get more details on 
what was the first and last time there was an intersection.

--
-- Outer query with union to visualize the Polygon and the
-- data received from the LineString that Intersects
-- with the polygon via e.g. pgAdmin.
--
SELECT NULL AS mmsi
      ,NULL AS created_on_date
      ,NULL AS min_created_on
      ,NULL AS max_created_on
      ,ST_GeometryFromText('POLYGON((7.1084729427821 
50.7352828020046,7.1089771980769 50.7353778664669,7.10848259899649 
50.7365542732199,7.10801482200623 50.7364673592064,7.1084729427821 
50.7352828020046))', 4326) AS geom
      ,NULL AS geom_start
      ,NULL AS geom_end
UNION
-- The actual data of interest
SELECT intersection.mmsi
      ,intersection.created_on_date
      ,intersection.min_created_on
      ,intersection.max_created_on
      ,intersection.geom
      ,MIN(DATE_TRUNC('second', 
TO_TIMESTAMP(ST_InterpolatePoint(intersection.geo_lat_long_line, 
ST_StartPoint(intersection.geom)) )::timestamptz )) AS geom_start
      ,MAX(DATE_TRUNC('second', 
TO_TIMESTAMP(ST_InterpolatePoint(intersection.geo_lat_long_line, 
ST_EndPoint(intersection.geom))   )::timestamptz )) AS geom_end
FROM (
   SELECT ais_trajectory_mmsi_daily.mmsi
         ,CAST(ais_trajectory_mmsi_daily.min_created_on AS date) AS 
created_on_date
         ,ais_trajectory_mmsi_daily.min_created_on
         ,ais_trajectory_mmsi_daily.max_created_on
         ,ST_SetSRID(ais_trajectory_mmsi_daily.geo_lat_long_line, 4326) AS 
geo_lat_long_line
        --
        -- Here seems to be the problem...
        --
         
,(ST_Dump(ST_Intersection(ST_GeometryFromText('POLYGON((7.1084729427821 
50.7352828020046,7.1089771980769 50.7353778664669,7.10848259899649 
50.7365542732199,7.10801482200623 50.7364673592064,7.1084729427821 
50.7352828020046))', 4326)
                                  
,ST_SetSRID(ais_trajectory_mmsi_daily.geo_lat_long_line, 4326)
                        )
                  )).geom AS geom
   FROM ais_trajectory_mmsi_daily
   WHERE ST_Intersects(ST_SetSRID(ais_trajectory_mmsi_daily.geo_lat_long_line, 
4326),
                       ST_GeometryFromText('POLYGON((7.1084729427821 
50.7352828020046,7.1089771980769 50.7353778664669,7.10848259899649 
50.7365542732199,7.10801482200623 50.7364673592064,7.1084729427821 
50.7352828020046))', 4326) )
        AND min_created_on >= CAST('2022-10-01' AS TIMESTAMPTZ)
        AND max_created_on <  CAST('2022-10-02' AS TIMESTAMPTZ)
) AS intersection
GROUP BY intersection.mmsi
        ,intersection.created_on_date
        ,intersection.min_created_on
        ,intersection.max_created_on
        ,intersection.geom
ORDER BY created_on_date, mmsi, geom_start
;

If needed I can try to create some sample with data to put on sqlfiddle.

On Stackoverflow I found this which seems to be related to the same issue:
https://stackoverflow.com/questions/62090829/why-st-intersection-from-postgis-returns-multilinestring-for-self-intersecting-l

Many thanks in advance for hints to solve the problem.

Juergen





_______________________________________________
postgis-users mailing list
[email protected]<mailto:[email protected]>
https://lists.osgeo.org/mailman/listinfo/postgis-users


--
Simon Greener
39 Cliff View Drive, Allens Rivulet, 7150, Tasmania, Australia
(m) +61 418 396 391
(w) www.spdba.com.au<http://www.spdba.com.au>
(m) [email protected]<mailto:[email protected]>

_______________________________________________ postgis-users mailing list 
[email protected] 
https://lists.osgeo.org/mailman/listinfo/postgis-users
_______________________________________________
postgis-users mailing list
[email protected]
https://lists.osgeo.org/mailman/listinfo/postgis-users

Reply via email to