Mark Cave-Ayland wrote:
Dave Fuhry wrote:
Mark,

   I'm beginning to wonder if the stricter-EWKB-parsing patch applied
in November was a mistake.

   I have an app which bulk-loads shapefiles (of varying quality),
then "repairs" or NULLs geometries which are not isvalid().  I'm not
finding a good way to bulk-load input data when the dataset has a
record which causes:

ERROR:  geometry contains non-closed rings

COPY (shp2pgsql -D) is out, since COPY aborts on error.  From
discussions on pgsql-dev, it is not clear whether COPY will support a
"SKIP ERRORS" or "ERRORS TO error_table" clause anytime soon.  Even in
that case, I would like a convenient way to keep the table's other
(non-geometry) attributes.

For shp2pgsql's insert-statement mode, records are grouped into
250-record batches surrounded by BEGIN; ... END;, so an erroneous
record will abort the 250 records in its batch.  Removing transactions
entirely is no good for bulk-loading, since the database will be
forced to commit every record to disk before processing the next.
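
One possible middle ground, sketched in Python rather than taken from shp2pgsql: keep the batched transactions, and only when a batch fails, retry that batch's records one at a time. The names load_with_fallback and insert_batch are hypothetical. insert_batch is assumed to be all-or-nothing, i.e. it raises on any error and leaves nothing from that batch committed.

```python
from typing import Callable, List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def load_with_fallback(
    records: Sequence[T],
    insert_batch: Callable[[Sequence[T]], None],
    batch_size: int = 250,
) -> Tuple[int, List[T]]:
    """Load records in batches; on a failed batch, retry record by record.

    insert_batch is a hypothetical callable that must behave like a
    transaction: it either commits the whole batch or raises and
    commits nothing.
    Returns (number of records loaded, list of rejected records).
    """
    loaded = 0
    rejected: List[T] = []
    for start in range(0, len(records), batch_size):
        batch = records[start:start + batch_size]
        try:
            insert_batch(batch)
            loaded += len(batch)
        except Exception:
            # The batch was rolled back as a whole, so retry each record
            # in its own transaction to isolate the bad ones.
            for record in batch:
                try:
                    insert_batch([record])
                    loaded += 1
                except Exception:
                    rejected.append(record)
    return loaded, rejected
```

Only batches that contain a bad record pay the per-record commit cost, and the rejected records (with all their attributes) are left over for separate handling.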

Another option would be to move EWKB parsing logic to shp2pgsql so
that shp2pgsql can decide how to handle erroneous geometries.  This
option seems ugly and redundant to me, although I'll defer judgement.
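
To make the idea concrete, here is a rough Python sketch of a loader-side check for just the one condition behind the error above: a ring whose first and last points differ. It is not EWKB parsing and not shp2pgsql code. The function names and the representation of a polygon as a list of rings of (x, y) tuples are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]
Ring = Sequence[Point]
Polygon = Sequence[Ring]


def ring_is_closed(ring: Ring) -> bool:
    """A ring counts as closed when it is non-empty and its first and last points coincide."""
    return len(ring) > 0 and ring[0] == ring[-1]


def split_by_ring_closure(
    polygons: Sequence[Polygon],
) -> Tuple[List[Polygon], List[Polygon]]:
    """Separate polygons whose rings are all closed from those with an open ring."""
    loadable: List[Polygon] = []
    rejected: List[Polygon] = []
    for polygon in polygons:
        if all(ring_is_closed(ring) for ring in polygon):
            loadable.append(polygon)
        else:
            rejected.append(polygon)
    return loadable, rejected
```

A loader could send the first list to COPY and write the second somewhere else for repair, at the cost of duplicating a check the database already performs.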

Lastly, maybe some per-session option to allow postgis to import
erroneous geometries is in order.  Then they can be corrected in a
controlled fashion by isvalid() queries.  I would somewhat prefer that
the geometry processing functions (in the below example,
st_simplify()) be robust in the face of questionable geometry
anyway.  Thoughts?

Thanks,

Dave


Hi Dave,

That's an interesting one. The patch wasn't so much about allowing or rejecting invalid geometries as about making the behaviour consistent between WKT and WKB inputs.

This brings back the whole issue as to how strict we should be when accepting data. We could argue that the database should be quite strict as to which geometries are accepted, but then again we have the (rather expensive) IsValid() function which indicates whether a geometry meets the extra criteria for the GEOS functions.

I think at the end of the day it comes down to: what does the OGC spec say, and what do other databases do? I'd prefer to stick to the letter of the spec wherever possible. We could potentially look at altering shp2pgsql to use the geometry parser so that erroneous geometries are written out to a separate shapefile, if that helps. However, at the moment it's quite far down on the TODO list unless anyone wants to sponsor a developer to work on it.


ATB,

Mark.


Similarly, some applications like UMN MapServer CAN use geometries that are NOT IsValid(). While I mostly use shapefiles to load data, it would be bad to lose the ability to load geometries that are not good. I think having IsValid() is sufficient to sort out the good from the bad.

My 2 cents,
  -Steve
_______________________________________________
postgis-users mailing list
[email protected]
http://postgis.refractions.net/mailman/listinfo/postgis-users