On Wed, Nov 14, 2012 at 9:24 AM, Shaun McDonald <[email protected]> wrote:
>
> On 14 Nov 2012, at 11:48, Paweł Paprota <[email protected]> wrote:
>
>> On 11/14/2012 12:33 PM, Brett Henderson wrote:
>>
>>> It sounds good to me. Osmosis typically tries to maintain data accuracy
>>> with no surprises, so I'm not particularly happy with the current
>>> situation of dropping ways even if they are invalid.
>>>
>>> Jochen Topf was the one who originally introduced the checks in
>>> WayGeometryBuilder to ensure a Way contained at least two nodes. He
>>> might have some thoughts on whether we can remove the checks. Perhaps
>>> it was simply introduced to avoid the additional overheads of having to
>>> do st_isvalid() checks?
>>>
>>
>> Based on my experience with processing geometry for OSM objects I'd strongly
>> discourage having any invalid geometries in the database. This leads to very
>> unpleasant surprises with ST_Union, ST_Intersection and other spatial
>> functions. Upgrading the GEOS library (which PostGIS uses) helps a bit but
>> still many operations can behave very strangely and after hours/days of
>> debugging you find yourself hitting the "invalid geometry" wall.
>>
>> Whether Osmosis should be responsible for maintaining valid geometries is
>> kind of a different question I think - depends on policy. But whatever you
>> decide, it needs to be communicated front and center in documentation what
>> geometry is created.
>
> The problem is that if you are using Osmosis to shunt the data into a
> database that you use to find and highlight these invalid geometries for the
> community to go and fix in the source data.
>
> I think that Osmosis could have a filter to drop invalid data, or even the
> inverse of only outputting the invalid data.

Yes, that is how I discovered this "feature" in the first place. I was
generating a list of single node ways from my pgsnapshot database for
someone who wanted to fix them. There didn't seem to be as many as I
thought there should be. When I went looking I noticed that the only
ones in my database are from after I started minutely replication.

Which brings me back to "invalid geometries already exist in the
database." Although Pawel's point about this causing weirdness with some
of the PostGIS functions is something to consider. While some of them do
already exist in the database, taking out these checks would increase
the number of them on a fresh planet import by quite a bit. Like, an
order of magnitude or two.

Is it possible to check the validity of a linestring in Java? I see the
LineString class has a checkConsistency method; however, it is returning
false for all linestrings even if they are valid. I'm not seeing another
obvious method.

If this were possible, I would suggest adding an option to the
write-pgsql(-dump) tasks to control this behavior. Something like
includeInvalidLinestrings=yes/no which would allow the user to choose.
This would also remove the multi-node-at-same-location ways from the
database.

Even if checking for validity is not possible, the option could still be
added. Maybe "avoidInvalidLinestrings"? If "no" then shove everything
in. Otherwise keep the current behavior and drop single node ways to
minimize invalid linestrings.

This would also partially address Pawel's concern about calling this
issue to the user's attention since the option along with a description
would be listed in the detailed usage.

And what about zero node ways? As I mentioned before, technically these
appear to be valid and are assigned a static "empty geometry" value (not
null). Right now they are being excluded along with single node ways.
Should they be included regardless of what happens with single node
ways?
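
To make the proposed option a bit more concrete, here is a rough sketch of the kind of check I have in mind. It is not Osmosis code: the class name, the method names and the plain double[][] coordinate representation are all made up for illustration, and "avoidInvalidLinestrings" is only the option name suggested above, not an existing setting. The check is a cheap approximation and not a substitute for st_isvalid(); it only catches the two cases discussed in this thread, single node ways and ways whose nodes all sit at the same location, and it lets zero node ways through as an empty geometry.

```java
// Illustrative sketch only; none of these names exist in Osmosis.
class LinestringCheck {

    /**
     * Cheap approximation of "will this become a valid linestring?".
     * Accepts zero points (empty geometry) and any sequence that contains
     * at least two distinct locations. Rejects a single point and any
     * sequence where every point is at the same location.
     * Each entry of coords is assumed to be {longitude, latitude}.
     */
    static boolean isUsableLinestring(double[][] coords) {
        if (coords.length == 0) {
            return true; // zero node way: treated as an empty geometry
        }
        for (int i = 1; i < coords.length; i++) {
            if (coords[i][0] != coords[0][0] || coords[i][1] != coords[0][1]) {
                return true; // found a second, distinct location
            }
        }
        return false; // single node, or all nodes at one location
    }

    /**
     * Decides whether a way would be written, given the hypothetical
     * avoidInvalidLinestrings option. With the option off, everything
     * is written regardless of the check.
     */
    static boolean shouldWrite(double[][] coords, boolean avoidInvalidLinestrings) {
        return !avoidInvalidLinestrings || isUsableLinestring(coords);
    }

    public static void main(String[] args) {
        double[][] singleNode = {{13.4, 52.5}};
        double[][] normalWay = {{13.4, 52.5}, {13.5, 52.6}};
        System.out.println("single node, avoid=yes: " + shouldWrite(singleNode, true));
        System.out.println("single node, avoid=no:  " + shouldWrite(singleNode, false));
        System.out.println("normal way,  avoid=yes: " + shouldWrite(normalWay, true));
    }
}
```

A check like this would run before the geometry is built, so it would avoid the overhead of a database-side validity test, at the cost of missing any other kind of invalid geometry.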

Toby

_______________________________________________
osmosis-dev mailing list
[email protected]
http://lists.openstreetmap.org/listinfo/osmosis-dev
