On 01/08/2013 12:17 PM, derrick nehrenberg wrote:
It seems like the majority consensus is that the US Land Ownership (or management data) doesn't belong in the OSM database. So, I guess I won't be adding it.
[... argument for why the USFS Managed Land cadastre belongs ...]
If anyone has any further arguments for why US Land Ownership (or management) data isn't or is a good candidate for OSM crowdsourcing, I would definitely be very interested in hearing that.
Importing data such as this is the subject of recurring impassioned arguments. I think there are some fundamental misunderstandings that cause the arguments to shed more heat than light. Since, like Derrick, I'm interested in hiking maps, I think I ought to share my experiences with producing them. Essentially, I'm trying to sort out, for New York State, the data sources needed to do something similar to what TopOSM (www.toposm.org) does for Massachusetts. You can see a work in progress, covering the southeast corner of New York, at http://kbk.is-a-geek.net/catskills/test2.html .

OSM is not, nor ought it to be, "one-stop shopping" for all the geodata that go into a map. The lack of a particular type of object in OSM in a given region is usually the rationale for advocating an import: "My maps will all need this, and other people's maps will, too, so it should be in OSM." But that reasoning is fundamentally flawed. A typical map will be built from many data sources, not limited to OSM. The map linked to above has over two dozen layers, and ten public datasets (some of them represented by multiple shapefiles) went into its construction. Nobody argues that topography belongs in OSM, for instance.

Some of the data sources in that map even partially duplicate OSM. There are polygons in OSM for New York's state forest preserve, but the map production actually filters those out. Why? Because (a) there is a newer version of that file at NYSGIS, and (b) the import had errors. Why not reimport? Mostly because the original letter granting permission has been lost, and the license terms offered to the public on the NYSGIS web site are inconsistent with the ODbL. Moreover, there's no point in reimporting. Nobody edits the forest boundaries in OSM, and there really isn't anything sensible that a mapper can do with them.
You can't see the boundaries on aerial photographs, and for the most part getting to them in the field would involve off-trail hiking over densely vegetated and extremely steep terrain. The preserve's boundary is frequently unmarked or indicated only by faded and indistinct paint blazes, and the only certain way to establish metes and bounds is to locate the steel stakes that surveyors drove at corner points, usually with a metal detector. Because of these difficulties, the state's geodata can generally be regarded as the most authoritative source available.

Similarly, I ignore all hydrography in OSM, because OSM's water features in my part of the world are incomplete. Instead, I use NHD as a data source. Where I am, it's quite complete and accurate. (I understand that is not the case in some other places.) Hydrography is another thing that's hard for local mappers to do: in many places, rivers, streams and ponds are on private property and cannot be surveyed on the ground, and shorelines are often indistinct in aerial photos. Of course, there are other places where local mappers could improve things significantly; where they can approach lake shores and riverbanks on foot or by boat, they can generate authoritative data. For this reason, I think of hydrography as a hybrid case, one where the imported data would sometimes be sacrosanct and sometimes benefit from crowdsourcing. I'll return to this point shortly.

I bring in topography (both contour lines and hillshading) from NED, wetland polygons from the USFWS National Wetlands Inventory, and forest amenities (car parks, trail shelters, viewpoints, information kiosks, boat launches, etc.) from files that a colleague obtained under the Freedom of Information Law. The wetlands and elevation data, like the public land polygons, are not something that local mappers are going to change.
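To make the per-layer source choices above concrete, here is a minimal, purely illustrative sketch of a layer table that records which dataset each layer is drawn from, plus a filter that drops OSM features whose layer is supplied by an external dataset instead. The layer names and the tag rules in `layer_for` are hypothetical. In particular, the assumption that the old preserve import can be recognized by `boundary=protected_area` is just that, an assumption; a real pipeline would key on whatever tags that import actually used.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Sequence


@dataclass(frozen=True)
class Layer:
    name: str      # layer name in the map stack
    source: str    # dataset the layer is drawn from
    use_osm: bool  # True if OSM features of this kind are rendered


# Hypothetical layer table mirroring the source choices described above.
LAYERS = (
    Layer("state_forest_preserve", "NYSGIS", use_osm=False),
    Layer("hydrography", "NHD", use_osm=False),
    Layer("contours", "NED", use_osm=False),
    Layer("hillshade", "NED", use_osm=False),
    Layer("wetlands", "USFWS NWI", use_osm=False),
    Layer("forest_amenities", "FOIL files", use_osm=False),
    Layer("roads_and_trails", "OSM", use_osm=True),
)


def layer_for(tags: Dict[str, str]) -> Optional[str]:
    """Assign an OSM feature to a map layer (illustrative rules only)."""
    if "waterway" in tags or tags.get("natural") == "water":
        return "hydrography"
    # Assumption: the old preserve import is recognizable by this tag.
    if tags.get("boundary") == "protected_area":
        return "state_forest_preserve"
    if tags.get("amenity") in {"shelter", "parking"} or tags.get("tourism") == "viewpoint":
        return "forest_amenities"
    if "highway" in tags:
        return "roads_and_trails"
    return None  # not claimed by any layer, so it is kept


def osm_features_to_render(features: List[Dict[str, str]],
                           layers: Sequence[Layer] = LAYERS) -> List[Dict[str, str]]:
    """Drop OSM features whose layer is drawn from an external dataset."""
    external = {layer.name for layer in layers if not layer.use_osm}
    return [f for f in features if layer_for(f) not in external]


if __name__ == "__main__":
    osm = [
        {"waterway": "stream"},             # replaced by NHD
        {"boundary": "protected_area"},     # replaced by NYSGIS polygons
        {"amenity": "shelter"},             # replaced by the FOIL amenity files
        {"highway": "path"},                # kept: roads and trails come from OSM
        {"natural": "peak", "name": "Hunter Mountain"},  # kept: no external layer
    ]
    for feature in osm_features_to_render(osm):
        print(feature)
```

Keeping the source choice in a table rather than scattered through the rendering code makes it a one-line change to switch a layer back to OSM in a region where local mappers have done better than the public dataset.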
Since these layers appear mostly on maps for outdoor recreation and are omitted from many others, it's probably not worth importing them; let mapmakers who want them add the appropriate layers. The forest amenities could be imported, and in fact their placement in the state-level geodata is poor enough that local mappers could significantly improve on it. That's the third case: all the features in that layer are easily spotted in the field and readily accessible, and local mappers could place them all on the map.

The really complicated situation comes about with roads and trails, and you'll notice that the map linked above has a problem with them. It draws on a number of data sources: a NYS Department of Environmental Conservation (DEC) data set of roads and trails on DEC-managed lands; a series of data sets (again obtained by a colleague under the Freedom of Information Law) of GPS tracks from walking the trails in state parks; a series of personal GPS tracks; and OSM itself. None of these data sources has everything, and a great many objects are duplicated among two or more of them. Moreover, all of these data sources contain errors. For instance, the road running east from the Devil's Acre shelter south of Hunter Mountain is a rugged hiking trail, and exists as a road only in the fevered imagination of the Census Bureau workers who produced the TIGER files. There are multiple alignments for several trails, and incorrect alignments for more.

This is the situation that has given imports a bad name. There are multiple sources of data, none of them perfect, and real challenges in conflating the multiple representations. I can see that the false road in TIGER aligns closely with a real trail in the Roads and Trails file and with a track on my GPS, but because the alignment is partial and inexact, an automated conflation tool would be hard put to identify them as the same feature.
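To illustrate why partial alignment defeats a simple matcher, here is a sketch using one common line-similarity measure, the Hausdorff distance: the largest distance from any point on either line to the nearest point on the other. It assumes both lines are already in a projected coordinate system in metres. The coordinates, the 10 m densification step and the 15 m tolerance are made up for illustration; this is not a description of how any particular conflation tool works.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]  # projected coordinates in metres (assumed, e.g. UTM)


def point_segment_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from p to the closest point on segment a-b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    length2 = dx * dx + dy * dy
    if length2 == 0.0:
        return math.hypot(px - ax, py - ay)
    # Parameter of the projection of p onto the segment, clamped to [0, 1].
    t = max(0.0, min(1.0, ((px - ax) * dx + (py - ay) * dy) / length2))
    return math.hypot(px - (ax + t * dx), py - (ay + t * dy))


def point_line_distance(p: Point, line: List[Point]) -> float:
    """Distance from p to the nearest segment of a polyline."""
    return min(point_segment_distance(p, a, b) for a, b in zip(line, line[1:]))


def densify(line: List[Point], step: float) -> List[Point]:
    """Add vertices so that no segment is longer than `step`."""
    out = [line[0]]
    for (ax, ay), (bx, by) in zip(line, line[1:]):
        n = max(1, math.ceil(math.hypot(bx - ax, by - ay) / step))
        out.extend((ax + (bx - ax) * i / n, ay + (by - ay) * i / n) for i in range(1, n + 1))
    return out


def hausdorff(a: List[Point], b: List[Point], step: float = 10.0) -> float:
    """Approximate symmetric Hausdorff distance between two polylines."""
    ab = max(point_line_distance(p, b) for p in densify(a, step))
    ba = max(point_line_distance(p, a) for p in densify(b, step))
    return max(ab, ba)


def fraction_within(a: List[Point], b: List[Point], tol: float, step: float = 10.0) -> float:
    """Share of line a's densified vertices lying within `tol` metres of line b."""
    pts = densify(a, step)
    return sum(point_line_distance(p, b) <= tol for p in pts) / len(pts)


if __name__ == "__main__":
    trail = [(0.0, 0.0), (1000.0, 0.0)]                 # hypothetical surveyed trail
    tiger = [(0.0, 8.0), (600.0, 8.0), (900.0, 300.0)]  # follows it, then veers off
    print(f"Hausdorff distance: {hausdorff(trail, tiger):.0f} m")
    print(f"TIGER line within 15 m of trail: {fraction_within(tiger, trail, 15.0):.0%}")
```

With these invented coordinates the Hausdorff distance comes out at 300 m, far too large for a threshold-based matcher to call the two lines the same feature, even though about 60 percent of the TIGER line lies within 15 m of the trail. A person looking at the map sees the correspondence at once; a single global score does not.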
Even if I cleaned up the braided trails that appear on that map by deciding that one or another input is authoritative, I would still have no way to record that decision so that, when the government issues another version of one of the datasets, I could update without going through the entire exercise again. The OSM data model is weak in its ability to record this type of decision: "A multiline in TIGER, unique ID 12345678, named 'XYZ Road', indicated in TIGER to be at this location, has been deleted intentionally by a mapper," or "A multiline in TIGER, unique ID 12345678, has been conflated with another multiline in NYS DEC Roads & Trails, unique ID 987654, and with a linear feature, OSM ID 23456789." Without this information, an attempt to update an import either discards the hard work of mappers who fixed the previous import, or introduces spurious conflicts and reintroduces bad data that were intentionally deleted.

Boundary information has similar issues. Consider the polygons representing the wards of a city, the boundary of the city itself, the boundary of the county in which the city lies, the boundary of the state and the boundary of the nation. All of these may come from different sources and align imperfectly; yet they ought to share segments where they coincide, as in a border town where ward, city, county and state all end at the national border. And the question of "which data set is right" cannot be resolved by a local mapper, because the borders are invisible lines in the field.

I don't have a good answer for the issue of inconsistent data sources, but it's an unavoidable feature of the real world. I suspect we can learn a lot from how the open-source software community uses distributed version-control systems; change conflicts on open-source software projects are routine, and there are good tools for merging inconsistent changes into a consistent whole.

So, what's the takeaway?
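Before the categories, one note on the decision record described above: it does not have to live in the OSM data model. It could be kept alongside the import as a log keyed by source dataset and source ID. The sketch below is hypothetical throughout: the `Decision` fields, the fingerprinting scheme and the plan categories are illustrative, not an existing OSM tool, and every ID other than 12345678, 987654 and 23456789 is invented. It amounts to a small three-way merge in the spirit of version-control tools. The fingerprint stored with each decision plays the role of the common ancestor, the new upstream release is "theirs", and the mapper's edit in OSM is "ours".

```python
import hashlib
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple

SourceKey = Tuple[str, str]  # (dataset name, ID within that dataset)


@dataclass(frozen=True)
class Decision:
    kind: str                                  # "deleted" or "conflated"
    source_fingerprint: str                    # upstream record's hash when decided
    osm_id: Optional[int] = None               # OSM way it was merged into, if any
    also_matches: Tuple[SourceKey, ...] = ()   # same feature in other datasets


def fingerprint(record: Dict[str, object]) -> str:
    """Stable hash of an upstream record's attributes and geometry."""
    return hashlib.sha256(repr(sorted(record.items())).encode("utf-8")).hexdigest()


def plan_update(dataset: str, release: Dict[str, Dict[str, object]],
                log: Dict[SourceKey, Decision]) -> Dict[str, List[str]]:
    """Sort the records of a new upstream release by what a re-import should do."""
    plan: Dict[str, List[str]] = {"add": [], "skip": [], "keep_osm": [],
                                  "review": [], "removed_upstream": []}
    for source_id, record in release.items():
        decision = log.get((dataset, source_id))
        if decision is None:
            plan["add"].append(source_id)       # never seen before: import candidate
        elif decision.source_fingerprint != fingerprint(record):
            plan["review"].append(source_id)    # upstream changed since the decision
        elif decision.kind == "deleted":
            plan["skip"].append(source_id)      # mapper deleted it; don't bring it back
        else:
            plan["keep_osm"].append(source_id)  # conflated, unchanged: mapper's version wins
    for ds, source_id in log:
        if ds == dataset and source_id not in release:
            plan["removed_upstream"].append(source_id)
    return plan


if __name__ == "__main__":
    old_xyz = {"name": "XYZ Road", "geom": ((0, 0), (1, 1))}
    log = {
        ("TIGER", "12345678"): Decision("deleted", fingerprint(old_xyz)),
        ("TIGER", "22222222"): Decision(
            "conflated", fingerprint({"name": "Old Road"}), osm_id=23456789,
            also_matches=(("NYS DEC Roads & Trails", "987654"),)),
        ("TIGER", "33333333"): Decision("deleted", fingerprint({"name": "Gone Road"})),
        ("TIGER", "55555555"): Decision(
            "conflated", fingerprint({"name": "Same Road"}), osm_id=34567890),
    }
    new_release = {
        "12345678": old_xyz,                    # unchanged upstream: stays deleted
        "22222222": {"name": "Old Road Ext"},   # changed upstream: needs review
        "44444444": {"name": "New Road"},       # new upstream: candidate to add
        "55555555": {"name": "Same Road"},      # unchanged: keep the OSM version
    }
    print(plan_update("TIGER", new_release, log))
```

In this toy run, 12345678 stays deleted, 55555555 keeps the mapper's version, 44444444 is an import candidate, 22222222 goes to review and 33333333 is flagged as gone upstream. Only records whose upstream version changed since the mapper's decision need human attention, which is exactly the information the OSM data model has no place for today.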
Imports where OSM can take custody of the data and local mappers can clean it up: This is the ideal case. The forest amenities might qualify. The imported file was obtained with difficulty, so the import is unlikely to be repeated; the points of interest can be verified by local mappers; and OSM can own the result. It duplicates very little data that is already in OSM, and the duplication is easily detected by finding existing nodes within a given radius of each imported point.

Imports where OSM mappers are unlikely ever to edit the data: These should be done only if there is obvious value added by having the data in OSM rather than in a separate layer. Mapmakers, generally speaking, include many layers of data; including a few more because an import was not done is no big deal. But at least these imports are mostly harmless: it's easy to identify, update and delete the imported objects. NHD (I've decided; I know that I was once a proponent of importing it in bulk), NED, the USFWS Wetlands Inventory and NYS DEC Lands all fall in this category.

Imports where both OSM and the originator are likely to update the data: These are the problematic ones. Once we import data, we own the responsibility to keep it up to date, and as I observed above, we really don't have the tools to manage repeated merges from heterogeneous data sources. Arguably, if at all possible, we should let local mappers redo these; certainly that appears to be the position of some of the Germans. But it's a daunting task, and I know that I can be motivated more easily to fix some mistakes in the map than to fill in a huge area of whitespace. (Apparently others are different, if the simulations are valid.) In this category, I'd still be tempted to import NYS DEC Roads and Trails, assuming that the licensing issues can be negotiated, because it would fill in significant whitespace and would introduce only a handful of conflicts.
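The radius check mentioned for the forest amenities might look something like this minimal sketch. It assumes the imported points and the existing OSM nodes are plain (lat, lon, tags) tuples. It treats two points as candidate duplicates only when they share a "kind" tag (`amenity`, `tourism` or `leisure`, an assumption) and lie within a hypothetical 50 m radius. It uses a brute-force scan; for anything much larger than a county's worth of points, a spatial index would be the natural replacement.

```python
import math
from typing import Dict, List, Optional, Tuple

EARTH_RADIUS_M = 6_371_008.8  # mean Earth radius in metres

Node = Tuple[float, float, Dict[str, str]]     # (lat, lon, tags)
KIND_KEYS = ("amenity", "tourism", "leisure")  # assumed keys identifying a POI's kind


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in metres between two WGS84 points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    h = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def kind(tags: Dict[str, str]) -> Optional[Tuple[str, str]]:
    """First (key, value) pair among KIND_KEYS, e.g. ("amenity", "shelter")."""
    for key in KIND_KEYS:
        if key in tags:
            return key, tags[key]
    return None


def likely_duplicates(imported: Dict[str, Node], existing: Dict[int, Node],
                      radius_m: float = 50.0) -> Dict[str, List[int]]:
    """For each imported point, list existing nodes of the same kind within radius_m."""
    result: Dict[str, List[int]] = {}
    for imp_id, (lat, lon, tags) in imported.items():
        k = kind(tags)
        result[imp_id] = [
            osm_id for osm_id, (olat, olon, otags) in existing.items()
            if k is not None and kind(otags) == k
            and haversine_m(lat, lon, olat, olon) <= radius_m
        ]
    return result


if __name__ == "__main__":
    imported = {  # hypothetical points from the FOIL amenity files
        "foil-1": (42.1500, -74.2000, {"amenity": "shelter"}),
        "foil-2": (42.2000, -74.3000, {"tourism": "viewpoint"}),
    }
    existing = {  # hypothetical nodes already in OSM
        101: (42.1502, -74.2001, {"amenity": "shelter"}),
        102: (42.1501, -74.2000, {"amenity": "parking"}),
    }
    print(likely_duplicates(imported, existing))
```

With these invented coordinates, the imported shelter `foil-1` matches OSM node 101 about 24 m away. The parking node at nearly the same spot is ignored because its kind differs, and the viewpoint `foil-2` has no candidates. Matches are only candidates for a mapper to confirm, not automatic merges.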
Most of those conflicts are with the earlier TIGER import, and in the vast majority of them the TIGER data are simply wrong, to the point of showing roads that never existed and never could have existed (going up sheer cliffs or crossing chasms). And mappers could then improve the imported data significantly. But I don't want to do it yet, because at the present state of development, it'll just make a bigger mess down the road. Just as a possible reimport of TIGER is problematic enough to have kept a substantial discussion thread going for weeks, this import would generate similar problems in the future. Until the TIGER importers have a better answer, I don't want to compound the woes.

So, for me, imports seem to fall into two categories: "don't do it" (where I can simply add layers to my maps), and "don't do it yet" (where I'm likely to make trouble for future mappers). Unfortunately, the sweet spot of "full steam ahead" does not really appear to exist. This situation disappoints me, because I really want to get rid of those braided trails. But I haven't had the time to explore better approaches to conflation and change management.

Sorry that this message has been so long. To paraphrase Pascal, I had not the time to compose a short one.

--
73 de ke9tv/2, Kevin

