Re: [OSM-dev] Why are so many changeset so large?
On 10/17/2012 07:43 AM, Paweł Paprota wrote:
> I agree. I will add changeset comments to changeset descriptions on the demo instance and let's see how this turns out.

I said that, but then I remembered that changeset metadata is not available in the replication feed, only through the public API or the weekly dump of all changesets. This is a complication.

I need to think about how to structure the deployment of this whole thing. Right now there are some dependencies (PostGIS database, replication feed) that may not be needed in the future. I will try to start a discussion about it this week.

Paweł

___ dev mailing list dev@openstreetmap.org http://lists.openstreetmap.org/listinfo/dev
Re: [OSM-dev] Why are so many changeset so large?
On Wed, 2012-10-17 at 00:28 +0100, Tom Hughes wrote:
> On 17/10/12 00:04, Alex Barth wrote:
> > - Are there technical reasons why changesets should tend to be large? Are they expensive on some level?
>
> I believe it's entirely because we've got so many people doing mechanical or semi-mechanical edits. That includes bots but also things like people using xapi or overpass to download all objects matching some set of tags, then change those tags and reupload.

the historical answer to this is that when changesets were added to the OSM API there were two different intentions for their use which got conflated: first, that changesets were structures for grouping edits sharing common attributes. and second, that changesets were VCS-style 'commits' which would be uploaded in a single request and applied atomically.

effectively, the first use-case was for users, and tried to make changesets as open-ended as possible. from this, we get tags on changesets for comments, editor, bot-ness, etc... and the ability to keep uploading into an open changeset.

the second use-case was a technical thing - the sheer number of API calls to individual elements, even from normal-sized editing sessions, could cause problems. and, for small calls, HTTP headers and round-trip latencies would dominate the cost of an upload. further, editors had to cope with the situation where an upload failed half-way through and to re-try the failed calls. from this, we get a single changeset/#id/upload call which applies atomically.

at the time, this seemed like a good way to satisfy both use-cases. and, while it does what it set out to, i think we should consider splitting these in the next API version; explicitly reifying uploads at which bboxes / coverage sets and change counts can be stored. changesets can then simply be collections of uploads.

getting to the point: this might to some extent mitigate the large changesets issue, as it would allow bboxes to be collected at a smaller granularity. however, it wouldn't be a full solution and we'd probably still need something like OWL to break down the geographic footprint of changesets further.

cheers,

matt
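To make the proposed split a little more concrete, here is a minimal sketch in Python of uploads as first-class records, each carrying its own bbox and change count, with a changeset reduced to tags plus a collection of uploads. All names here (`Upload`, `Changeset`, `touches`, the tuple layout of `BBox`) are hypothetical illustrations and not part of any OSM API version; the sketch also ignores bboxes that cross the antimeridian.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple

# Hypothetical bbox layout: (min_lon, min_lat, max_lon, max_lat)
BBox = Tuple[float, float, float, float]


def union(a: BBox, b: BBox) -> BBox:
    """Smallest bbox covering both a and b."""
    return (min(a[0], b[0]), min(a[1], b[1]), max(a[2], b[2]), max(a[3], b[3]))


def contains(box: BBox, lon: float, lat: float) -> bool:
    """True if the point lies inside the bbox (edges included)."""
    return box[0] <= lon <= box[2] and box[1] <= lat <= box[3]


@dataclass
class Upload:
    """One atomic upload, with its own extent and change count."""
    bbox: BBox
    change_count: int


@dataclass
class Changeset:
    """User-facing grouping: tags plus a collection of uploads."""
    tags: Dict[str, str]
    uploads: List[Upload] = field(default_factory=list)

    @property
    def bbox(self) -> Optional[BBox]:
        """Coarse extent: the union of all upload bboxes, or None if empty."""
        if not self.uploads:
            return None
        box = self.uploads[0].bbox
        for upload in self.uploads[1:]:
            box = union(box, upload.bbox)
        return box

    @property
    def change_count(self) -> int:
        return sum(upload.change_count for upload in self.uploads)

    def touches(self, lon: float, lat: float) -> bool:
        """Finer-grained test: is the point inside any single upload's bbox?"""
        return any(contains(u.bbox, lon, lat) for u in self.uploads)
```

With one upload around Sydney and one around Vancouver, the changeset bbox still spans the Pacific, but `touches` only reports the two upload extents. As noted in the message, this narrows the footprint only to the granularity of an upload, so a single upload that itself spans continents would still need something like OWL.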
Re: [OSM-dev] Why are so many changeset so large?
On 17 October 2012 13:53, Matt Amos zerebub...@gmail.com wrote:
> getting to the point: this might to some extent mitigate the large changesets issue, as it would allow bboxes to be collected at a smaller granularity. however, it wouldn't be a full solution and we'd probably still need something like OWL to break down the geographic footprint of changesets further.

Further to this, I find this changeset extent problem is often caused by looking at things the wrong way around. If you want to find out what area the changeset covers, then we supply a bounding box to help. However, if you want to know which changesets affect a given area, this reverse question is much less easily answered. Hence OWL, etc.

Beyond that, the extent is more of a promise that there are no edits on the outside, rather than any guide to what's within. No changeset completely fills, nor even evenly fills, its extent. There is a widespread and very shaky assumption that smaller changesets are somehow more likely to be rectangular or have a more even distribution across themselves, but this won't hold in the real world in pretty much any circumstances[1].

Basically, I see no need to worry about the extent of bounding boxes, and no need to move to having bboxes on uploads instead of changesets or other complications. No matter what we do, if your interest in a changeset extends beyond the details of its extent, you need a mechanism (again, e.g. OWL) to detail the actual locations of the edits to the entities, and different interests (and different entities) will even have different buffers of interest around them. Let's focus on things like that.

Cheers,
Andy

[1] Unless we all live in cities with north/south street grids and map each city block in individual changesets :-)
Re: [OSM-dev] Why are so many changeset so large?
On 10/17/2012 03:30 PM, Andy Allan wrote:
> Basically, I see no need to worry about the extent of bounding boxes, and no need to move to having bboxes on uploads instead of changesets or other complications. No matter what we do, if your interest in a changeset extends beyond the details of its extent, you need a mechanism (again, e.g. OWL) to detail the actual locations of the edits to the entities, and different interests (and different entities) will even have different buffers of interest around them. Let's focus on things like that.

Exactly. What I do right now with the Activity Server is store the whole geometry of a changeset. When a bounding box query comes in, I use ST_Intersects between the bbox and the geometries. This has the desired effect you write about: a changeset that contains changes in Sydney and in Canada shows up in query results only for those two places, not for anywhere in the world as it does right now in the History tab.

I am a bit concerned about the scalability of this. Matt clearly stated in one of the earlier discussions that dumping every changeset into one table won't scale.

I'm now looking to dig into OWL's code and see how my work relates to it. I think it could make sense to bring the two projects together somehow, or at least integrate them at some level (OWL publishing activities to the Activity Server?).

Paweł
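The difference between matching on the changeset bbox and matching on the stored geometry can be sketched in plain Python. This is not the Activity Server's code: it is a simplified stand-in for what an ST_Intersects test between a query rectangle and a multipoint geometry amounts to, and the function names and coordinates are made up for illustration.

```python
from typing import List, Tuple

Point = Tuple[float, float]               # (lon, lat)
BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


def bbox_of(points: List[Point]) -> BBox:
    """Bounding box of a non-empty list of points."""
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return (min(lons), min(lats), max(lons), max(lats))


def bboxes_overlap(a: BBox, b: BBox) -> bool:
    """History-tab style test: do the two rectangles share any area or edge?"""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def points_intersect_bbox(points: List[Point], box: BBox) -> bool:
    """Geometry-based test: does any edited location fall inside the query box?"""
    return any(
        box[0] <= lon <= box[2] and box[1] <= lat <= box[3]
        for lon, lat in points
    )


# One changeset with an edit near Sydney and an edit near Toronto.
changeset_points = [(151.21, -33.87), (-79.38, 43.65)]

sydney_view = (151.0, -34.0, 151.4, -33.7)
cairo_view = (31.1, 29.9, 31.4, 30.2)

# The bbox test matches both views; the geometry test only matches Sydney.
bbox_hits = [bboxes_overlap(bbox_of(changeset_points), v) for v in (sydney_view, cairo_view)]
geom_hits = [points_intersect_bbox(changeset_points, v) for v in (sydney_view, cairo_view)]
```

A linear scan like this is only for illustration; in a database the same test would normally rely on a spatial index, which is where the scalability concern above comes in.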
Re: [OSM-dev] Why are so many changeset so large?
On 17/10/12 17:20, Alex Barth wrote:
> Matt Amos wrote:
> > from this, we get a single changeset/#id/upload call which applies atomically.
>
> Is that so? I thought changesets were not applied atomically leading to issues where it is hard to find out what data got applied when a connection breaks down or an editor crashes.

A changeset isn't atomic, but an upload should be as it is done in a transaction. The changeset isn't atomic because it may have multiple uploads grouped in the same changeset.

Tom

-- Tom Hughes (t...@compton.nu) http://compton.nu/
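The distinction can be illustrated with a small, self-contained Python sketch using the standard `sqlite3` module. It is an analogy only, not the OSM server's implementation: the `edits` table, the `upload` function and the failure condition are all invented for the example. Each upload runs inside one transaction, so it is applied completely or not at all, while the changeset as a whole can end up with some uploads applied and others not.

```python
import sqlite3


def upload(conn, changeset_id, changes):
    """Apply every change in one transaction; roll back all of them on error."""
    with conn:  # commits on success, rolls back if an exception escapes
        for element_id, tags in changes:
            if tags is None:
                # Stand-in for any server-side failure part-way through.
                raise ValueError(f"invalid change for element {element_id}")
            conn.execute(
                "INSERT INTO edits (changeset_id, element_id, tags) VALUES (?, ?, ?)",
                (changeset_id, element_id, tags),
            )


def applied_elements(conn, changeset_id):
    """Element ids that actually made it into the given changeset."""
    rows = conn.execute(
        "SELECT element_id FROM edits WHERE changeset_id = ? ORDER BY element_id",
        (changeset_id,),
    ).fetchall()
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE edits (changeset_id INTEGER, element_id INTEGER, tags TEXT)")
conn.commit()

# First upload into changeset 1 succeeds as a whole.
upload(conn, 1, [(101, "highway=residential"), (102, "name=Main St")])

# Second upload into the same changeset fails on its second change,
# so its first change (element 103) is rolled back too.
try:
    upload(conn, 1, [(103, "amenity=cafe"), (104, None)])
except ValueError:
    pass

applied = applied_elements(conn, 1)
```

After this run the changeset contains the first upload only, which is the sense in which the upload is atomic and the changeset is not.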
Re: [OSM-dev] Why are so many changeset so large?
On 10/17/2012 06:20 PM, Alex Barth wrote:
> It seems that OWL and Activity Streams have the exact same problem here...

I talked with Matt on IRC today, and it looks to me like we have been asking ourselves the same questions. Overall I think that replacing a big chunk of the Changeset Activity Publisher [1] that I've developed with OWL is the right thing to do.

At this point I want to spend a few days familiarizing myself with the OWL code base to see what the current status is and how it fits into the whole Activity Streams picture.

[1] https://github.com/ppawel/osm-activity-publishers/blob/master/changeset-publisher/

Paweł
Re: [OSM-dev] Why are so many changeset so large?
> From: Alex Barth [mailto:a...@mapbox.com]
> Subject: Re: [OSM-dev] Why are so many changeset so large?
>
> BTW, I did some cursory digging in the changesets dump and found that actually only a relatively small percentage of changesets are geographically large. Trying to use the history tab they seem to be more numerous. I don't have numbers yet, but I hope I can share some soon.

The issue is that you see every large changeset. This is most obvious in areas with no editing like the middle of the ocean. Looking at the size of an average changeset weighted by changeset size might produce data that comes closer to what you see in the history tab.

The problem is then that you don't care about the history tab for most of the world, only where people or mappable features are.

Incidentally, it's possible to make a changeset that only touches a small area but has a larger bbox with the expand_bbox call - see http://wiki.openstreetmap.org/wiki/API_v0.6#Expand_Bounding_Box:_POST_.2Fapi.2F0.6.2Fchangeset.2F.23id.2Fexpand_bbox

I've never used it myself.
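The weighting effect can be shown with a few lines of Python. The numbers below are invented purely for illustration and are not taken from the changesets dump; the sketch also assumes, as a simplification, that the chance of a changeset appearing in the history tab for a randomly chosen location is proportional to its bbox area in square degrees.

```python
def share_by_count(areas, threshold):
    """Fraction of changesets whose bbox area is at least the threshold."""
    return sum(1 for a in areas if a >= threshold) / len(areas)


def share_by_area(areas, threshold):
    """Fraction of total bbox area contributed by those same changesets."""
    return sum(a for a in areas if a >= threshold) / sum(areas)


# Hypothetical sample: 99 neighbourhood-sized changesets and one huge one
# (bbox areas in square degrees).
areas = [0.01] * 99 + [10000.0]
large = 100.0  # arbitrary cut-off for "geographically large"

by_count = share_by_count(areas, large)  # share of changesets that are large
by_area = share_by_area(areas, large)    # share of covered area that is large
```

In this made-up sample the large changeset is 1% of all changesets but accounts for almost all of the covered area, which is consistent with large changesets being rare in the dump yet seeming to be everywhere in the history tab.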
Re: [OSM-dev] Why are so many changeset so large?
On 17.10.2012 09:15, Jochen Topf wrote:
> I think one reason people add bad changeset comments and organize their changesets in a bad way is that for most people those changesets and the comments just disappear into a black hole.

One thing that is also bad from my point of view is that you can't edit the comment on a changeset after the fact. So if, for example, you press a key or click OK too quickly in JOSM, you end up with a wrong comment that you can never correct (at least I'm not aware of a way to do it).

Best regards,
Michael.
[OSM-dev] Why are so many changeset so large?
I really like how activity streams shows easy-to-understand changes on the map using changemonger [1,2]. At the same time it creates an alternative breakdown of changes that is more granular than changesets. This diverts attention from _comments on changesets_. This is not ideal in my mind - these comments on changesets have great potential to become an even more important communication channel in the future.

I understand activity streams / changemonger suggests a broken-up view of data changes because many changesets are so large that they are effectively not meaningful. I'd like to understand better why these changesets are so large.

Unscientifically digging back through today's history, I'm seeing many, many changesets that seem like they could just as well be much smaller - both in the sense of geographic extent and number of elements - I don't want to call anybody out here, but this is what I found:

- http://www.openstreetmap.org/browse/changeset/13514072
- http://www.openstreetmap.org/browse/changeset/13523015
- http://www.openstreetmap.org/browse/changeset/13508818

I understand that there will always be cases where a large changeset makes sense (e.g. bot changes), but it seems that we have many unnecessarily large changesets, which makes changesets a not very useful granularity for looking at data history.

My questions:

- What are the recommendations for changeset sizes?
- Are there technical reasons why changesets should tend to be large? Are they expensive on some level?
- Could editors encourage users to do more and smaller changesets?
- What else could be done to encourage smaller changesets with meaningful comments?

[1] http://lists.openstreetmap.org/pipermail/rails-dev/2012-October/001086.html
[2] Click on 'activity' here http://suncobalt.dyndns.org:8081/?lat=51.61lon=22.44zoom=7layers=M

Alex Barth
http://twitter.com/lxbarth
tel (+1) 202 250 3633
Re: [OSM-dev] Why are so many changeset so large?
On 17/10/12 00:04, Alex Barth wrote:
> - What are the recommendations for change set sizes?

Personally I tend to put everything that is logically grouped together in one changeset where possible. But by that I mean that I'll spend a few hours out collecting data in a small area and then probably upload that in one changeset - sometimes more than one if I take a break while editing and it times out.

> - Are there technical reasons why changesets should tend to be large? Are they expensive on some level?

I believe it's entirely because we've got so many people doing mechanical or semi-mechanical edits. That includes bots but also things like people using xapi or overpass to download all objects matching some set of tags, then change those tags and reupload.

> - Could editors encourage users to do more and smaller changesets?
> - What else could be done to encourage smaller changesets with meaningful comments?

Encouraging people to go out and do actual local survey based mapping instead of trying to enforce their tagging ideas on the whole world with mass edits.

Tom

-- Tom Hughes (t...@compton.nu) http://compton.nu/
Re: [OSM-dev] Why are so many changeset so large?
Hi Alex,

What do you mean by large? Do you mean changesets that span a large area (spanning whole continents)? Or changesets that have a lot of objects modified (perhaps more than 1000)? Based on the examples you provided, it seems you mean the former. Is this correct?

Eugene
Re: [OSM-dev] Why are so many changeset so large?
Eugene - right, I mean changesets that are geographically large.

Alex Barth
http://twitter.com/lxbarth
tel (+1) 202 250 3633
Re: [OSM-dev] Why are so many changeset so large?
On 10/17/2012 01:04 AM, Alex Barth wrote:
> I really like how activity streams shows easy-to-understand changes on the map using changemonger [1,2]. At the same time it creates an alternative break down of changes that is more granular than changesets. This diverts attention from _comments on changesets_. This is not ideal in my mind - these comments on changesets have great potential to become an even more important communication channel in the future.

I agree. I will add changeset comments to changeset descriptions on the demo instance and let's see how this turns out.

One challenge I see with that is the fact that some (most?) people don't add relevant information to their changesets. But perhaps seeing their changesets as activities would change that behavior and they would use changeset comments as a communication channel, not as a required field in an editor.

> I understand activity streams / changemonger suggests a broken up view of data changes because many changesets are so large that they are effectively not meaningful. I'd like to understand better why these changesets are so large.

One thing that became immediately apparent once I managed to get the whole thing up and running is the fact that changesets really do come in all shapes and sizes.

Is that a problem? I thought about it and my conclusion is that it's just another thing that the social/activity stream view could help with. While I agree with Tom's comment about encouraging people to go out and survey instead of writing edit bots, I think we should accept and embrace all changes when thinking about improvements to the site.

Specifically, I thought about adding things like:

1. Changeset size (number of changes) indicator on a single activity view.
2. Changeset size (in terms of bounding box) indicator on a single activity view.
3. Simple filtering features for (1) and (2).

Right now the Activity Server holds a multipoint geometry for every changeset, so it's possible to implement filtering like that (as opposed to considering only the bounding box of a changeset).

Paweł
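As a rough illustration of items (1) to (3), here is a minimal Python sketch of the two size indicators and a simple filter over them. It is not the Activity Server's implementation: the `Activity` class, its fields and `filter_activities` are hypothetical, each change is reduced to a single (lon, lat) point, every activity is assumed to have at least one point, and bbox area is measured naively in square degrees.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass
class Activity:
    """Hypothetical activity record: one point per change in the changeset."""
    changeset_id: int
    points: List[Tuple[float, float]]  # (lon, lat)

    @property
    def change_count(self) -> int:
        """Indicator (1): number of changes."""
        return len(self.points)

    @property
    def bbox_area(self) -> float:
        """Indicator (2): bounding box area in square degrees."""
        lons = [lon for lon, _ in self.points]
        lats = [lat for _, lat in self.points]
        return (max(lons) - min(lons)) * (max(lats) - min(lats))


def filter_activities(
    activities: List[Activity],
    max_changes: Optional[int] = None,
    max_bbox_area: Optional[float] = None,
) -> List[Activity]:
    """Item (3): keep activities within the given size limits (None = no limit)."""
    kept = []
    for activity in activities:
        if max_changes is not None and activity.change_count > max_changes:
            continue
        if max_bbox_area is not None and activity.bbox_area > max_bbox_area:
            continue
        kept.append(activity)
    return kept
```

For example, `filter_activities(items, max_changes=100, max_bbox_area=1.0)` would hide both bulk edits and continent-spanning changesets while leaving small local edits visible.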
Re: [OSM-dev] Why are so many changeset so large?
On 10/17/2012 01:04 AM, Alex Barth wrote:
> - http://www.openstreetmap.org/browse/changeset/13514072

wheelmap_visitor is sort of a bot: it uploads changes made to the wheelchair=* accessibility tags by anonymous users on http://wheelmap.org/

It only touches that one tag. It generates a new changeset every few hours. There is no clear pattern, so I assume that it uploads each individual change when it happens and that a new changeset is started whenever the previous one has timed out.

Putting each single change to a wheelchair=* tag into a changeset of its own doesn't seem to make much sense here.

See also http://www.openstreetmap.org/user/wheelmap_visitor

> - http://www.openstreetmap.org/browse/changeset/13523015

Seems to be related to http://lima.schaaltreinen.nl/remap/ which checks for ways with implausible angles between segments.

These are probably manual edits based on the suggestions from that site, and they span a large area because the suggestions were not ordered by region.

> - http://www.openstreetmap.org/browse/changeset/13508818

This one only covers a few objects (three nodes, one way, three areas), all of them uranium mines or related to those. There are not that many uranium mines on the planet, so anything touching more than one of them is going to produce a large changeset area.

Putting each name change in a changeset of its own wouldn't have made much sense in this case though, IMHO.

-- hartmut