Re: [OSM-dev] Why are so many changeset so large?

2012-10-17 Thread Paweł Paprota

On 10/17/2012 07:43 AM, Paweł Paprota wrote:


I agree. I will add changeset comments to changeset descriptions on the
demo instance and let's see how this turns out.


I said that but then I remembered that changeset metadata is not 
available in the replication feed - only through public API or the 
weekly dump of all changesets.
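
For illustration, here is a minimal sketch of pulling the comment out of the changeset metadata that the public API returns. It assumes the XML shape of a `GET /api/0.6/changeset/#id` response; the sample document, its id and its tag values are made up, and no network request is made.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample of a changeset metadata response; the id, user and
# tag values are invented for this example.
SAMPLE = """<osm version="0.6">
  <changeset id="1" user="example" open="false"
             min_lat="51.5" min_lon="22.3" max_lat="51.7" max_lon="22.5">
    <tag k="created_by" v="JOSM"/>
    <tag k="comment" v="Added footpaths in the park"/>
  </changeset>
</osm>"""


def changeset_tags(xml_text):
    """Return the changeset's tags (comment, created_by, ...) as a dict."""
    root = ET.fromstring(xml_text)
    changeset = root.find("changeset")
    return {tag.get("k"): tag.get("v") for tag in changeset.findall("tag")}


tags = changeset_tags(SAMPLE)
# Fall back to a placeholder when the mapper left the comment empty.
print(tags.get("comment", "(no comment)"))
```

A publisher that only reads the replication feed would need one such lookup per changeset id it encounters, which is the extra dependency described above.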


This is a complication. I need to think how to structure the deployment 
of this whole thing. Right now there are some dependencies (PostGIS 
database, replication feed) that may not be needed in the future.


I will try to start a discussion about it this week.

Paweł

___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Why are so many changeset so large?

2012-10-17 Thread Matt Amos
On Wed, 2012-10-17 at 00:28 +0100, Tom Hughes wrote:
 On 17/10/12 00:04, Alex Barth wrote: 
  - Are there technical reasons why changesets should tend to be 
  large? Are they expensive on some level?
 
 I believe it's entirely because we've got so many people doing 
 mechanical or semi-mechanical edits.
 
 That includes bots but also things like people using xapi or overpass to 
 download all objects matching some set of tags, then change those tags 
 and reupload.

the historical answer to this is that when changesets were added to the
OSM API there were two different intentions for their use which got
conflated: first, that changesets were structures for grouping edits
sharing common attributes. and second, that changesets were VCS-style
'commits' which would be uploaded in a single request and applied
atomically.

effectively, the first use-case was for users, and tried to make
changesets as open-ended as possible. from this, we get tags on
changesets for comments, editor, bot-ness, etc... and the ability to
keep uploading into an open changeset.

the second use-case was a technical thing - the sheer number of API
calls to individual elements, even from normal-sized editing sessions,
could cause problems. and, for small calls, HTTP headers and round-trip
latencies would dominate the cost of an upload. further, editors had to
cope with the situation where an upload failed half-way through and to
re-try the failed calls. from this, we get a single changeset/#id/upload
call which applies atomically.
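
As a rough sketch of what such a bundled upload looks like, the following builds one osmChange document containing several node creations, which an editor would then send in a single request to the upload call. The helper name `build_osmchange`, the generator string and the sample coordinates are assumptions for illustration only; nothing is sent to a server.

```python
import xml.etree.ElementTree as ET


def build_osmchange(changeset_id, new_nodes):
    """Bundle several node creations into one osmChange document.

    new_nodes is a list of (lat, lon, tags) tuples. Negative ids are
    placeholders for elements that do not exist on the server yet.
    """
    root = ET.Element("osmChange", version="0.6", generator="example-sketch")
    create = ET.SubElement(root, "create")
    for index, (lat, lon, tags) in enumerate(new_nodes, start=1):
        node = ET.SubElement(
            create, "node",
            id=str(-index), changeset=str(changeset_id),
            lat=str(lat), lon=str(lon),
        )
        for key, value in tags.items():
            ET.SubElement(node, "tag", k=key, v=value)
    return ET.tostring(root, encoding="unicode")


# Two new nodes travel in one request body instead of two API calls.
body = build_osmchange(42, [(51.61, 22.44, {"amenity": "bench"}),
                            (51.62, 22.45, {})])
print(body)
```

One request body replaces many per-element calls, so the HTTP overhead is paid once and a failure affects the whole batch rather than leaving it half applied.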

at the time, this seemed like a good way to satisfy both use-cases. and,
while it does what it set out to, i think we should consider splitting
these in the next API version; explicitly reifying uploads at which
bboxes / coverage sets and change counts can be stored. changesets can
then simply be collections of uploads.
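
To make the proposed split concrete, here is a small sketch of a data model in which each upload carries its own bbox and change count and a changeset is just a tagged collection of uploads. The class and field names are hypothetical and do not come from any existing or planned API.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# (min_lon, min_lat, max_lon, max_lat)
BBox = Tuple[float, float, float, float]


@dataclass
class Upload:
    bbox: BBox
    change_count: int


@dataclass
class Changeset:
    tags: dict
    uploads: List[Upload] = field(default_factory=list)

    @property
    def change_count(self) -> int:
        return sum(upload.change_count for upload in self.uploads)

    @property
    def bbox(self) -> Optional[BBox]:
        """Union of the upload bboxes; None while nothing is uploaded."""
        if not self.uploads:
            return None
        boxes = [upload.bbox for upload in self.uploads]
        return (min(b[0] for b in boxes), min(b[1] for b in boxes),
                max(b[2] for b in boxes), max(b[3] for b in boxes))


changeset = Changeset(tags={"comment": "example"})
changeset.uploads.append(Upload((151.1, -34.0, 151.3, -33.8), 12))   # Sydney
changeset.uploads.append(Upload((-123.2, 49.2, -123.0, 49.3), 3))    # Vancouver
print(changeset.bbox, changeset.change_count)
```

The changeset-level bbox still spans half the planet, but the two small upload bboxes remain available for anything that needs a finer footprint.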

getting to the point: this might to some extent mitigate the large
changesets issue, as it would allow bboxes to be collected at a smaller
granularity. however, it wouldn't be a full solution and we'd probably
still need something like OWL to break down the geographic footprint of
changesets further.

cheers,

matt





Re: [OSM-dev] Why are so many changeset so large?

2012-10-17 Thread Andy Allan
On 17 October 2012 13:53, Matt Amos zerebub...@gmail.com wrote:

 getting to the point: this might to some extent mitigate the large
 changesets issue, as it would allow bboxes to be collected at a smaller
 granularity. however, it wouldn't be a full solution and we'd probably
 still need something like OWL to break down the geographic footprint of
 changesets further.

Further to this, I find this changeset extent problem is often
caused by looking at things the wrong way around. If you want to find
out what area the changeset covers, then we supply a bounding box to
help. However, if you want to know which changesets affect a given
area, this reverse question is much less easily answered. Hence OWL,
etc.

Beyond that, the extent is more of a promise that there are no edits
on the outside, rather than any guide to what's within. No changeset
completely fills, nor even evenly fills, its extent. There is a
widespread and very shaky assumption that smaller changesets are
somehow more likely to be rectangular or have a more even distribution
across themselves, but this won't hold in the real world in pretty
much any circumstances[1].

Basically, I see no need to worry about the extent of bounding boxes,
and no need to move to having bboxes on uploads instead of changesets
or other complications. No matter what we do, if your interest in a
changeset extends beyond the details of its extent, you need a
mechanism (again, e.g. OWL) to detail the actual locations of the
edits to the entities, and different interests (and different
entities) will even have different buffers of interest around
them. Let's focus on things like that.

Cheers,
Andy

[1] Unless we all live in cities with north/south street grids and map
each city block in individual changesets :-)



Re: [OSM-dev] Why are so many changeset so large?

2012-10-17 Thread Paweł Paprota

On 10/17/2012 03:30 PM, Andy Allan wrote:


Basically, I see no need to worry about the extent of bounding boxes,
and no need to move to having bboxes on uploads instead of changesets
or other complications. No matter what we do, if your interest in a
changeset extends beyond the details of its extent, you need a
mechanism (again, e.g. OWL) to detail the actual locations of the
edits to the entities, and different interests (and different
entities) will even have different buffers of interest around
them. Let's focus on things like that.



Exactly. What I do right now with the Activity Server is I store the 
whole geometry of a changeset. When a bounding box query comes, I use 
ST_Intersects between the bbox and geometries. This has the desired 
effect you write about: that is, with a changeset that contains changes 
in Sydney and in Canada, you will only get it in the query result for 
those two places, not for anywhere in the world like it is right now in 
the History tab.
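
The difference between the two kinds of query can be sketched in plain Python. This is only a stand-in for the PostGIS `ST_Intersects` test, assuming the stored geometry is a set of points; the function names and coordinates are made up for the example.

```python
def bbox_of(points):
    """Bounding box (min_lon, min_lat, max_lon, max_lat) of (lon, lat) points."""
    lons = [lon for lon, _ in points]
    lats = [lat for _, lat in points]
    return (min(lons), min(lats), max(lons), max(lats))


def bboxes_overlap(a, b):
    """History-tab style test: do the two rectangles overlap at all?"""
    return a[0] <= b[2] and b[0] <= a[2] and a[1] <= b[3] and b[1] <= a[3]


def any_point_in_bbox(points, bbox):
    """Geometry style test: does any actual edit fall inside the query box?"""
    min_lon, min_lat, max_lon, max_lat = bbox
    return any(min_lon <= lon <= max_lon and min_lat <= lat <= max_lat
               for lon, lat in points)


# One changeset with an edit in Sydney and an edit in Ottawa.
edits = [(151.21, -33.87), (-75.70, 45.42)]
query_cairo = (31.0, 29.9, 31.5, 30.2)
query_sydney = (151.0, -34.0, 151.4, -33.7)

print(bboxes_overlap(bbox_of(edits), query_cairo))   # bbox test matches Cairo
print(any_point_in_bbox(edits, query_cairo))         # geometry test does not
print(any_point_in_bbox(edits, query_sydney))        # geometry test matches Sydney
```

With the bbox test the changeset shows up for a viewer in Cairo even though nothing was edited there; with the geometry test it only shows up where edits actually happened.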


I am a bit concerned about the scalability of this: Matt clearly stated in 
one of the earlier discussions that dumping every changeset into one table 
won't scale.


I'm now looking to dig into OWL's code and see how my work relates to it 
- I think it potentially could make sense to somehow bring the two 
projects together or at least integrate them at some level (OWL 
publishing activities to the Activity Server?).


Paweł




Re: [OSM-dev] Why are so many changeset so large?

2012-10-17 Thread Tom Hughes

On 17/10/12 17:20, Alex Barth wrote:


Matt Amos wrote:


from this, we get a single changeset/#id/upload
call which applies atomically.


Is that so? I thought changesets were not applied atomically leading to issues 
where it is hard to find out what data got applied when a connection breaks 
down or an editor crashes.


A changeset isn't atomic, but an upload should be as it is done in a 
transaction. The changeset isn't atomic because it may have multiple 
uploads grouped in the same changeset.
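
A toy model may help to show the distinction. It is not how the server is implemented; the in-memory store, the version check and all names are assumptions chosen to illustrate "all or nothing per upload".

```python
class VersionConflict(Exception):
    """Raised when an upload refers to an outdated element version."""


def apply_upload(store, changes):
    """Apply every change in the upload, or none of them.

    store maps element id -> current version; changes is a list of
    (element_id, expected_version) pairs.
    """
    working = dict(store)  # work on a copy, like inside a transaction
    for element_id, expected_version in changes:
        if working.get(element_id, 0) != expected_version:
            raise VersionConflict(element_id)  # the copy is simply dropped
        working[element_id] = expected_version + 1
    store.update(working)  # commit only after every change succeeded


store = {"n1": 1, "n2": 1}

# First upload in the changeset succeeds as a whole.
apply_upload(store, [("n1", 1), ("n2", 1)])

# Second upload in the same changeset fails on n2 and leaves no trace,
# not even the n1 change that came before the conflict.
try:
    apply_upload(store, [("n1", 2), ("n2", 1)])
except VersionConflict as conflict:
    print("upload rejected because of", conflict)

print(store)
```

The changeset ends up containing the first upload but not the second, which is why the upload is atomic while the changeset as a whole is not.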


Tom

--
Tom Hughes (t...@compton.nu)
http://compton.nu/



Re: [OSM-dev] Why are so many changeset so large?

2012-10-17 Thread Paweł Paprota

On 10/17/2012 06:20 PM, Alex Barth wrote:

It seems that OWL and Activity Streams have the exact same problem here...


I have been talking with Matt today on IRC and to me it looks like we 
have been asking ourselves the same questions and overall I think that 
replacing a big chunk of the Changeset Activity Publisher [1] that I've 
developed with OWL is the right thing to do.


At this point I want to spend a few days familiarizing myself with the OWL 
code base to see what its current status is and how it fits into the 
whole Activity Streams picture.


[1] 
https://github.com/ppawel/osm-activity-publishers/blob/master/changeset-publisher/


Paweł



Re: [OSM-dev] Why are so many changeset so large?

2012-10-17 Thread Paul Norman
 From: Alex Barth [mailto:a...@mapbox.com]
 Subject: Re: [OSM-dev] Why are so many changeset so large?
 
 BTW, I did some cursory digging in the changesets dump and found that
 actually only a relatively small percentage of changesets are
 geographically large. Trying to use the history tab they seem to be more
 numerous. I don't have numbers yet, but I hope I can share some soon.

The issue is that you see every large changeset. This is most obvious in areas 
with no editing like the middle of the ocean. Looking at the size of an average 
changeset weighted by changeset size might produce data that comes closer to 
what you see in the history tab. The problem is then that you don't care about 
the history tab for most of the world, only where people or mappable features 
are.
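
A small numeric sketch of that weighting, using made-up bbox areas in square degrees: the plain mean treats every changeset equally, while the area-weighted mean counts each changeset in proportion to the area it covers, which is closer to what a viewer at a random location sees.

```python
def plain_mean(areas):
    """Average bbox area with every changeset counted once."""
    return sum(areas) / len(areas)


def area_weighted_mean(areas):
    """Average bbox area with each changeset weighted by its own area."""
    return sum(area * area for area in areas) / sum(areas)


# Hypothetical data: 99 small local changesets and one continent-sized one.
areas = [0.01] * 99 + [10000.0]

print(plain_mean(areas))
print(area_weighted_mean(areas))
```

Only one changeset in a hundred is large here, yet the weighted figure is dominated by it, because that single bbox covers almost all of the area any changeset touches.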

Incidentally, it's possible to make a changeset that only touches a small area 
but has a larger bbox with the expand_bbox call - see 
http://wiki.openstreetmap.org/wiki/API_v0.6#Expand_Bounding_Box:_POST_.2Fapi.2F0.6.2Fchangeset.2F.23id.2Fexpand_bbox

I've never used it myself.




Re: [OSM-dev] Why are so many changeset so large?

2012-10-17 Thread Michael Kugelmann

On 17.10.2012 09:15, Jochen Topf wrote:

I think one reason people add bad changeset comments and organize their
changesets in a bad way is that for most people those changesets and the
comments just disappear into a black hole.

One thing that is also bad from my point of view is that you can't edit 
the comment on a changeset afterwards. So if you are, e.g., too quick in 
JOSM with pressing a key or clicking OK, you end up with a wrong comment 
which you can never correct (at least I'm not aware of a way to do it).



Best regards,
Michael.




[OSM-dev] Why are so many changeset so large?

2012-10-16 Thread Alex Barth

I really like how activity streams shows easy-to-understand changes on the map 
using changemonger [1,2]. At the same time it creates an alternative break down 
of changes that is more granular than changesets. This diverts attention from 
_comments on changesets_. This is not ideal in my mind - these comments on 
changesets have great potential to become an even more important communication 
channel in the future.

I understand activity streams / changemonger suggests a broken up view of data 
changes because many changesets are so large that they are effectively not 
meaningful. I'd like to understand better why these changesets are so large.

Unscientifically digging back through today's history, I'm seeing many, many 
changesets that seem like they could just as well be much smaller, both in 
geographic extent and in number of elements. I don't want to call 
anybody out here, but this is what I found:

- http://www.openstreetmap.org/browse/changeset/13514072
- http://www.openstreetmap.org/browse/changeset/13523015
- http://www.openstreetmap.org/browse/changeset/13508818

I understand that there will always be cases where a large changeset makes 
sense (e. g. bot changes), but it seems that we have many unnecessarily large 
changesets that make changesets a not very useful granularity for looking at 
data history.

My questions

- What are the recommendations for changeset sizes?
- Are there technical reasons why changesets should tend to be large? Are they 
expensive on some level?
- Could editors encourage users to do more and smaller changesets?
- What else could be done to encourage smaller changesets with meaningful 
comments?

[1] http://lists.openstreetmap.org/pipermail/rails-dev/2012-October/001086.html
[2] Click on 'activity' here 
http://suncobalt.dyndns.org:8081/?lat=51.61&lon=22.44&zoom=7&layers=M

Alex Barth
http://twitter.com/lxbarth
tel (+1) 202 250 3633







Re: [OSM-dev] Why are so many changeset so large?

2012-10-16 Thread Tom Hughes

On 17/10/12 00:04, Alex Barth wrote:


- What are the recommendations for changeset sizes?


Personally I tend to put everything that is logically grouped together 
in one changeset where possible.


But by that I mean that I'll spend a few hours out collecting data in a 
small area and then probably upload that in one changeset - sometimes 
more than one if I take a break while editing and it times out.



- Are there technical reasons why changesets should tend to be large? Are they 
expensive on some level?


I believe it's entirely because we've got so many people doing 
mechanical or semi-mechanical edits.


That includes bots but also things like people using xapi or overpass to 
download all objects matching some set of tags, then change those tags 
and reupload.



- Could editors encourage users to do more and smaller changesets?
- What else could be done to encourage smaller changesets with meaningful 
comments?


Encouraging people to go out and do actual local survey-based mapping 
instead of trying to enforce their tagging ideas on the whole world with 
mass edits.


Tom

--
Tom Hughes (t...@compton.nu)
http://compton.nu/



Re: [OSM-dev] Why are so many changeset so large?

2012-10-16 Thread Eugene Alvin Villar
Hi Alex,

What do you mean by large? Do you mean changesets that span a large
area (spanning whole continents)? Or changesets that have a lot of
objects modified (perhaps more than 1000)?

Based on the examples you provided, it seems you mean the former. Is
this correct?

Eugene





Re: [OSM-dev] Why are so many changeset so large?

2012-10-16 Thread Alex Barth

Eugene - right, I mean changesets that are geographically large. 


Alex Barth
http://twitter.com/lxbarth
tel (+1) 202 250 3633







Re: [OSM-dev] Why are so many changeset so large?

2012-10-16 Thread Paweł Paprota

On 10/17/2012 01:04 AM, Alex Barth wrote:


I really like how activity streams shows easy-to-understand changes
on the map using changemonger [1,2]. At the same time it creates an
alternative break down of changes that is more granular than
changesets.



This diverts attention from _comments on changesets_. This is not
ideal in my mind - these comments on changesets have great potential
to become an even more important communication channel in the
future.



I agree. I will add changeset comments to changeset descriptions on the 
demo instance and let's see how this turns out.


One challenge I see with that is the fact that some (most?) people don't 
add relevant information to their changesets. But perhaps seeing their 
changesets as activities would change that behavior and they would use 
changeset comments as a communication channel, not as a required field 
in an editor.



I understand activity streams / changemonger suggests a broken up
view of data changes because many changesets are so large that they
are effectively not meaningful. I'd like to understand better why
these changesets are so large.



One thing that became immediately apparent once I managed to get the 
whole thing up and running is the fact that changesets really do come in 
all shapes and sizes.


Is that a problem? I thought about it and my conclusion is that it's 
just another thing that the social/activity stream view could help with.


While I agree with Tom's comment about encouraging people to go out and 
survey instead of writing edit bots, I think we should accept and 
embrace all changes when thinking about improvements to the site.


Specifically, I thought about adding things like:

1. Changeset size (number of changes) indicator on a single activity view.

2. Changeset size (in terms of bounding box) indicator on a single 
activity view.


3. Simple filtering features for (1) and (2). Right now the Activity 
Server holds multipoint geometry for every changeset so it's possible to 
implement filtering like that (as opposed to considering only the 
bounding box of a changeset)
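
The three ideas above can be sketched as a simple filter. The `Activity` class, its fields and the thresholds are hypothetical and are not taken from the Activity Server code; the area is a crude figure in square degrees rather than a real surface area.

```python
from dataclasses import dataclass
from typing import Tuple


@dataclass
class Activity:
    changeset_id: int
    change_count: int
    bbox: Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)

    @property
    def bbox_area_deg2(self) -> float:
        min_lon, min_lat, max_lon, max_lat = self.bbox
        return (max_lon - min_lon) * (max_lat - min_lat)


def filter_activities(activities, max_changes=None, max_area_deg2=None):
    """Keep activities that stay within the given size limits (None = no limit)."""
    kept = []
    for activity in activities:
        if max_changes is not None and activity.change_count > max_changes:
            continue
        if max_area_deg2 is not None and activity.bbox_area_deg2 > max_area_deg2:
            continue
        kept.append(activity)
    return kept


activities = [
    Activity(1, 12, (22.40, 51.60, 22.45, 51.62)),    # small local edit
    Activity(2, 5000, (22.0, 51.0, 23.0, 52.0)),      # many changes, one region
    Activity(3, 7, (-123.0, -34.0, 151.0, 49.0)),     # few changes, huge bbox
]

local = filter_activities(activities, max_changes=100, max_area_deg2=1.5)
print([activity.changeset_id for activity in local])
```

Filtering on the stored multipoint geometry instead of the bbox would follow the same pattern, with the area test replaced by a check against the actual edit locations.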


Paweł



Re: [OSM-dev] Why are so many changeset so large?

2012-10-16 Thread Hartmut Holzgraefe

On 10/17/2012 01:04 AM, Alex Barth wrote:


- http://www.openstreetmap.org/browse/changeset/13514072


wheelmap_visitor is sort of a bot, it uploads changes made
to the wheelchair=* accessibility tags by anonymous users
on http://wheelmap.org/

It only touches that one tag. It generates a new changeset
every few hours. There is no clear pattern, so I assume
that it uploads each individual change when it happens and
that a new changeset is started whenever the previous one
times out.

Putting each single change to a wheelchair=* tag into a
changeset of its own doesn't seem to make much sense here.

See also http://www.openstreetmap.org/user/wheelmap_visitor


- http://www.openstreetmap.org/browse/changeset/13523015


Seems to be related to http://lima.schaaltreinen.nl/remap/
which checks for ways with implausible angles between segments.

These are probably manual edits based on the suggestions
from that site, and they span a large area because the
suggestions were not ordered by region.


- http://www.openstreetmap.org/browse/changeset/13508818


This one only covers a few objects (three nodes, one way,
three areas), all of them uranium mines or related to
those. There are not that many uranium mines on the planet
so anything touching more than one of them is going to
produce a large changeset area.

Putting each name change in a changeset of its own wouldn't
have made much sense in this case though IMHO

--
hartmut
