Re: [OSM-dev] using Osmium to filter osh files
Hi!

Wow. That's a lot. I suggest you start with something simpler, like the amenity tags on nodes, and then work your way up. For everything that involves nodes and ways together, you have to read the input file twice and/or store data in between, and that makes it more complicated. This is especially complex for history files.

Concerning Osmium: The new Osmium and its documentation are work in progress; it will take a while for all of that to appear. I fear you'll have to make do with what's there. The handler concept is quite similar, though, to what the old Osmium was doing, and there are some links to talks and blog posts on http://wiki.openstreetmap.org/wiki/Osmium that you can read to get a better idea of the general architecture. Basically, a file is read and a callback on each of the handlers is called in turn to work on the objects as they are read from the file.

Jochen

On Thu, May 22, 2014 at 11:53:37 -0400, Abhishek wrote:
Date: Thu, 22 May 2014 11:53:37 -0400
From: Abhishek dalek2poi...@gmail.com
To: Jochen Topf joc...@remote.org
Cc: dev@openstreetmap.org
Subject: Re: [OSM-dev] using Osmium to filter osh files

definitely. I'm looking to analyze the development of OpenStreetMap in the US with a particular focus on contributor activity -- and trying to understand differences between the number of contributors entering in different regions over time and also the *nature* of their contribution activity (adding new streets, amenities, natural features vs. adding tags, fixing existing features etc).

As a first pass, I've already used the changeset files (using the middle of the BBOX as an approximate measure of location) to understand user contribution activity. What the changeset files do not allow me to do is understand the *nature* of the contributions. In order to do this, I'm looking at the history files.
As a first pass, I would like to build a flat file (CSV-like) that is edit-level (rather than changeset-level) and understand what each edit meant. In particular, I'm interested in classifying edits into the following categories:

(a) adding new amenity (record name, location of amenity)
(b) adding new street (record name of street and approximate location, i.e. midpoint of the way)
(c) adding tags to existing street (which tags? maxspeed and oneway are interesting)
(d) deleting features
(e) other (notably adding natural features etc)

So, specifically, one idea might be to have a dataset that records every node added, its location, metadata (user, timestamp etc) and its tags and, for every way, to reduce it to a point (like osmconvert's all-to-nodes) and do the same. I'm also open to other suggestions. The algorithm might work as follows:

1. go through every node in the osh file and write it to a csv only if it does not belong to a way (this will capture point features)

A node can be part of a way and a point feature at the same time.

2. go through every way, reduce the way to a single point, write the point feature and related metadata to a csv file

3. ignore relations.

So this would be something like osmconvert with the options all-to-nodes and drop-relations

Any ideas on how I should go about doing this? In terms of the documentation, I've been using the new osmium and looking at osmcode.org, but my sense is that this documentation is not yet complete (for example, I cannot find the tag filter classes that you mention) -- are these documented on the Wiki? It's fun to be using a low-level tool like Osmium, but any help would make this process a lot easier for me.

Thanks!
Abhishek

On Thu, May 22, 2014 at 8:40 AM, Jochen Topf joc...@remote.org wrote:
On Wed, May 21, 2014 at 11:20:18 -0400, Abhishek wrote:

I would like to use osmium to filter .osh files.
Specifically, I wanted to recreate the features of osmfilter, which allows me to extract certain features like amenity=* or highway=* along with their relevant histories from a .osh.pbf file. I've managed to successfully set up osmium and osmium-tool, but I couldn't figure out a way to use these tools to filter features from the history data. I'm very new to writing code in C++, so I was hoping this feature was implemented. Any ideas on where I should be looking for help?

Working with the history files is not easy and it very much depends on what you really want to do. In the general case, it is not enough to find, for instance, all ways tagged with highway=*; you have to find the nodes that were used by those ways at the time when those ways were current. If you are only interested in the tags and their history and not the location of those ways, it becomes much easier. So first, you have to understand the details of the OSM data model and how it plays out in the history files. Osmium has many building blocks that you will need: it can read the history files, there are tag filter classes (osmium::tags::KeyFilter and
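
To make the handler idea from this thread concrete, here is a minimal sketch in Python. It does not use the Osmium API: Node, Way, Handler, LocationStore, CsvWriter and apply_handlers are all hypothetical names invented for illustration, and the input is a plain in-memory list rather than an .osh file. It also ignores history entirely (a way is placed using the most recently seen location of its nodes), which is exactly the simplification warned about above, and it approximates the "midpoint" of a way by the mean of its node coordinates.

```python
import csv
from dataclasses import dataclass, field


@dataclass
class Node:
    id: int
    version: int
    lon: float
    lat: float
    tags: dict = field(default_factory=dict)


@dataclass
class Way:
    id: int
    version: int
    node_ids: list
    tags: dict = field(default_factory=dict)


class Handler:
    """Hypothetical base class: one callback per object type, no-ops by default."""

    def node(self, n):
        pass

    def way(self, w):
        pass


class LocationStore(Handler):
    """Keeps the most recently seen location per node id (history is ignored)."""

    def __init__(self):
        self.locations = {}

    def node(self, n):
        self.locations[n.id] = (n.lon, n.lat)


class CsvWriter(Handler):
    """Writes tagged nodes as-is and reduces each way to a single point."""

    def __init__(self, out, store):
        self.rows = csv.writer(out)
        self.store = store

    @staticmethod
    def _tags(tags):
        return ";".join(f"{k}={v}" for k, v in sorted(tags.items()))

    def node(self, n):
        # A node can be a way member and a point feature at once, so this
        # sketch keeps every tagged node instead of testing way membership.
        if n.tags:
            self.rows.writerow(
                ["node", n.id, n.version, n.lon, n.lat, self._tags(n.tags)]
            )

    def way(self, w):
        pts = [self.store.locations[i] for i in w.node_ids
               if i in self.store.locations]
        if not pts:
            return  # no known member locations, nothing to write
        # Mean of the member coordinates: a rough stand-in for the midpoint.
        lon = sum(p[0] for p in pts) / len(pts)
        lat = sum(p[1] for p in pts) / len(pts)
        self.rows.writerow(
            ["way", w.id, w.version, lon, lat, self._tags(w.tags)]
        )


def apply_handlers(objects, *handlers):
    """Call the matching callback on each handler, in turn, for every object."""
    for obj in objects:
        for h in handlers:
            if isinstance(obj, Node):
                h.node(obj)
            elif isinstance(obj, Way):
                h.way(obj)
```

The store has to be passed before the writer, e.g. apply_handlers(objects, store, CsvWriter(out, store)), and nodes have to arrive before the ways that use them, so that locations are known by the time a way is seen. For real history data, the lookup would have to pick the node version that was current when the way version was current.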
Re: [OSM-dev] osm2pgsql planet_osm_ways sudden shrinking :(
Doesn't seem to be a general issue:

http://munin.openstreetmap.org/openstreetmap/yevaud.openstreetmap/postgres_size_gis_9_1_main.html

my server:
http://hz3.poole.ch/munin/localdomain/localhost.localdomain/postgres_size_gis.html

Simon

On 23.05.2014 10:47, Christian Quest wrote:
> On 2 different OSMFR tile servers we recently got the same problem: planet_osm_ways suddenly shrank. We have no idea what could cause this. It looks like the whole table is truncated.
>
> You can have a look, for example, here:
> http://munin.openstreetmap.fr/free.org/osm13.openstreetmap.fr/postgres_size_ALL.html
> http://munin.openstreetmap.fr/osm12.free.org/osm105.openstreetmap.fr/postgres_size_ALL.html
>
> Are we the only ones facing this strange problem?
>
> --
> Christian Quest - OpenStreetMap France

___
dev mailing list
dev@openstreetmap.org
https://lists.openstreetmap.org/listinfo/dev
[OSM-dev] OSM software repositories -- git and svn
Hi,

this is a discussion/brainstorming about if and how to get rid of what remains of the OSM project SVN.

Let me first say what I liked about SVN:

* I liked that there was a project-wide SVN without any privilege hierarchies. Everyone could ask for an account, and then they could commit changes to everything.

* I liked that there was one canonical version of everything in the SVN, and that I could change the canonical version instantly, instead of making my own copy of it and then asking someone to somehow accept my change.

* I liked that I could disown stuff into SVN - "here's a small script I wrote, I don't intend to maintain it really, but feel free to use/improve it" - and that others could then contribute to something like that without actually having to take ownership, and without users having to track whose version was currently the one to use.

* I liked that things were discoverable - if someone made an obscure script that deals with .poly files then it was very likely that it would be found in the appropriate SVN directory, rather than under a fancy repo name somewhere in github.

* I liked that it was ours, and not dependent on the goodwill or cooperation of a third party operator.

What I picture here is a, sometimes precarious, version of collective ownership. It is not suitable for large, highly visible projects with a solid contributor base - for example, JOSM has always been developed outside of the project SVN even while that was still in vogue.

One of the main criticisms of our SVN is that it is full of cruft that doesn't even work anymore, and while there are plenty of github repos that share the same fate, this laissez-faire attitude to project ownership might well contribute to that. I would be more likely to clean something up if it was a repository listed under my name in github than something that I dumped into SVN five years ago and have since forgotten.
Still, I maintain that there is, conceptually, a niche for this shared ownership, where the developer community of the whole project has instant write access to the canonical version of things.

Myself, I've written countless little scripts and snippets that are shared in SVN. Sometimes people have indeed made their own changes to them; sometimes my scripts simply live alongside 5 other scripts that are thematically related (e.g. a collection of utilities that all deal with .poly files in one way or the other). The generic approach to something like that today would probably be that every author creates a github repo for his stuff, potentially granting others access if they ask.

Git and github are simply the way to go these days, and while some people still shed tears for the good old SVN (or CVS, or... what was it before that, RCS?) times, it doesn't really make much sense to have many different technologies for what is essentially the same basic task of version control.

Assuming for a moment that we would like to drop our SVN altogether and replace it by git/github - I'd like to discuss the following:

1. collective ownership

Can we maybe have the (fsvo) superior technology offered by git and still not completely drop the collective ownership idea? Can we somehow use git(hub) (without totally ab-using it) to create a niche for stuff that is not quite a standalone project and not necessarily owned by one single person? Or am I maybe the only person in the world who sees some value in that concept?

Should everyone who remotely considers themselves an OSM developer have write access to the openstreetmap repository in github, or should we create an openstreetmap-developer repository for that which would have a less official character? Is there maybe a technical way to grant write access to the repository to everyone who has an OSM account without extra signup?
Can we continue to support users who, for privacy reasons, don't want to work through the github platform but who would rather only communicate with a server directly under our control?

2. discoverability

I know that there are people who create a github repository for everything, even if it's just a 30-line text file to maintain. Doing that for all the conceptually different things that we now have in our SVN would probably yield something between 200 and 500 repositories, and it would be (correct me if I am wrong please) a big step backwards in discoverability because git(hub) repositories cannot be arranged in trees - in SVN I can, for example, go to applications/rendering or applications/utils/export and do an ls there to see.

Would that mean that we'd essentially have to create one big repository that can hold a ton of completely separate stuff like our SVN does today? Or would we create hundreds of mini repos and then have a separate index for them, e.g. a wiki page or a 101st repo? Maybe there's some state of the art solution for this kind of problem?

3. moving across old stuff from SVN

If we can manage to find a way to give a new home to stuff from SV in the
Re: [OSM-dev] OSM software repositories -- git and svn
Hi,

I'd just like to point out that there is already a dedicated github account:

https://github.com/openstreetmap

It hosts, for example, iD, osm2pgsql, mod_tile and lots of mirrors.

Technically, openstreetmap is a github Organization. This means that the account is owned not by a single person, but by a group of so-called Owners (including tomhughes and Firefishy). You also have a very convenient interface to assign write and admin access to the members, for individual repositories and globally.

See also:
https://github.com/openstreetmap/openstreetmap-mirror/blob/master/ABOUT.md

On 23.05.2014 15:19, Frederik Ramm wrote:
> Myself, I've written countless little scripts and snippets that are shared in SVN. Sometimes people have indeed made their own changes to them; sometimes my scripts simply live alongside 5 other scripts that are thematically related (e.g. a collection of utilities that all deal with .poly files in one way or the other).
> [...]
> I know that there are people who create a github repository for everything, even if it's just a 30-line text file to maintain. Doing that for all the conceptually different things that we now have in our SVN would probably yield something between 200 and 500 repositories

I would guess that it is more typical that people think in terms of named projects. So in most cases, a separate repository would be appropriate.

> Or would we create hundreds of mini repos and then have a separate index for them, e.g. a wiki page or a 101st repo?

That might be the best option.

Paul
Re: [OSM-dev] OSM software repositories -- git and svn
Hi,

On 05/23/2014 07:31 PM, Paul Hartmann wrote:
> I'd just like to point out that there is already a dedicated github account:
> https://github.com/openstreetmap
> It hosts, for example, iD, osm2pgsql, mod_tile and lots of mirrors.

I was aware of that but I'm not sure if people would be happy for a couple hundred people (or even any OSM contributor) to have write access to the lot, and if swamping that with ...

>> Or would we create hundreds of mini repos and then have a separate index for them, e.g. a wiki page or a 101st repo?
>
> That might be the best option.

... 100s of repos would be good.

There's also the osmlab group account on github which might be a bit more of a free-for-all than the openstreetmap account (README says "we are liberal with commit rights"). Possibly that one was created in order to not have to be too liberal on the openstreetmap account ;)

Bye
Frederik

--
Frederik Ramm ## eMail frede...@remote.org ## N49°00'09 E008°23'33
Re: [OSM-dev] OSM software repositories -- git and svn
On Fri, May 23, 2014 at 03:19:07 +0200, Frederik Ramm wrote:
> Should everyone who remotely considers themselves an OSM developer have write access to the openstreetmap repository in github, or should we create an openstreetmap-developer repository for that which would have a less official character? Is there maybe a technical way to grant write access to the repository to everyone who has an OSM account without extra signup?

GitHub has an API. It should be pretty easy to create a mini web page somewhere, where everybody can just type in their GitHub account name and the web page will automatically add this account to the committer list for some repository or so. You could combine this with an OAuth authentication to the OSM server if you wanted.

An alternative would be to add some hooks so that all pull requests to some repository are always automatically merged. So people would still clone the project and work on their own copy, but they could always send a pull request to the master repository which would merge it automatically. This would even work without github.

You could also set up some fully automated system where everybody can send pull requests: if you are on the whitelist, your request is accepted immediately; if you are on the blacklist, it is rejected; and for everybody else, the request lands on a list that people can review. If a reviewer clicks on okay, your pull request goes through and you are added to the whitelist automatically. That way there is a minimal hurdle, but if you have done something okay once, the system will trust you in the future. Reviewers could be the same people that are on the whitelist; you seed that with a few trusted people and then the system will regulate itself (hopefully).

Of course there could be lots of other combinations. You can add code to make sure tests run through before merging etc. The coding needed for these things should be pretty minimal, just some scripts that are run from git or github hooks.
The added benefit of doing that in git is that you don't need special accounts like in SVN, because git basically creates accounts out of email addresses. Well, if you use GitHub in there, people need github accounts. But you don't have to have another account list that somebody has to administer like with the current SVN setup. So no password changing hassles etc., that's all done somewhere else.

Jochen
--
Jochen Topf joc...@remote.org http://www.jochentopf.com/ +49-721-388298
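
The whitelist/blacklist flow described in this message can be sketched in a few lines of Python. This is only an illustration of the decision logic under stated assumptions, not a working git or GitHub hook: the MergeGate class, its method names and the "merged"/"rejected"/"pending" return values are all hypothetical, and authors are plain strings rather than real accounts.

```python
from dataclasses import dataclass, field


@dataclass
class MergeGate:
    """Hypothetical triage for incoming pull requests."""

    whitelist: set = field(default_factory=set)
    blacklist: set = field(default_factory=set)
    pending: list = field(default_factory=list)  # (author, pr_id) awaiting review

    def submit(self, author, pr_id):
        """Decide what happens to a new pull request from `author`."""
        if author in self.blacklist:
            return "rejected"
        if author in self.whitelist:
            return "merged"
        # Unknown author: park the request until a reviewer looks at it.
        self.pending.append((author, pr_id))
        return "pending"

    def approve(self, reviewer, pr_id):
        """A whitelisted reviewer accepts a pending request.

        The author is added to the whitelist, so their later requests
        are merged without review.
        """
        if reviewer not in self.whitelist:
            raise PermissionError(f"{reviewer} is not allowed to review")
        for entry in self.pending:
            author, pending_id = entry
            if pending_id == pr_id:
                self.pending.remove(entry)
                self.whitelist.add(author)
                return "merged"
        raise KeyError(pr_id)
```

Seeding the gate with a few trusted names, e.g. MergeGate(whitelist={"alice"}), gives the self-regulating behaviour described above: every approved author becomes a potential reviewer. A real setup would also need a way to move people to the blacklist, which this sketch leaves out.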
Re: [OSM-dev] OSM software repositories -- git and svn
On Fri, May 23, 2014 at 03:19:07PM +0200, Frederik Ramm wrote:
> 2. discoverability
>
> I know that there are people who create a github repository for everything, even if it's just a 30-line text file to maintain. Doing that for all the conceptually different things that we now have in our SVN would probably yield something between 200 and 500 repositories, and it would be (correct me if I am wrong please) a big step backwards in discoverability because git(hub) repositories cannot be arranged in trees - in SVN I can, for example, go to applications/rendering or applications/utils/export and do an ls there to see.
>
> Would that mean that we'd essentially have to create one big repository that can hold a ton of completely separate stuff like our SVN does today? Or would we create hundreds of mini repos and then have a separate index for them, e.g. a wiki page or a 101st repo? Maybe there's some state of the art solution for this kind of problem?

Lots of little repositories and a 101st repo with a list of repo URLs sounds good to me. This would also allow different ownership/rights for all of the little repos. Why a one-size-fits-all solution when some repos could be free-for-all and some more managed? Just allow everybody to add any repository to your 101st list repo.

If you want to, you could ask people to add some standardized meta.json file or so that you can then crawl to build some kind of index to make it even easier to find repos by keyword or whatever.

Jochen
--
Jochen Topf joc...@remote.org http://www.jochentopf.com/ +49-721-388298
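
As a rough illustration of the meta.json idea, the following Python sketch builds a keyword index from a list of repository URLs. Everything specific here is an assumption: no meta.json format is defined in this thread, so the "keywords" field is invented, build_index is a hypothetical helper, and fetching is left to a caller-supplied function so that the sketch does not depend on any particular hosting API.

```python
import json
from collections import defaultdict


def build_index(repo_urls, fetch_meta):
    """Map each keyword to the sorted list of repo URLs that declare it.

    repo_urls  -- iterable of repository URLs (the "101st repo" list)
    fetch_meta -- callable taking a URL and returning the meta.json text,
                  or None if the repository has no such file
    """
    index = defaultdict(set)
    for url in repo_urls:
        raw = fetch_meta(url)
        if raw is None:
            continue  # repo without meta.json: listed, but not indexed
        try:
            meta = json.loads(raw)
        except json.JSONDecodeError:
            continue  # skip broken files rather than abort the crawl
        if not isinstance(meta, dict):
            continue
        # "keywords" is an assumed field name, expected to hold a list of strings.
        for keyword in meta.get("keywords", []):
            index[str(keyword).lower()].add(url)
    return {keyword: sorted(urls) for keyword, urls in index.items()}
```

Keywords are lower-cased so that "Export" and "export" land in the same bucket; repositories without a usable meta.json stay on the list but simply do not show up in the index.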