Re: [OSM-dev] Possible GSoC project: tag/area monitoring service
On 07.03.2012 05:11, Michael Daines wrote:

>> First, a longstanding wishlist item for OSM has been data tiles, that is, the API data split into preset-sized areas (e.g. z14), which a client could call. This may not seem relevant to your project, but you'll see why it is soon.
>
> This was actually part of my original motivation for proposing this project -- in my 2010 GSoC project, I used bbox queries to load data in tile-like sections, but as I mentioned this turned out to be very slow. Data tiles seem like they could speed things up for that sort of use. Ideally, the work involved in accessing a data tile would be comparable to accessing an image tile. Also, it seems easier to cache data addressed by tile than it is to cache the results of arbitrary bbox queries.
>
> I'd also be interested in working on data tiles -- is that in itself a reasonable project idea? My hope is that if either of these ideas are things people have been wanting for a while, they'll want to use them, and that if a project has people using it, it would be more likely to be around after the summer.

A service that is able to provide

1. fast and scalable
2. tiled access to
3. updated data
4. around the world with a constant tile size (e.g. z12 or z14)
5. together with formulas to calculate the tile coordinate from lat/lon and
6. complete documentation

would be a project of reasonable complexity and usefulness. The most complex part here is 3.

If you have further questions on possible implementations or use cases, don't hesitate to contact me directly: pe...@mazdermind.de

> One thing I was wondering about -- how do you choose a tile size to minimize both the number of accesses (larger tiles) and the byte size of tiles (smaller tiles)? Some areas have a much higher density of data than others. Perhaps some kind of quadtree-type approach could be used, where tiles are split if they have high density?

This could be a project for the next GSoC, but calculating the tile sizes is in itself so complex that it would fill a complete GSoC project, leaving no room for the project outlined above. But a tiling algorithm without a service implementing it would not be of great use to the community, would it?

Besides, there are already tools dedicated to this kind of computation: http://www.mkgmap.org.uk/page/tile-splitter

> The ideas you suggest for streaming-type updates on data tiles are very interesting. If you were writing an editor, you could be more certain that you were displaying the most recent data without having to reload all of it.

But you would not want the editor to display those changes without user interaction. Imagine you are drawing a road while everything around your cursor keeps changing shape. You would not call this a good user experience, would you?

Also, a streaming editor is nothing the community is requesting; editing works well (enough) the way it is.

Peter

___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev
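Point 5 in Peter's list (formulas to calculate the tile coordinate from lat/lon) could look roughly like the following. This is only a sketch, assuming the same spherical-Mercator z/x/y scheme that image tiles commonly use; the thread does not fix a scheme, and the function name `latlon_to_tile` is made up for this example.

```python
import math


def latlon_to_tile(lat_deg, lon_deg, zoom):
    """Return (x, y) of the z/x/y tile containing a point (assumed Mercator scheme)."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    lat_rad = math.radians(lat_deg)
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon=180 or latitudes beyond the Mercator limit
    # still map to a valid tile instead of falling off the grid.
    x = min(max(x, 0), n - 1)
    y = min(max(y, 0), n - 1)
    return x, y


if __name__ == "__main__":
    # A client working at a constant zoom (e.g. z14) would call this once per point.
    print(latlon_to_tile(52.52, 13.405, 14))
```

With a constant zoom level, every client derives the same tile address for the same point, which is what makes the tiles easy to cache.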
Re: [OSM-dev] Possible GSoC project: tag/area monitoring service
On 3/7/2012 4:57 AM, Peter Körner wrote:

> A service that is able to provide
>
> 1. fast and scalable
> 2. tiled access to
> 3. updated data
> 4. around the world with a constant tile size (e.g. z12 or z14)
> 5. together with formulas to calculate the tile coordinate from lat/lon and
> 6. complete documentation

I would expand 6 to cover documentation for use as well as the ability to replicate the server environment using the OSM planet data update feeds.

I personally expect the restrictions on the tile servers to be extended to the API servers once enough application coders implement a way to use the API directly from thousands or millions of clients. At that point they'll be instructed to fire up their own server and will need more than just use-based documentation.

Lynn (D) - KJ4ERJ - Trying to get my own tile server working reliably now...
Re: [OSM-dev] Possible GSoC project: tag/area monitoring service
We could take this off-list, but I think this may still be of interest to the general community.

On Tue, Mar 6, 2012 at 11:11 PM, Michael Daines mich...@mdaines.com wrote:

>> First, a longstanding wishlist item for OSM has been data tiles, that is, the API data split into preset-sized areas (e.g. z14), which a client could call. This may not seem relevant to your project, but you'll see why it is soon.
>
> This was actually part of my original motivation for proposing this project -- in my 2010 GSoC project, I used bbox queries to load data in tile-like sections, but as I mentioned this turned out to be very slow.

There are a number of ways to do this intelligently. I was going to write up a very naive prototype that had no brains at all, and here's what my approach was going to look like (and I'll do it if there's interest):

Write some code in Python to query jaxpi for bounding boxes based on tile name. Use this to write data tile support in TileStache. I'd store cached tiles in Redis (for reasons that become apparent in a few sentences). I'd use the parsing/storing bits of Changepipe to tell me which tiles are affected by a changeset (even though I believe it uses the changeset's bbox, which is oftentimes wrong). Since Changepipe is already using Redis, using Redis for the tiles makes sense.

And then the issue would be how to hack in some code for the websocket/stream/whatever. This seems like it'd be relatively simple using Redis pubsub and something like gevent, but I haven't looked into it.

The right answer would be to keep a local copy of the database and then update it as necessary. I believe Ian Dees has a copy of some MongoDB code that uses quadtile to index OSM objects (I'm very fuzzy on the details). (Update: Ian sent me this URL, but I haven't taken a look: https://github.com/iandees/mongosm/commit/c46c2081edde0b3b2b0446dd06d5ef02b292631c ) Then, as objects change, you'd be able to update the tiles.

> I'd also be interested in working on data tiles -- is that in itself a reasonable project idea?

I think that would be welcome. Especially if done well. My naive approach would be slow, but if you used a different approach that didn't keep hitting external servers on every update, it'd be a very nifty project indeed.

> One thing I was wondering about -- how do you choose a tile size to minimize both the number of accesses (larger tiles) and the byte size of tiles (smaller tiles)? Some areas have a much higher density of data than others. Perhaps some kind of quadtree-type approach could be used, where tiles are split if they have high density?

That'd certainly work. I'd started with a naive approach: if I only have one zoom level, things are easy, and then you just accept that some areas are dense and others are not. At the same time, there won't be as much demand for low-density areas.

There's certainly value in cleverness and not transmitting too much data, but there's also value in simplicity for clients. I think with compression or binary formats like pbf, the need for cleverness is reduced, since less data is transmitted overall.

- Serge
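The first step of Serge's naive prototype (turning a tile name into a bounding box that can be handed to a bbox query) might be sketched like this. It assumes the usual spherical-Mercator z/x/y tile naming and the left,bottom,right,top ordering that the OSM API map call uses; whether jaxpi expects exactly this parameter form is not stated in the thread, and `tile_bbox` and `bbox_param` are hypothetical names.

```python
import math


def tile_bbox(zoom, x, y):
    """Return (west, south, east, north) in degrees for tile zoom/x/y."""
    n = 2 ** zoom  # tiles per axis at this zoom

    def lon(col):
        # Longitude of the western edge of tile column `col`.
        return col / n * 360.0 - 180.0

    def lat(row):
        # Latitude of the northern edge of tile row `row` (inverse Mercator).
        return math.degrees(math.atan(math.sinh(math.pi * (1 - 2 * row / n))))

    return lon(x), lat(y + 1), lon(x + 1), lat(y)


def bbox_param(zoom, x, y):
    """Format the tile's extent as a left,bottom,right,top bbox parameter."""
    west, south, east, north = tile_bbox(zoom, x, y)
    return "bbox=%.7f,%.7f,%.7f,%.7f" % (west, south, east, north)


if __name__ == "__main__":
    print(bbox_param(14, 8802, 5373))
```

A cache keyed by the tile name (in Redis or anywhere else) would then only need to run this query on a miss or after the tile has been invalidated.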
Re: [OSM-dev] Possible GSoC project: tag/area monitoring service
> I would expand 6 to be documentation for use as well as the ability to replicate the server environment using OSM planet data update feeds. I personally expect the restrictions on the tile servers to be extended to the API servers when enough application coders implement a way to use the API directly from thousands or millions of clients at which point they'll be instructed to fire up their own server and need more than just use-based documentation.

Definitely. Customization would also be important, since I imagine that users would like to avoid storing a bunch of data they're not going to use in their project. (Depends on how things are implemented, though.) If your project only dealt with a single city, it might make things easier to set up if you knew the storage or computation requirements were less onerous than dealing with the entire world.

-- Michael
Re: [OSM-dev] Possible GSoC project: tag/area monitoring service
> Write some code to query jaxpi for bounding boxes in Python based on tile name. Use this and write Data tile support in TileStache. I'd store cached tiles in Redis (for reasons that become apparent in a few sentences). I'd use the parsing/storing bits of Changepipe to tell me which tiles are affected by a changeset (even though I believe it uses the changeset's bbox, which is oftentimes wrong). Since Changepipe is already using Redis, using Redis for the tiles makes sense. And then the issue would be how to hack in some code for the websocket/stream/whatever. This seems like it'd be relatively simple using Redis pubsub and something like gevent, but I haven't looked into it.

Do I have this right: the server in this implementation would act as a sort of fast, tile-addressed cache for data available through XAPI or similar?

> The right answer would be to keep a local copy of the database and then update it as necessary. I believe Ian Dees has a copy of some MongoDB code that uses quadtile to index OSM objects (I'm very fuzzy on the details). (Update, Ian sent me this url, but I haven't taken a look: https://github.com/iandees/mongosm/commit/c46c2081edde0b3b2b0446dd06d5ef02b292631c ) Then as objects would change, you'd be able to update the tiles.

It looks like mongosm includes an implementation of a data tile server? The quadtile indexing is interesting in that you use only a single parameter to refer to tiles, rather than the z/x/y triple commonly used with image tiles.

Keeping a local copy seems simpler and more reliable, but you have to store all the data... This is where I see some kind of customization as being useful -- if you were running your own server, and were only interested in a single city, or only interested in roads and building shapes, you could store just that data.

>> Perhaps some kind of quadtree-type approach could be used, where tiles are split if they have high density?
>
> That'd certainly work. I'd started with a naive approach: if I only have one zoom level, things are easy, and then you just accept that some areas are dense and others are not. At the same time, there won't be as much demand for low-density areas. There's certainly value in cleverness and not transmitting too much data, but there's also value in simplicity for clients.

Particularly for GSoC, I think I'd want to err on the side of simplicity. The zoom level could always be adjusted?
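The idea of addressing a tile with a single parameter instead of a z/x/y triple can be illustrated by interleaving the bits of x and y into one integer. This is a generic sketch of that technique and is not taken from mongosm, whose details nobody in the thread had checked; `interleave` and `deinterleave` are hypothetical names. Note that the key alone does not record the zoom level, so it is only unambiguous when the zoom is fixed or stored alongside it.

```python
def interleave(x, y, zoom):
    """Pack tile column x and row y at the given zoom into one integer key."""
    key = 0
    for i in range(zoom - 1, -1, -1):  # most significant bit first
        x_bit = (x >> i) & 1
        y_bit = (y >> i) & 1
        key = (key << 2) | (x_bit << 1) | y_bit
    return key


def deinterleave(key, zoom):
    """Recover (x, y) from a key produced by interleave() at the same zoom."""
    x = y = 0
    for i in range(zoom - 1, -1, -1):
        pair = (key >> (2 * i)) & 3  # two bits per zoom level
        x = (x << 1) | (pair >> 1)
        y = (y << 1) | (pair & 1)
    return x, y


if __name__ == "__main__":
    key = interleave(3, 5, 3)
    print(key, deinterleave(key, 3))
    # Dropping the last two bits gives the key of the parent tile one zoom up,
    # which is what makes quadtree-style splitting and merging cheap.
    print(key >> 2, interleave(3 >> 1, 5 >> 1, 2))
```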
Re: [OSM-dev] Possible GSoC project: tag/area monitoring service
On Tue, Mar 6, 2012 at 2:10 PM, Michael Daines mich...@mdaines.com wrote:

> When you mention changes to large relations and widely dispersed objects, I was wondering if you had any specific use cases in mind? I'd also be interested in hearing what kind of expressions you might expect to be able to use. For example, I was thinking you could say something like "give me updates for things with the tag highway=residential and is_in=Canada".

I was thinking about very long routes, and country borders. If I want to monitor changes to my state and interstate routes within the state, I don't have any good options at the moment to do that. I don't think it should be terribly difficult to implement, I'm just not sure how well it will scale.

To make it more interesting, you could allow for watching all changes to objects with certain tags that are within a certain distance of a route relation, or located inside a multipolygon.

-Josh
Re: [OSM-dev] Possible GSoC project: tag/area monitoring service
On Mon, Mar 5, 2012 at 10:58 PM, Michael Daines mich...@mdaines.com wrote:

> Hi everyone,
>
> I'm writing to seek opinions about a possible Google Summer of Code project. I did GSoC in 2010, and I'd like to apply again this year. My project in 2010 was a simplified, web-based map editor. Since the wiki page for project ideas mentions that proposals for the development of existing OSM infrastructure would be preferred, I was having a look at the API v0.7 page, and noticed some interest in a monitoring feature.
>
> My proposal is to build a monitoring service to augment the existing API, similar to the Twitter streaming API [1]. Users would request to receive map updates matching tags or which involve elements in some area, and updates would be sent either over a persistent connection (as Twitter does) or possibly by making requests to an endpoint specified by the user. My general idea for the architecture is basically to grab diffs and then send the relevant parts to clients depending on what they've asked to receive.
>
> Clients of such a monitoring service could do things like send daily email updates on map activity to users interested in a specific area or tag, invalidate tiles in custom-rendered maps, or assemble a subset of available OSM data for fast, up-to-date querying within that subset (a single city, for example) without worrying about making lots of requests to the OSM API. That third application would be useful for solving one of the problems I ran into with my 2010 project -- I was optimistically loading map data with bbox queries as the user panned the map, which was too slow on the production API to be practical (and probably isn't what that part of the API is really meant for).
>
> Another project idea might be to work directly on a service which would provide fast querying on tag or area subsets. However, the project as I've proposed it above seems to me to be sort of a generalization of that, and also seems like it would require less bandwidth and disk space.

I wouldn't worry about monitoring area changes, as we have OWL[0] (supposedly being integrated with the Rails port), Changepipe[1], and possibly others that do this already. I'd suggest you consider focusing on the idea of monitoring for changes based on tags and object IDs. I've been interested in changes to some large relations and other widely dispersed objects, which aren't addressed by any of the current tools.

Integration with Rails would be great, so we can watch any object directly from the website. Of course, performance would have to be considered before such a service went live, but I don't think that's terribly important for a GSoC project.

-Josh

[0]: http://wiki.openstreetmap.org/wiki/OWL_%28OpenStreetMap_Watch_List%29
[1]: https://github.com/migurski/Changepipe
Re: [OSM-dev] Possible GSoC project: tag/area monitoring service
One of the larger criticisms of GSoC is that the projects are often abandoned after the summer. Therefore I'd suggest that if you're going to work on something, you work on adding a feature to an existing OSM project, rather than going off and creating a new project.

As Josh points out, there are several similar projects out there that monitor areas, so why not add the features you want to one of the existing projects?

- Serge
Re: [OSM-dev] Possible GSoC project: tag/area monitoring service
Michael,

I think Serge's advice is good. I had a go at putting down an order of preference for how we should select GSoC projects at the top of the ideas page (http://wiki.openstreetmap.org/wiki/GSoC_Project_Ideas_2012#Types_of_Projects). I am proposing that we tend to favour projects that are based on developing existing OSM-related projects, rather than starting new ones.

Please add your idea to the wiki page though, and have a look at which tool you might incorporate the idea into.

Regards
Graham.

On 6 March 2012 15:08, Serge Wroclawski emac...@gmail.com wrote:

> One of the larger criticisms of GSoC is that the projects are often abandoned after the summer. Therefore I'd suggest that if you're going to work on something, you work on adding a feature to an existing OSM project, rather than going off and creating a new project.
>
> As Josh points out, there are several similar projects out there that monitor areas, so why not add the features you want to one of the existing projects.
>
> - Serge

--
Graham Jones
Hartlepool, UK.
Re: [OSM-dev] Possible GSoC project: tag/area monitoring service
> I wouldn't worry about monitoring area changes, as we have OWL[0] (supposedly being integrated with the Rails port), Changepipe[1], and possibly others that do this already. I'd suggest you consider focusing on the idea of monitoring for changes based on tags and object IDs. I've been interested in changes to some large relations, and other widely dispersed objects, which isn't addressed by any of the current tools. Integration with Rails would be great, so we can Watch any object directly from the website. Of course performance would have to be considered before implementing such a service went live, but I don't think that's terribly important for a GSoC project.

When you mention changes to large relations and widely dispersed objects, I was wondering if you had any specific use cases in mind? I'd also be interested in hearing what kind of expressions you might expect to be able to use. For example, I was thinking you could say something like "give me updates for things with the tag highway=residential and is_in=Canada".

It looks like Changepipe is similar to what I'm proposing, and OWL is sort of the opposite. My idea is that clients would tell the API they're interested in hearing about something (a tag, an id, some expression involving multiple tags, an area) and then updates would be sent to them as they happen, instead of polling for what's happened in the past. This scheme would reduce the number of incoming requests, at the cost of the client being responsible for receiving the information and doing something with it. I believe this approach would reduce the complexity of processing new information, since it would be known in advance what updates are required.

Another approach would be that clients tell the API they're interested in getting information about something and can then request an RSS feed which has recent updates. Rather than being computed on request for an arbitrary watch expression, the feed would be a sort of bin which matching changes are thrown into as the map is updated. This seems easier to translate into something you can do manually with a web browser than what's described above.

I'm curious to hear which of these approaches would be useful to people interested in this sort of thing. It seems like being able to ask for RSS feeds would be more immediately useful, but having data pushed to clients would allow for more flexible applications.

-- Michael
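To make the watch-expression idea concrete, here is a small sketch of how a service might decide which registered clients should hear about a changed element. It assumes the simplest possible expression form, a set of key=value pairs that must all be present (as in the highway=residential and is_in=Canada example); the names `matches` and `clients_to_notify` and the client identifiers are invented for illustration.

```python
def matches(watch, tags):
    """True if every key=value pair in the watch appears in the element's tags."""
    return all(tags.get(key) == value for key, value in watch.items())


def clients_to_notify(watches, tags):
    """Return the sorted names of clients whose watch matches a changed element."""
    return sorted(name for name, watch in watches.items() if matches(watch, tags))


if __name__ == "__main__":
    # Watches are registered in advance, so each incoming change is checked
    # against known expressions rather than clients polling after the fact.
    watches = {
        "canada-streets": {"highway": "residential", "is_in": "Canada"},
        "all-residential": {"highway": "residential"},
    }
    changed = {"highway": "residential", "is_in": "Canada", "name": "Main St"}
    print(clients_to_notify(watches, changed))
```

The same matching step would serve either delivery style: pushing the change over a connection, or appending it to a per-watch bin that backs an RSS feed.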
Re: [OSM-dev] Possible GSoC project: tag/area monitoring service
> First, a longstanding wishlist item for OSM has been data tiles, that is, the API data split into preset-sized areas (e.g. z14), which a client could call. This may not seem relevant to your project, but you'll see why it is soon.

This was actually part of my original motivation for proposing this project -- in my 2010 GSoC project, I used bbox queries to load data in tile-like sections, but as I mentioned this turned out to be very slow. Data tiles seem like they could speed things up for that sort of use. Ideally, the work involved in accessing a data tile would be comparable to accessing an image tile. Also, it seems easier to cache data addressed by tile than it is to cache the results of arbitrary bbox queries.

I'd also be interested in working on data tiles -- is that in itself a reasonable project idea? My hope is that if either of these ideas are things people have been wanting for a while, they'll want to use them, and that if a project has people using it, it would be more likely to be around after the summer.

One thing I was wondering about -- how do you choose a tile size to minimize both the number of accesses (larger tiles) and the byte size of tiles (smaller tiles)? Some areas have a much higher density of data than others. Perhaps some kind of quadtree-type approach could be used, where tiles are split if they have high density?

The ideas you suggest for streaming-type updates on data tiles are very interesting. If you were writing an editor, you could be more certain that you were displaying the most recent data without having to reload all of it.

> While you could use Changepipe to make arbitrary polygons and then stream the changes, IMHO this is not as generally useful as one might imagine. Network hiccups alone can mean that it's possible to miss an event. And arbitrary polygons become complicated as the number of queues can be large.

I hadn't thought about using arbitrary polygons to specify areas as it seemed too complex -- would there be much call for that? I assume the use cases would be things like keeping track of updates to a city (the area of which isn't always conveniently specified as a bounding box).

> By splitting the areas up, you can now take a changeset and know which areas (tiles) it affects. And then each client can simply subscribe to an area (tile). You've greatly simplified the problem, whether you allow for arbitrary shapes (one shape - many tiles) or 1:1 tiles to connections.
>
> Now, to your original question... Another advantage of tiling the data is you can easily do both. Each tile can have a list of changes associated with it. If you tried to do this on arbitrary polygons, it'd get difficult very quickly.

This makes sense, as I guess it means there are fewer bins to put things in when an update needs to be sent out to clients. (You only have to do the work once if several clients are looking at a particular tile.) And, if a client really did want to look at an arbitrary polygon, maybe it could rasterize the polygon into a list of tiles.

For people who are interested in updates to tags, a similar approach could be used, perhaps -- in that case I guess a tile would be analogous to a particular value or set of values for a tag.

-- Michael
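The "tiles as bins" idea discussed above could be sketched as follows: clients subscribe to tiles, and a changeset's bounding box is turned into the set of tiles it touches, so the lookup is done once per tile rather than once per client. This is an in-memory illustration only, assuming a single fixed zoom and the spherical-Mercator tile scheme; the class and function names are hypothetical, and a real service might use something like Redis pubsub channels instead.

```python
import math
from collections import defaultdict


def latlon_to_tile(lat_deg, lon_deg, zoom):
    """Tile (x, y) containing a point, clamped to the valid grid."""
    n = 2 ** zoom
    x = int((lon_deg + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat_deg))) / math.pi) / 2.0 * n)
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_bbox(west, south, east, north, zoom):
    """All tiles at `zoom` that intersect the bounding box."""
    x_min, y_min = latlon_to_tile(north, west, zoom)  # north-west corner
    x_max, y_max = latlon_to_tile(south, east, zoom)  # south-east corner
    return {(x, y) for x in range(x_min, x_max + 1) for y in range(y_min, y_max + 1)}


class TileHub:
    """Keeps one bin of subscribers per tile."""

    def __init__(self, zoom):
        self.zoom = zoom
        self.subscribers = defaultdict(set)

    def subscribe(self, client, x, y):
        self.subscribers[(x, y)].add(client)

    def clients_for_change(self, west, south, east, north):
        """Clients subscribed to any tile touched by a changeset bbox."""
        hit = set()
        for tile in tiles_for_bbox(west, south, east, north, self.zoom):
            hit |= self.subscribers.get(tile, set())
        return hit


if __name__ == "__main__":
    hub = TileHub(zoom=1)
    hub.subscribe("a", 1, 1)
    hub.subscribe("b", 0, 0)
    print(sorted(hub.clients_for_change(10, -20, 20, -10)))
```

As Serge notes elsewhere in the thread, a changeset's bbox can be much larger than the area actually edited, so this over-notifies; working from the changed objects themselves would be more precise.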
[OSM-dev] Possible GSoC project: tag/area monitoring service
Hi everyone,

I'm writing to seek opinions about a possible Google Summer of Code project. I did GSoC in 2010, and I'd like to apply again this year. My project in 2010 was a simplified, web-based map editor. Since the wiki page for project ideas mentions that proposals for the development of existing OSM infrastructure would be preferred, I was having a look at the API v0.7 page, and noticed some interest in a monitoring feature.

My proposal is to build a monitoring service to augment the existing API, similar to the Twitter streaming API [1]. Users would request to receive map updates matching tags or which involve elements in some area, and updates would be sent either over a persistent connection (as Twitter does) or possibly by making requests to an endpoint specified by the user. My general idea for the architecture is basically to grab diffs and then send the relevant parts to clients depending on what they've asked to receive.

Clients of such a monitoring service could do things like send daily email updates on map activity to users interested in a specific area or tag, invalidate tiles in custom-rendered maps, or assemble a subset of available OSM data for fast, up-to-date querying within that subset (a single city, for example) without worrying about making lots of requests to the OSM API. That third application would be useful for solving one of the problems I ran into with my 2010 project -- I was optimistically loading map data with bbox queries as the user panned the map, which was too slow on the production API to be practical (and probably isn't what that part of the API is really meant for).

Another project idea might be to work directly on a service which would provide fast querying on tag or area subsets. However, the project as I've proposed it above seems to me to be sort of a generalization of that, and also seems like it would require less bandwidth and disk space.

Thanks for taking a look, and please let me know what you think!

-- Michael Daines

[1] https://dev.twitter.com/docs/streaming-api
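The "grab diffs and then send the relevant parts to clients" step in the proposal above might start from something like the sketch below. It assumes the diffs arrive as osmChange XML (create/modify/delete blocks containing node, way, and relation elements with tag children) and filters on a single tag; fetching the diff files and delivering the results are left out, and `iter_changes` and `select` are hypothetical names.

```python
import xml.etree.ElementTree as ET


def iter_changes(osc_xml):
    """Yield (action, element_type, element_id, tags) for each change in a diff."""
    root = ET.fromstring(osc_xml)
    for action in root:          # <create>, <modify> or <delete>
        for element in action:   # <node>, <way> or <relation>
            tags = {t.get("k"): t.get("v") for t in element.findall("tag")}
            yield action.tag, element.tag, element.get("id"), tags


def select(osc_xml, key, value):
    """Return only the changes whose element carries the tag key=value."""
    return [c for c in iter_changes(osc_xml) if c[3].get(key) == value]


SAMPLE = """<osmChange version="0.6">
  <modify>
    <way id="42" version="3">
      <nd ref="1"/>
      <tag k="highway" v="residential"/>
    </way>
  </modify>
  <delete>
    <node id="7" version="2" lat="0" lon="0"/>
  </delete>
</osmChange>"""

if __name__ == "__main__":
    for change in select(SAMPLE, "highway", "residential"):
        print(change)
```

Deleted elements usually arrive without their tags, so a tag-based filter like this would miss them unless the service also remembers which elements it has previously matched.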