Re: [OSM-dev] Possible GSoC project: tag/area monitoring service

2012-03-07 Thread Peter Körner

On 07.03.2012 05:11, Michael Daines wrote:

First, a longstanding wishlist item for OSM has been data tiles,
that is the API data, split into preset sized areas (eg z14), which a
client could call. This may not seem relevant to your project but
you'll see why it is soon.


This was actually part of my original motivation for proposing this project -- 
in my 2010 GSoC project, I used bbox queries to load data in tile-like 
sections, but as I mentioned this turned out to be very slow. Data tiles seem 
like they could speed things up for that sort of use. Ideally, the work 
involved in accessing a data tile would be comparable to accessing an image 
tile. Also, it seems easier to cache data addressed by tile than it is to cache 
the results of arbitrary bbox queries.

I'd also be interested in working on data tiles -- is that in itself a 
reasonable project idea? My hope is that if either of these ideas are things 
people have been wanting for a while, they'll want to use them, and that if a 
project has people using it, it would be more likely to be around after the 
summer.


A service that is able to provide
1. fast and scalable
2. tiled access to
3. updated data
4. around the world with a constant tile size (eg z12 or z14)
5. together with formulas to calculate the tile coordinate from
   lat/lon and
6. complete documentation

would be a project of reasonable complexity and usefulness.
The most complex part here is 3.
If you have further questions on possible implementations or use-cases 
don't hesitate to contact me directly: pe...@mazdermind.de
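
As an illustration of item 5, here is a minimal sketch of the lat/lon to tile calculation. It assumes the data tiles would reuse the Web Mercator z/x/y grid of the image tiles, which the thread does not settle; the function name is made up for this example.

```python
import math


def latlon_to_tile(lat, lon, zoom):
    """Return the (x, y) index of the Web Mercator tile containing a point."""
    n = 2 ** zoom  # number of tiles along each axis at this zoom level
    lat_rad = math.radians(lat)
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so that lon == 180 and latitudes beyond the Mercator limit
    # still map to a tile inside the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)
```

With a constant zoom such as z14, a client would call `latlon_to_tile(lat, lon, 14)` and request the tile with the resulting indices.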



One thing I was wondering about -- how do you choose a tile size to minimize 
both the number of accesses (larger tiles) and the byte size of tiles (smaller 
tiles)? Some areas have a much higher density of data than others. Perhaps some 
kind of quadtree-type approach could be used, where tiles are split if they 
have high density?

This could be a project for the next GSoC, but calculating the tile 
sizes is in itself so complex that it would fill a complete GSoC 
project, leaving no room for the project outlined above. But a tiling 
algorithm without a service implementing it would not be of great use 
to the community, would it?


Apart from that, there are already tools dedicated to this kind of 
computation: http://www.mkgmap.org.uk/page/tile-splitter
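
To make the quadtree idea concrete, here is a small sketch of density-based splitting. It is not how the mkgmap tile splitter works; it assumes the data is reduced to a list of (x, y) points and that a tile is split into four quadrants whenever it holds more than `max_points` of them. All names are hypothetical.

```python
def split_by_density(points, bounds, max_points, max_depth=8):
    """Split `bounds` into quadrants until each leaf holds at most max_points.

    points: list of (x, y) tuples; bounds: (min_x, min_y, max_x, max_y).
    Returns a list of (bounds, points) leaves, including empty ones.
    """
    if len(points) <= max_points or max_depth == 0:
        return [(bounds, points)]
    min_x, min_y, max_x, max_y = bounds
    mid_x = (min_x + max_x) / 2.0
    mid_y = (min_y + max_y) / 2.0
    # Keys are (is_east, is_north) flags relative to the midpoint.
    quadrants = {
        (False, False): (min_x, min_y, mid_x, mid_y),
        (True, False): (mid_x, min_y, max_x, mid_y),
        (False, True): (min_x, mid_y, mid_x, max_y),
        (True, True): (mid_x, mid_y, max_x, max_y),
    }
    buckets = {key: [] for key in quadrants}
    for x, y in points:
        buckets[(x >= mid_x, y >= mid_y)].append((x, y))
    leaves = []
    for key, quad_bounds in quadrants.items():
        leaves.extend(
            split_by_density(buckets[key], quad_bounds, max_points, max_depth - 1)
        )
    return leaves
```

The `max_depth` limit stops the recursion when many points share the same coordinates. A real service would also have to decide how ways and relations that cross tile borders are assigned, which this sketch ignores.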



The ideas you suggest for streaming-type updates on data tiles are very 
interesting. If you were writing an editor, you could be more certain that you 
were displaying the most recent data without having to reload all of it.

But you would not want the editor to display those changes without user 
interaction. Imagine you are drawing a road and everything around your 
cursor keeps changing shape the whole time. You would not call this a 
good user experience, would you?


Also, a streaming editor is not something the community is requesting; 
editing works well (enough) the way it is.


Peter

___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] Possible GSoC project: tag/area monitoring service

2012-03-07 Thread Lynn W. Deffenbaugh (Mr)

On 3/7/2012 4:57 AM, Peter Körner wrote:


A Service that is able to provide
1. fast and scalable
2. tiled access to
3. updated data
4. around the world with a constant tile size (eg z12 or z14)
5. together with formulas to calculate the tile coordinate from
   lat/lon and
6. complete documentation


I would expand 6 to cover documentation for use as well as for 
replicating the server environment using OSM planet data update feeds. I 
personally expect the restrictions on the tile servers to be extended to 
the API servers once enough application coders implement a way to use 
the API directly from thousands or millions of clients. At that point 
they'll be instructed to fire up their own server and will need more 
than just use-based documentation.


Lynn (D) - KJ4ERJ - Trying to get my own tile server working reliably now...





Re: [OSM-dev] Possible GSoC project: tag/area monitoring service

2012-03-07 Thread Serge Wroclawski
We could take this off-list but I think this may still be of interest
to the general community.

On Tue, Mar 6, 2012 at 11:11 PM, Michael Daines mich...@mdaines.com wrote:
 First, a longstanding wishlist item for OSM has been data tiles,
 that is the API data, split into preset sized areas (eg z14), which a
 client could call. This may not seem relevant to your project but
 you'll see why it is soon.

 This was actually part of my original motivation for proposing this project 
 -- in my 2010 GSoC project, I used bbox queries to load data in tile-like 
 sections, but as I mentioned this turned out to be very slow.

There are a number of ways to do this intelligently. I was going to
write up a very naive prototype that had no brains at all, and here's
what my approach was going to look like (and I'll do it if there's
interest):

Write some code to query jaxpi for bounding boxes in Python based on tile name.
Use this and write data tile support in TileStache. I'd store cached
tiles in Redis (for reasons that become apparent in a few sentences).
I'd use the parsing/storing bits of Changepipe to tell me which tiles
are affected by a changeset (even though I believe it uses the
changeset's bbox, which is oftentimes wrong).
Since Changepipe is already using Redis, using Redis for the tiles makes sense.
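
A rough sketch of the first two steps might look like the following. It is not TileStache or Redis code: a plain dict stands in for Redis, `fetch_bbox` is a hypothetical callable that would perform the actual bounding-box query, and the tiles are assumed to follow the usual Web Mercator z/x/y scheme.

```python
import math


def tile_to_bbox(x, y, zoom):
    """Return (left, bottom, right, top) in degrees for a Web Mercator tile."""
    n = 2 ** zoom

    def lon_of(col):
        return col / n * 360.0 - 180.0

    def lat_of(row):
        return math.degrees(math.atan(math.sinh(math.pi * (1.0 - 2.0 * row / n))))

    return lon_of(x), lat_of(y + 1), lon_of(x + 1), lat_of(y)


class DataTileCache:
    """Cache of data tiles keyed by tile name; a dict stands in for Redis."""

    def __init__(self, fetch_bbox):
        # fetch_bbox is hypothetical: it takes "left,bottom,right,top"
        # and returns the data for that bounding box.
        self._fetch_bbox = fetch_bbox
        self._store = {}

    def get(self, zoom, x, y):
        key = "%d/%d/%d" % (zoom, x, y)
        if key not in self._store:
            bbox = ",".join("%.7f" % v for v in tile_to_bbox(x, y, zoom))
            self._store[key] = self._fetch_bbox(bbox)
        return self._store[key]

    def invalidate(self, zoom, x, y):
        """Drop a tile, e.g. after a changeset touched it."""
        self._store.pop("%d/%d/%d" % (zoom, x, y), None)
```

The change-tracking step would then call `invalidate` for every tile a changeset touches, so the next request fetches fresh data.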

And then the issue would be how to hack in some code for the
websocket/stream/whatever. This seems like it'd be relatively simple
using Redis pubsub and something like gevent, but I haven't looked
into it.
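
The publish/subscribe part can be sketched without any network code. The class below is only an in-process stand-in for one Redis pubsub channel per tile; it uses neither Redis nor gevent, and the class and method names are invented for this example.

```python
from collections import defaultdict


class TileChangeHub:
    """In-process stand-in for one pub/sub channel per tile."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, tile_key, callback):
        """Register callback(tile_key, change) for changes to one tile."""
        self._subscribers[tile_key].append(callback)

    def publish(self, tile_key, change):
        """Deliver `change` to every subscriber of the tile; return the count."""
        callbacks = self._subscribers.get(tile_key, [])
        for callback in callbacks:
            callback(tile_key, change)
        return len(callbacks)
```

In a real deployment each callback would write to an open websocket or streaming HTTP connection instead of being called directly.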

The right answer would be to keep a local copy of the database and
then update it as necessary. I believe Ian Dees has a copy of some
MongoDB code that uses quadtile to index OSM objects (I'm very fuzzy
on the details). (Update: Ian sent me this URL, but I haven't taken a
look:
https://github.com/iandees/mongosm/commit/c46c2081edde0b3b2b0446dd06d5ef02b292631c)

Then as objects would change, you'd be able to update the tiles.

 I'd also be interested in working on data tiles -- is that in itself a 
 reasonable project idea?

I think that would be welcome. Especially if done well. My naive
approach would be slow, but if you used a different approach that
didn't keep hitting external servers on every update, it'd be a very
nifty project indeed.

 One thing I was wondering about -- how do you choose a tile size to minimize 
 both the number of accesses (larger tiles) and the byte size of tiles 
 (smaller tiles)? Some areas have a much higher density of data than others. 
 Perhaps some kind of quadtree-type approach could be used, where tiles are 
 split if they have high density?

That'd certainly work. I'd started with a naive approach: if I only
have one zoom level, things are easy, and then you just accept that
some areas are dense and others not. At the same time, there won't be
as much demand for low density areas.

There's certainly value in cleverness and not transmitting too much
data, but there's also value in simplicity for clients.

I think with compression or binary formats like pbf, the need for
cleverness is reduced since there's overall less data transmitted.

- Serge



Re: [OSM-dev] Possible GSoC project: tag/area monitoring service

2012-03-07 Thread Michael Daines
 I would expand 6 to be documentation for use as well as the ability to 
 replicate the server environment using OSM planet data update feeds.  I 
 personally expect the restrictions on the tile servers to be extended to the 
 API servers when enough application coders implement a way to use the API 
 directly from thousands or millions of clients at which point they'll be 
 instructed to fire up their own server and need more than just use-based 
 documentation.

Definitely. Customization would also be important, since I imagine that users 
would like to avoid storing a bunch of data they're not going to use in their 
project. (Depends on how things are implemented, though.) If your project only 
dealt with a single city, it might make things easier to set up if you knew the 
storage or computation requirements were less onerous than dealing with the 
entire world.


-- Michael




Re: [OSM-dev] Possible GSoC project: tag/area monitoring service

2012-03-07 Thread Michael Daines
 Write some code to query jaxpi for bounding boxes in Python based on tile 
 name.
 Use this and write data tile support in TileStache. I'd store cached
 tiles in Redis (for reasons that become apparent in a few sentences).
 I'd use the parsing/storing bits of Changepipe to tell me which tiles
 are affected by a changeset (even though I believe it uses the
 changeset's bbox, which is oftentimes wrong).
 Since Changepipe is already using Redis, using Redis for the tiles makes 
 sense.
 
 And then the issue would be how to hack in some code for the
 websocket/stream/whatever. This seems like it'd be relatively simple
 using Redis pubsub and something like gevent, but I haven't looked
 into it.

Do I have this right: the server in this implementation would act as sort of a 
fast, tile-addressed cache for data available through XAPI or similar?


 The right answer would be to keep a local copy of the database and
 then update it as necessary. I believe Ian Dees has a copy of some
 MongoDB code that uses quadtile to index OSM objects (I'm very fuzzy
 on the details). (Update: Ian sent me this URL, but I haven't taken a
 look:
 https://github.com/iandees/mongosm/commit/c46c2081edde0b3b2b0446dd06d5ef02b292631c)
 
 Then as objects would change, you'd be able to update the tiles.

It looks like mongosm includes an implementation of a data tile server? The 
quadtile indexing is interesting in that you use only a single parameter to 
refer to tiles, rather than the z/x/y triple commonly used with image tiles.
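
One way to get such a single parameter is to interleave the bits of the x and y indices. The sketch below is a generic bit interleaving at a fixed zoom level; it is not taken from mongosm, and the bit order (x before y) is an assumption.

```python
def quadkey(x, y, zoom):
    """Interleave the bits of x and y into one integer, x bit first."""
    key = 0
    for bit in range(zoom - 1, -1, -1):
        key = (key << 2) | (((x >> bit) & 1) << 1) | ((y >> bit) & 1)
    return key


def from_quadkey(key, zoom):
    """Recover (x, y) from a key produced by quadkey() at the same zoom."""
    x = y = 0
    for bit in range(zoom - 1, -1, -1):
        pair = (key >> (2 * bit)) & 3
        x = (x << 1) | (pair >> 1)
        y = (y << 1) | (pair & 1)
    return x, y
```

A useful property is that the key of a parent tile is the key of any of its children shifted right by two bits, so all tiles inside a larger tile form a contiguous key range. The zoom level itself is not stored in the key, so it has to be fixed or passed alongside.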

Keeping a local copy seems simpler and more reliable, but you have to store all 
the data... This is where I see some kind of customization as being useful -- 
if you were running your own server, and were only interested in a single city, 
or only interested in roads and building shapes, you could store just that data.


 Perhaps some kind of quadtree-type approach could be used, where tiles are 
 split if they have high density?
 
 That'd certainly work. I'd started with a naive approach: if I only
 have one zoom level, things are easy, and then you just accept that
 some areas are dense and others not. At the same time, there won't be
 as much demand for low density areas.
 
 There's certainly value in cleverness and not transmitting too much
 data, but there's also value in simplicity for clients.

Particularly for GSoC I think I'd want to err on the side of simplicity. The 
zoom level could always be adjusted?






Re: [OSM-dev] Possible GSoC project: tag/area monitoring service

2012-03-07 Thread Josh Doe
On Tue, Mar 6, 2012 at 2:10 PM, Michael Daines mich...@mdaines.com wrote:
 When you mention changes to large relations and widely dispersed objects, I 
 was wondering if you had any specific use cases in mind? I'd also be 
 interested in hearing what kind of expressions you might expect to be able to 
 use. For example, I was thinking you could say something like "give me 
 updates for things with the tags highway=residential and is_in=Canada".

I was thinking about very long routes, and country borders. If I want
to monitor changes to my state and interstate routes within the state,
I don't have any good options at the moment to do that. I don't think
it should be terribly difficult to implement, I'm just not sure how
well it will scale. To make it more interesting you could allow for
watching all changes to objects with certain tags that are within a
certain distance of a route relation, or located inside a
multipolygon.

-Josh



Re: [OSM-dev] Possible GSoC project: tag/area monitoring service

2012-03-06 Thread Josh Doe
On Mon, Mar 5, 2012 at 10:58 PM, Michael Daines mich...@mdaines.com wrote:
 Hi everyone,

 I'm writing to seek opinions about a possible Google Summer of Code project. 
 I did GSoC in 2010, and I'd like to apply again this year. My project in 2010 
 was a simplified, web-based map editor.

 Since the wiki page for project ideas mentions that proposals for the 
 development of existing OSM infrastructure would be preferred, I was having a 
 look at the API v0.7 page, and noticed some interest in a monitoring feature.

 My proposal is to build a monitoring service to augment the existing API, 
 similar to the Twitter streaming API [1]. Users would request to receive map 
 updates matching tags or which involve elements in some area, and updates 
 would be sent either over a persistent connection (as Twitter does) or 
 possibly by making requests to an endpoint specified by the user. My general 
 idea for the architecture is basically to grab diffs and then send the 
 relevant parts to clients depending on what they've asked to receive.

 Clients of such a monitoring service could do things like send daily email 
 updates on map activity to users interested in a specific area or tag, 
 invalidate tiles in custom-rendered maps, or assemble a subset of available 
 OSM data for fast, up-to-date querying within that subset (a single city, for 
 example) without worrying about making lots of requests to the OSM API. That 
 third application would be useful for solving one of the problems I ran into 
 with my 2010 project -- I was optimistically loading map data with bbox 
 queries as the user panned the map, which was too slow on the production API 
 to be practical (and probably isn't what that part of the API is really meant 
 for).

 Another project idea might be to work directly on a service which would 
 provide fast querying on tag or area subsets. However, the project as I've 
 proposed it above seems to me to be sort of a generalization of that, and 
 also seems like it would require less bandwidth and disk space.

I wouldn't worry about monitoring area changes, as we have OWL[0]
(supposedly being integrated with the Rails port), Changepipe[1],
and possibly others that do this already. I'd suggest you consider
focusing on the idea of monitoring for changes based on tags and
object IDs. I've been interested in changes to some large relations,
and other widely dispersed objects, which isn't addressed by any of
the current tools. Integration with Rails would be great, so we can
"watch" any object directly from the website. Of course performance
would have to be considered before such a service went live, but I
don't think that's terribly important for a GSoC project.

-Josh

[0]: http://wiki.openstreetmap.org/wiki/OWL_%28OpenStreetMap_Watch_List%29
[1]: https://github.com/migurski/Changepipe



Re: [OSM-dev] Possible GSoC project: tag/area monitoring service

2012-03-06 Thread Serge Wroclawski
One of the larger criticisms of GSoC is that the projects are often
abandoned after the summer.

Therefore I'd suggest that if you're going to work on something, you
work on adding a feature to an existing OSM project, rather than going
off and creating a new project.

As Josh points out, there are several similar projects out there that
monitor areas, so why not add the features you want to one of the
existing projects?

- Serge



Re: [OSM-dev] Possible GSoC project: tag/area monitoring service

2012-03-06 Thread Graham Jones
Michael,
I think Serge's advice is good.  I had a go at putting down an order of
preference for how we should select GSoC projects at the top of the ideas
page: http://wiki.openstreetmap.org/wiki/GSoC_Project_Ideas_2012#Types_of_Projects
I am proposing that we tend to favour projects that are based on
developing existing OSM-related projects, rather than starting new ones.

Please add your idea to the wiki page though, and have a look at which tool
you may incorporate the idea into.

Regards


Graham.

On 6 March 2012 15:08, Serge Wroclawski emac...@gmail.com wrote:

 One of the larger criticisms of GSoC is that the projects are often
 abandoned after the summer.

 Therefore I'd suggest that if you're going to work on something, you
 work on adding a feature to an existing OSM project, rather than going
 off and creating a new project.

 As Josh points out, there are several similar projects out there that
 monitor areas, so why not add the features you want to one of the
 existing projects.

 - Serge





-- 
Graham Jones
Hartlepool, UK.


Re: [OSM-dev] Possible GSoC project: tag/area monitoring service

2012-03-06 Thread Michael Daines
 I wouldn't worry about monitoring area changes, as we have OWL[0]
 (supposedly being integrated with the Rails port), Changepipe[1],
 and possibly others that do this already. I'd suggest you consider
 focusing on the idea of monitoring for changes based on tags and
 object IDs. I've been interested in changes to some large relations,
 and other widely dispersed objects, which isn't addressed by any of
 the current tools. Integration with Rails would be great, so we can
 "watch" any object directly from the website. Of course performance
 would have to be considered before such a service went live, but I
 don't think that's terribly important for a GSoC project.

When you mention changes to large relations and widely dispersed objects, I was 
wondering if you had any specific use cases in mind? I'd also be interested in 
hearing what kind of expressions you might expect to be able to use. For 
example, I was thinking you could say something like "give me updates for 
things with the tags highway=residential and is_in=Canada".

It looks like Changepipe is similar to what I'm proposing, and OWL is sort of 
the opposite. My idea is that clients would tell the API they're interested in 
hearing about something (a tag, an id, some expression involving multiple tags, 
an area) and then updates would be sent to them as they happen, instead of 
polling for what's happened in the past. This scheme would reduce the number of 
incoming requests at the cost of the client being responsible for receiving the 
information and doing something with it. I believe this approach would reduce 
the complexity of processing new information since it would be known in advance 
what updates are required.
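
To illustrate what a watch expression could look like, here is a minimal sketch that treats a subscription as a set of required tags, all of which must match. The dict representation, the use of None for "any value" and the function names are assumptions made for this example, not part of any existing API.

```python
def matches(tags, required):
    """True if every key in `required` is present in `tags` with the wanted value.

    A required value of None means "any value".
    """
    for key, wanted in required.items():
        if key not in tags:
            return False
        if wanted is not None and tags[key] != wanted:
            return False
    return True


def route_update(element_tags, subscriptions):
    """Return the names of all subscriptions whose expression matches."""
    return [
        name
        for name, required in subscriptions.items()
        if matches(element_tags, required)
    ]
```

Because the subscriptions are known before the update arrives, each changed element is checked once against the registered expressions instead of being queried for later.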

Another approach would be that clients tell the API they're interested in 
getting information about something and can then request an RSS feed which has 
recent updates. Rather than being computed on request for an arbitrary watch 
expression, the feed would be a sort of bin into which matching changes are 
thrown as the map is updated. This seems easier to translate into something 
you can do manually with a web browser than what's described above.

I'm curious to hear which of these approaches would be useful to people 
interested in this sort of thing. It seems like being able to ask for RSS feeds 
would be more immediately useful, but having data pushed to clients would 
allow for more flexible applications.


-- Michael


Re: [OSM-dev] Possible GSoC project: tag/area monitoring service

2012-03-06 Thread Michael Daines
 First, a longstanding wishlist item for OSM has been data tiles,
 that is the API data, split into preset sized areas (eg z14), which a
 client could call. This may not seem relevant to your project but
 you'll see why it is soon.

This was actually part of my original motivation for proposing this project -- 
in my 2010 GSoC project, I used bbox queries to load data in tile-like 
sections, but as I mentioned this turned out to be very slow. Data tiles seem 
like they could speed things up for that sort of use. Ideally, the work 
involved in accessing a data tile would be comparable to accessing an image 
tile. Also, it seems easier to cache data addressed by tile than it is to cache 
the results of arbitrary bbox queries.

I'd also be interested in working on data tiles -- is that in itself a 
reasonable project idea? My hope is that if either of these ideas are things 
people have been wanting for a while, they'll want to use them, and that if a 
project has people using it, it would be more likely to be around after the 
summer.

One thing I was wondering about -- how do you choose a tile size to minimize 
both the number of accesses (larger tiles) and the byte size of tiles (smaller 
tiles)? Some areas have a much higher density of data than others. Perhaps some 
kind of quadtree-type approach could be used, where tiles are split if they 
have high density?

The ideas you suggest for streaming-type updates on data tiles are very 
interesting. If you were writing an editor, you could be more certain that you 
were displaying the most recent data without having to reload all of it.


 While you could use Changepipe to make arbitrary polygons and then
 stream the changes, IMHO this is not as generally useful as one might
 imagine. Network hiccups alone can mean that it's possible to miss an
 event. And arbitrary polygons become complicated as the number of
 queues can be large.

I hadn't thought about using arbitrary polygons to specify areas as it seemed 
too complex -- would there be much call for that? I assume the use cases would 
be things like keeping track of updates to a city (the area of which isn't 
always conveniently specified as a bounding box).


 By splitting the areas up, you can now take a changeset and know which
 areas (tile) it affects. And then each client can simply subscribe to
 an area (tile). You've greatly simplified the problem, whether you
 allow for arbitrary shapes (one shape - many tiles) or 1:1 tiles to
 connections.
 
 Now, to your original question... Another advantage of tiling the
 data is you can easily do both. Each tile can have a list of changes
 associated with it. If you tried to do this on arbitrary polygons,
 it'd get difficult very quickly.

This makes sense, as I guess it means there are fewer bins to put things in 
when an update needs to be sent out to clients. (You only have to do the work 
once if several clients are looking at a particular tile.) And, if a client 
really did want to look at an arbitrary polygon, maybe it could rasterize the 
polygon into a list of tiles.
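
The binning step can be sketched for the simplest case, a bounding box. The code assumes Web Mercator z/x/y tiles at a fixed zoom and does not handle boxes that cross the antimeridian; a polygon could be approximated by first covering its bounding box and then discarding tiles that do not intersect it.

```python
import math


def latlon_to_tile(lat, lon, zoom):
    """Return the (x, y) index of the Web Mercator tile containing a point."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_bbox(left, bottom, right, top, zoom):
    """List the (zoom, x, y) tiles that a bounding box overlaps."""
    x_min, y_min = latlon_to_tile(top, left, zoom)  # north-west corner
    x_max, y_max = latlon_to_tile(bottom, right, zoom)  # south-east corner
    return [
        (zoom, x, y)
        for x in range(x_min, x_max + 1)
        for y in range(y_min, y_max + 1)
    ]
```

Each changeset is then turned into a list of tiles once, and the result is shared by every client watching one of those tiles.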

For people who are interested in updates to tags, a similar approach could be 
used, perhaps -- in that case I guess a tile would be analogous to a particular 
value or set of values for a tag.


-- Michael




[OSM-dev] Possible GSoC project: tag/area monitoring service

2012-03-05 Thread Michael Daines
Hi everyone,

I'm writing to seek opinions about a possible Google Summer of Code project. I 
did GSoC in 2010, and I'd like to apply again this year. My project in 2010 was 
a simplified, web-based map editor.

Since the wiki page for project ideas mentions that proposals for the 
development of existing OSM infrastructure would be preferred, I was having a 
look at the API v0.7 page, and noticed some interest in a monitoring feature.

My proposal is to build a monitoring service to augment the existing API, 
similar to the Twitter streaming API [1]. Users would request to receive map 
updates matching tags or which involve elements in some area, and updates would 
be sent either over a persistent connection (as Twitter does) or possibly by 
making requests to an endpoint specified by the user. My general idea for the 
architecture is basically to grab diffs and then send the relevant parts to 
clients depending on what they've asked to receive.
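
As a sketch of the "grab diffs and send the relevant parts" step, the code below reads an OsmChange document and keeps only the elements carrying a given tag. The sample diff is made up for illustration, and downloading the replication diffs is left out.

```python
import xml.etree.ElementTree as ET

# Made-up sample in the OsmChange format, for illustration only.
SAMPLE = """<osmChange version="0.6">
  <modify>
    <way id="42" version="3">
      <nd ref="1"/><nd ref="2"/>
      <tag k="highway" v="residential"/>
    </way>
  </modify>
  <create>
    <node id="7" version="1" lat="45.0" lon="-75.0">
      <tag k="amenity" v="cafe"/>
    </node>
  </create>
</osmChange>"""


def changed_elements(osmchange_xml):
    """Yield (action, element_type, element_id, tags) for each element."""
    root = ET.fromstring(osmchange_xml)
    for action in root:  # <create>, <modify> or <delete>
        for element in action:  # <node>, <way> or <relation>
            tags = {tag.get("k"): tag.get("v") for tag in element.findall("tag")}
            yield action.tag, element.tag, int(element.get("id")), tags


def relevant_changes(osmchange_xml, key, value):
    """Return the changes whose element has the tag key=value."""
    return [c for c in changed_elements(osmchange_xml) if c[3].get(key) == value]
```

Calling `relevant_changes(SAMPLE, "highway", "residential")` returns the single modified way; a service would forward that entry to every client subscribed to the tag.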

Clients of such a monitoring service could do things like send daily email 
updates on map activity to users interested in a specific area or tag, 
invalidate tiles in custom-rendered maps, or assemble a subset of available OSM 
data for fast, up-to-date querying within that subset (a single city, for 
example) without worrying about making lots of requests to the OSM API. That 
third application would be useful for solving one of the problems I ran into 
with my 2010 project -- I was optimistically loading map data with bbox queries 
as the user panned the map, which was too slow on the production API to be 
practical (and probably isn't what that part of the API is really meant for).

Another project idea might be to work directly on a service which would provide 
fast querying on tag or area subsets. However, the project as I've proposed it 
above seems to me to be sort of a generalization of that, and also seems like 
it would require less bandwidth and disk space.


Thanks for taking a look, and please let me know what you think!

-- Michael Daines


[1] https://dev.twitter.com/docs/streaming-api

