Re: [OSM-dev] OSM StreetDensityMap

2012-03-06 Thread npl

 Cool. I've been wanting to give hadoop / map/reduce a try with OSM data
 but the wiki does not offer much. It would be nice if someone with some
 experience would create a wiki page. I'm sure it would be interesting
 for the community as well as GIScience folks to have a place to start.

If I have some time, I'll create a wiki page for osm on hadoop (and post 
it here).


 It also gives a sort-of osm activity map.

 Well, it does and it doesn't. You'd have to compare it to a reference
 road network density map to appreciate the activity of the OSM community
 in representing reality in OSM.

That's right.

 I see a lot of potential for this beyond 'simple' visualisation. Systems
 like TagInfo and OWL could benefit, maybe? Does your framework lend
 itself to (near) real-time processing of OSM data, or does it only work
 with snapshot data?

MapReduce itself is a programming model. It allows you to process data 
by defining map- and reduce-functions (and is thus quite easy to learn).


Hadoop implements it as a distributed batch-processing framework and 
allows you to process TBs of data on a cluster of up to hundreds of 
nodes. The real benefit of using such a system is that it scales roughly 
linearly (well, you could say between O(n) and O(n log n)), while single 
systems (like relational DBs) can't scale that high.
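

To give an idea of the shape such a job takes, here is a rough sketch 
(illustration only, not the actual dda code) that counts nodes per 
0.1-degree grid cell from the planet XML. It relies on the default 
TextInputFormat and on every <node> element sitting on a single line, 
which happens to hold for the XML planet dump but is a simplification; 
the class and key names are made up.

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NodeDensity {

    // map: emit (gridCell, 1) for every <node ...> line of the planet XML
    public static class TileMapper
            extends Mapper<LongWritable, Text, Text, LongWritable> {
        private static final LongWritable ONE = new LongWritable(1);
        private final Text cell = new Text();

        @Override
        protected void map(LongWritable offset, Text line, Context ctx)
                throws IOException, InterruptedException {
            String s = line.toString();
            if (!s.contains("<node")) return;      // only nodes carry lat/lon
            Double lat = attr(s, "lat"), lon = attr(s, "lon");
            if (lat == null || lon == null) return;
            // bucket the coordinates into a coarse 0.1-degree grid cell
            cell.set(Math.floor(lat * 10) / 10 + "," + Math.floor(lon * 10) / 10);
            ctx.write(cell, ONE);
        }

        // crude attribute scraping; good enough for one-line node elements
        private static Double attr(String s, String name) {
            int i = s.indexOf(" " + name + "=\"");
            if (i < 0) return null;
            int start = i + name.length() + 3;
            int end = s.indexOf('"', start);
            return end < 0 ? null : Double.valueOf(s.substring(start, end));
        }
    }

    // reduce: sum the counts per grid cell
    public static class SumReducer
            extends Reducer<Text, LongWritable, Text, LongWritable> {
        @Override
        protected void reduce(Text cell, Iterable<LongWritable> counts, Context ctx)
                throws IOException, InterruptedException {
            long sum = 0;
            for (LongWritable c : counts) sum += c.get();
            ctx.write(cell, new LongWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "osm node density");
        job.setJarByClass(NodeDensity.class);
        job.setMapperClass(TileMapper.class);
        job.setCombinerClass(SumReducer.class);   // pre-aggregate on the map side
        job.setReducerClass(SumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(LongWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));    // planet.osm
        FileOutputFormat.setOutputPath(job, new Path(args[1]));  // per-cell counts
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}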


Our cluster was around 10 nodes, and it took us about 3-4 hours to 
create the map and store it in HBase (although the cluster was not busy 
the whole time); the uncompressed planet-file is about 200 GB.


That said, you can run small jobs on hadoop/mapreduce in a few minutes 
(i.e. near real time), and it would be possible to

  * process the planet-file once and store the results in a DB
  * process the planet-file diffs (e.g. hourly) and update the DB
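
As a much-simplified sketch of the second option (made up for 
illustration, not part of dda), the following reads an hourly .osc 
change file and bumps per-cell counters in HBase for newly created 
nodes. The table name "density", column family "d" and qualifier 
"nodes" are invented here, and a real updater would also have to handle 
the modify/delete blocks and ways:

import java.io.FileInputStream;

import javax.xml.stream.XMLInputFactory;
import javax.xml.stream.XMLStreamConstants;
import javax.xml.stream.XMLStreamReader;

import org.apache.hadoop.hbase.HBaseConfiguration;
import org.apache.hadoop.hbase.TableName;
import org.apache.hadoop.hbase.client.Connection;
import org.apache.hadoop.hbase.client.ConnectionFactory;
import org.apache.hadoop.hbase.client.Table;
import org.apache.hadoop.hbase.util.Bytes;

public class DiffUpdater {
    private static final byte[] FAMILY = Bytes.toBytes("d");        // hypothetical
    private static final byte[] QUALIFIER = Bytes.toBytes("nodes"); // hypothetical

    public static void main(String[] args) throws Exception {
        try (Connection conn =
                 ConnectionFactory.createConnection(HBaseConfiguration.create());
             Table table = conn.getTable(TableName.valueOf("density"))) { // hypothetical
            // stream through the hourly .osc change file given as args[0]
            XMLStreamReader xml = XMLInputFactory.newInstance()
                    .createXMLStreamReader(new FileInputStream(args[0]));
            boolean inCreate = false;
            while (xml.hasNext()) {
                int event = xml.next();
                if (event == XMLStreamConstants.START_ELEMENT) {
                    String name = xml.getLocalName();
                    if ("create".equals(name)) inCreate = true;
                    if (inCreate && "node".equals(name)) {
                        String lat = xml.getAttributeValue(null, "lat");
                        String lon = xml.getAttributeValue(null, "lon");
                        if (lat == null || lon == null) continue;
                        // same 0.1-degree grid cell key as in the batch job
                        String cell = Math.floor(Double.parseDouble(lat) * 10) / 10
                                + "," + Math.floor(Double.parseDouble(lon) * 10) / 10;
                        table.incrementColumnValue(Bytes.toBytes(cell),
                                FAMILY, QUALIFIER, 1L);
                    }
                } else if (event == XMLStreamConstants.END_ELEMENT
                        && "create".equals(xml.getLocalName())) {
                    inCreate = false;
                }
            }
            xml.close();
        }
    }
}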

TagInfo-like systems (aggregating big data and creating statistics) 
could definitely be built using hadoop/mapreduce.
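
For example, the heart of a TagInfo-style "how often does each tag key 
occur" statistic is just these two functions (same line-oriented XML 
simplification, imports and driver as in the density sketch above; this 
is of course not how Taginfo itself works):

// map: emit (tagKey, 1) for every <tag k="..."/> on a line of the planet XML
public static class TagKeyMapper
        extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable offset, Text line, Context ctx)
            throws IOException, InterruptedException {
        String s = line.toString();
        int i = s.indexOf("<tag k=\"");
        while (i >= 0) {                          // a line can hold several tags
            int start = i + 8;                    // length of '<tag k="'
            int end = s.indexOf('"', start);
            if (end < 0) break;
            ctx.write(new Text(s.substring(start, end)), new LongWritable(1));
            i = s.indexOf("<tag k=\"", end);
        }
    }
}

// reduce: sum the occurrences per tag key (same shape as the density reducer)
public static class KeySumReducer
        extends Reducer<Text, LongWritable, Text, LongWritable> {
    @Override
    protected void reduce(Text key, Iterable<LongWritable> counts, Context ctx)
            throws IOException, InterruptedException {
        long sum = 0;
        for (LongWritable c : counts) sum += c.get();
        ctx.write(key, new LongWritable(sum));
    }
}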



- npl



Re: [OSM-dev] OSM StreetDensityMap

2012-03-06 Thread Jochen Topf
On Tue, Mar 06, 2012 at 01:31:31PM +0100, npl wrote:
 TagInfo-like systems (aggregating big data and creating statistics)
 could definitely be built using hadoop/mapreduce.

Or you can do what Taginfo does and just write it cleverly, so it just uses one
host for an hour instead of 10 hosts for several hours. :-) I do agree that
there are many use-cases for Hadoop & Co. But they also create a lot of
overhead...

Reminds me a bit of the Osmarender/Tiles@Home story: First we write a renderer
that's horribly slow and inefficient. So we have to distribute the workload,
which makes it even more inefficient. Then to keep it going we invent more
and more technology around it. Oh well, I liked Osmarender, spent quite a
lot of time improving it and rendering maps with it. Sometimes it's not about
being efficient. :-)

Jochen
-- 
Jochen Topf  joc...@remote.org  http://www.remote.org/jochen/  +49-721-388298




Re: [OSM-dev] OSM StreetDensityMap

2012-03-06 Thread Martijn van Exel
Hi,

On Tue, Mar 6, 2012 at 7:42 AM, Jochen Topf joc...@remote.org wrote:

 On Tue, Mar 06, 2012 at 01:31:31PM +0100, npl wrote:
  TagInfo-like systems (aggregating big data and creating statistics)
  could definitely be built using hadoop/mapreduce.

 Or you can do what Taginfo does and just write it cleverly, so it just
 uses one
 host for an hour instead of 10 hosts for several hours. :-) I do agree that
 there are many use-cases for Hadoop & Co. But they also create a lot of
 overhead...


Just one host for an hour a day? Wow. That's not very much processing time
at all for what it provides. Awesome.

-- 
martijn van exel
geospatial omnivore
1109 1st ave #2
salt lake city, ut 84103
801-550-5815
http://oegeo.wordpress.com


Re: [OSM-dev] OSM StreetDensityMap

2012-03-06 Thread Jochen Topf
On Tue, Mar 06, 2012 at 10:44:45AM -0600, Martijn van Exel wrote:
 On Tue, Mar 6, 2012 at 7:42 AM, Jochen Topf joc...@remote.org wrote:
 
  On Tue, Mar 06, 2012 at 01:31:31PM +0100, npl wrote:
   TagInfo-like systems (aggregating big data and creating statistics)
   could definitely be built using hadoop/mapreduce.
 
  Or you can do what Taginfo does and just write it cleverly, so it just
  uses one
  host for an hour instead of 10 hosts for several hours. :-) I do agree that
  there are many use-cases for Hadoop & Co. But they also create a lot of
  overhead...
 
 
 Just one host for an hour a day? Wow. That's not very much processing time
 at all for what it provides. Awesome.

I just had a look and currently it's at about 2h for the main statistics
generation. So that's gone up from the 1h it used to take. That's because
people keep asking for more features. :-)

It takes about another hour for crawling the wiki etc.

Most days it takes about another 1.5h for updating the planet files, but on
some days that's considerably slower. I should probably try to figure out why
that's the case. Maybe something else runs in parallel on the machine.

All of that on an 800 MHz machine using about 6 GB of RAM. The OSM processing
is mostly CPU bound, so on a modern machine it would be faster. One relatively
easy optimization would be to run the planet update and statistics gathering
in one step. But for now I am lazy and let Osmosis do the planet update
first.

Jochen
-- 
Jochen Topf  joc...@remote.org  http://www.remote.org/jochen/  +49-721-388298




Re: [OSM-dev] OSM StreetDensityMap

2012-03-05 Thread Martijn van Exel
Hi,

On Mon, Mar 5, 2012 at 12:14 PM, npl n...@gmx.de wrote:

 Hi

 Last semester, a few other guys and I were involved in a project where we
 wanted to get familiar with hadoop. Since OSM has big data, we decided to
 do some hadoop processing on the OSM planet-file. We ended up creating a
 StreetDensityMap of the world, and extended JMapViewer for the graphical
 output. (screenshot of Europe:
 https://raw.github.com/npl/dda/master/screenshots/osm_density_europe.jpg)

 The project is hosted on GitHub: https://github.com/npl/dda


Cool. I've been wanting to give hadoop / map/reduce a try with OSM data but
the wiki does not offer much. It would be nice if someone with some
experience would create a wiki page. I'm sure it would be interesting for
the community as well as GIScience folks to have a place to start.


 It also gives a sort-of osm activity map.


Well, it does and it doesn't. You'd have to compare it to a reference road
network density map to appreciate the activity of the OSM community in
representing reality in OSM.

I see a lot of potential for this beyond 'simple' visualisation. Systems
like TagInfo and OWL could benefit, maybe? Does your framework lend itself
to (near) real-time processing of OSM data, or does it only work with
snapshot data?


 If you want to try it out, you will need your own hadoop cluster (well, a
 few nodes for a few hours is enough) -- there is no public server
 available. If you have any questions, don't hesitate to ask me!

 - npl





-- 
martijn van exel
geospatial omnivore
1109 1st ave #2
salt lake city, ut 84103
801-550-5815
http://oegeo.wordpress.com