Re: [OSM-dev] OSM with Hadoop

2015-01-14 Thread Paweł Paprota
Playing devil's advocate for a minute: honestly, I'm not sure I would be
as content with the current OSM data storage and processing
architecture. A main central server with a single multi-TB-sized
database somehow screams "single point of failure" to me... Add to that
the growth rate (which is pretty crazy when you look at it closely) and
I'd be worried whether such a setup is future-proof enough to justify
brushing off Hadoop topics with "you just don't know how to use
Postgres/PostGIS properly".

I love Postgres as much as the next guy - hell, I'm actively trying to
get my code working with a FULL HISTORY database, and Postgres does have
a lot of features that make this easier/possible. But at some point you
need to look at the big picture and ask where the infrastructure will be
in 1, 2 and 5 years' time.

Yeah, I know this is all talk and no solutions, but for now I don't have
any for this particular problem :P

Paweł

On Mon, Jan 12, 2015, at 20:50, Frederik Ramm wrote:
 Stephen,
 
 Previous discussions of combining NoSQL *or* massively parallel
 storage with OSM were often driven less by the approach of "let's
 investigate solid future storage models for OSM" and more by "hey,
 there's a cool new technology I'd like to play with and I'm sure it can
 somehow work with OSM".
 
 The results, if there were any at all, were often along the lines of
 "hey, this particular very specific use case is now 2% faster than
 before" - but looking closer you'd see that the same speedup could have
 been had with an old-fashioned, un-sexy CREATE INDEX statement if the
 author had known anything about PostgreSQL/PostGIS (*), or maybe that
 the data import took five weeks unless you had massive hardware, or
 some such.
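 
 (To make that concrete, here is roughly the kind of un-sexy fix I mean,
 sketched in Python with psycopg2; the connection details and the table
 and column names are made up for the example, not the real schema:)
 
     import psycopg2
 
     # Placeholder connection and names - the point is only that an
     # ordinary B-tree index on the column the slow query filters on
     # is often all the "speedup" that was needed.
     conn = psycopg2.connect(dbname="osm", user="osm")
     with conn, conn.cursor() as cur:
         cur.execute("CREATE INDEX IF NOT EXISTS idx_nodes_changeset "
                     "ON nodes (changeset_id)")
     conn.close()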
 
 I was therefore a bit skeptical reading your message, but relieved when
 I found that you're keeping an open mind about the results and plan to
 thoroughly analyse whether massively parallel storage will indeed
 perform better than plain old PostgreSQL/PostGIS for OSM's everyday use
 cases.
 
 (I'd like to see the word "cost-effective" thrown in somewhere - and
 for data reading we already have sufficiently well-scaling data
 replication in place. As far as the central database is concerned, OSM
 is very interested in making it easy for everyone to run their own
 local copy.)
 
 Bye
 Frederik
 
 (*) It is an often-overlooked fact that the amount of actual geo
 information in the central database is small - just the node
 coordinates - everything else is plain old relational stuff. Therefore
 the OSM database doesn't even need or use the PostGIS spatial
 extensions - but they are often used for analysing OSM data after
 importing them into a separate database.
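 
 (A tiny sketch of what that representation looks like; the 1e7 scale
 factor reflects my understanding of the convention used for the integer
 latitude/longitude columns, so treat it as an assumption:)
 
     # Node coordinates in the core database: plain scaled integers,
     # no PostGIS geometry type involved.
     SCALE = 10_000_000  # assumed scale factor (degrees * 1e7)
 
     def to_db(lat_deg: float, lon_deg: float) -> tuple[int, int]:
         """Degrees -> the integer representation stored in the table."""
         return round(lat_deg * SCALE), round(lon_deg * SCALE)
 
     def from_db(lat_int: int, lon_int: int) -> tuple[float, float]:
         """Stored integers -> degrees."""
         return lat_int / SCALE, lon_int / SCALE
 
     print(to_db(49.0025, 8.3925))   # -> (490025000, 83925000)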
 
 -- 
 Frederik Ramm  ##  eMail frede...@remote.org  ##  N49°00'09 E008°23'33
 

___
dev mailing list
dev@openstreetmap.org
https://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] OSM with Hadoop

2015-01-14 Thread Frederik Ramm
Hi,

On 01/14/2015 11:16 AM, Paweł Paprota wrote:
 A main central server with a single multi-TB-sized database somehow
 screams "single point of failure" to me...

... until you learn that there is actually active replication in place,
based on technology that has been tried and proven.
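
(For the curious, a minimal sketch of what keeping an eye on such a
standby looks like, assuming plain PostgreSQL streaming replication; the
host name and credentials are placeholders, not our actual servers:)

    import psycopg2

    # Placeholder connection details.
    standby = psycopg2.connect(host="standby.example.org",
                               dbname="openstreetmap", user="monitor")
    with standby, standby.cursor() as cur:
        # How far behind the primary is this standby?
        cur.execute("SELECT now() - pg_last_xact_replay_timestamp()")
        lag, = cur.fetchone()
        print("replication lag:", lag)
    standby.close()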

I'm not saying we should brush parallel storage aside - just don't
assume from the outset that it will magically solve problems without
incurring others. Also, we're not the only people operating large
relational databases, and development on that front continues as well,
with the potential of an efficient multi-master replication system
somewhere down the line leading to a different kind of also-parallel
infrastructure. Where the failure points of each of these architectures
lie, and how cost-effective, maintenance-intensive, or error-prone they
are, is indeed a good subject for analysis.

All I'm saying is that it's worth applying a scientific approach (i.e.
looking at facts) rather than a geek approach (i.e. looking at cool new
technology and shiny lights). And from Stephen's paper I had the
impression that we wouldn't easily be blinded by those shiny lights,
which is a good thing.

(In our particular case, the amount of editing that occurs grows far
more slowly than the amount of data we have collected, which means it is
not unlikely that we will be able to work with a centralised-writing,
distributed-reading approach for quite a while yet.)
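
(To illustrate the centralised-writing, distributed-reading pattern in
application terms - a sketch only, with made-up host names, credentials
and queries: all writes go to the single primary, while read-only
queries can be spread over replicas.)

    import psycopg2

    # Placeholder topology; the point is the read/write split.
    primary = psycopg2.connect(host="primary.example.org",
                               dbname="osm", user="app")
    replica = psycopg2.connect(host="replica1.example.org",
                               dbname="osm", user="app")

    # Writes always go through the one primary ...
    with primary, primary.cursor() as cur:
        cur.execute("UPDATE users SET description = %s WHERE id = %s",
                    ("hello", 42))

    # ... while read-only queries can be answered by any replica.
    with replica, replica.cursor() as cur:
        cur.execute("SELECT count(*) FROM changesets WHERE user_id = %s",
                    (42,))
        print(cur.fetchone()[0])

    primary.close()
    replica.close()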

Bye
Frederik

-- 
Frederik Ramm  ##  eMail frede...@remote.org  ##  N49°00'09 E008°23'33

___
dev mailing list
dev@openstreetmap.org
https://lists.openstreetmap.org/listinfo/dev