Playing devil's advocate for a minute, honestly, I'm not sure I would be
as content with the current OSM data storage and processing
architecture. A main central server with a single multi-TB database
somehow screams "single point of failure" to me... Add to that the growth
rate (which is pretty crazy when you look at it closely) and I'd be
worried whether such a setup is future-proof enough to justify brushing
off Hadoop topics with "you just don't know how to use Postgres/PostGIS
properly".
I love Postgres as much as the next guy. Hell, I'm actively trying to
get my code working with the FULL HISTORY database, and Postgres does have
a lot of features that make this easier or even possible. But at some
point you need to look at the big picture and ask where the infrastructure
will be in 1, 2 and 5 years' time.
Yeah, I know it's just talk and no solutions, but for now I don't have any
for this particular problem :P
Paweł
On Mon, Jan 12, 2015, at 20:50, Frederik Ramm wrote:
Stephen,
previous discussions of combining NoSQL *or* massively parallel
storage with OSM were often driven less by a "let's investigate solid
future storage models for OSM" approach and more by "hey, there's a cool
new technology I'd like to play with and I'm sure it can somehow work
with OSM".
The results, if there were any at all, were often along the lines of
"hey, this particular very specific use case is now 2% faster than
before", but looking closer you'd see that the same speedup could have
been had with an old-fashioned, un-sexy "create index" statement if the
author had known anything about PostgreSQL/PostGIS (*), or maybe that
the data import took five weeks unless you had massive hardware, or
somesuch.
I was therefore a bit skeptical reading your message, but relieved when
I found that you're keeping an open mind about the results and plan to
thoroughly analyse whether using a massively parallel storage will
indeed perform better than plain old PostgreSQL/PostGIS for what are
OSM's everyday use cases.
(I'd like to see the word "cost-effective" thrown in somewhere - and for
data reading we already have sufficiently well-scaling data replication
in place. As far as the central database is concerned, OSM is very
interested in making it easy for everyone to run their own local copy.)
Bye
Frederik
(*) It is an often-overlooked fact that the amount of actual geo
information in the central database is small - just the node coordinates
- and everything else is plain old relational stuff. Therefore the OSM
database doesn't even need or use the PostGIS spatial extensions, though
they are often used for analysing OSM data after importing it into a
separate database.
--
Frederik Ramm ## eMail frede...@remote.org ## N49°00'09 E008°23'33
___
dev mailing list
dev@openstreetmap.org
https://lists.openstreetmap.org/listinfo/dev