Re: [OSM-dev] New database server

2013-05-22 Thread Paweł Paprota

Andy,

 And what I can tell you is that people in OSM (I mean admins here)
 are very supportive and open to changes.

 I'm glad to hear that! Many people say they have a different
 experience, but we try to be helpful.

Well, I don't know about other people, but I got everything I needed
(and much more than I imagined!) in terms of hardware, day-to-day
support etc.

In fact, I am a bit ashamed that given all the admin support I still
have not managed to finish what I set out to accomplish! :/

So yeah, I can say that again - for people who want to do stuff, the sky
is the limit in OSM ;-)

 Sadly, I simply don't have enough time to finish it to the extent
 that it is acceptable (to me, not to mention others/admins) to
 consider it for production use.

 I think many of our software projects have similar issues, and one
 of the key aspects is to try to get more than one volunteer to
 develop each one.

 I'm currently working on improving the documentation for the rails
 port (with a view to making it easier for new developers to get
 started).

That's a great initiative. Honestly, I have no idea where best to invest
time to raise the probability of attracting new contributors.

When I was briefly attending the EWG weekly sessions I really thought
about this a lot. What makes people decide they want to spend their free
time on a project? I even reached the somewhat crazy conclusion that
documentation may not be a blocking issue at all - if someone really
wants to contribute, they will find their way. Where to get more people
like that? No idea.

Perhaps the project itself needs to be more sexy? I know that for some
developers (myself included) end users are one of the most important
aspects - i.e. they would rather develop for a project that is actually
used in the wild (of course I'm not saying that OSM isn't used, but
certainly there are some things to be improved?).

 When I finish with that, do you think that your OWL project could
 benefit from something similar? Or are there other things that you
 think are holding back other developers from getting started?

Ehh... OWL is a tricky topic. In the past few months there were a couple
of people who seemed interested in contributing, but I think they got
stuck at the setup stage and/or understanding the code.

I really need to try and simplify some of the implementation, document
it better etc. Perhaps I should consider narrowing the scope of OWL;
right now I think I may be trying to tackle too many issues at once,
and that's why the amount of code and its complexity is growing and also
why I am not able to deliver anything.

Currently I am thinking about making OWL a bit leaner and perhaps
separating some of the stuff (like vector tiles and integration with
client-side rendering) into other (future) projects.

I think in OSM we will (slowly but surely!) get to the tipping point and
more contributors will come.

Paweł


___
dev mailing list
dev@openstreetmap.org
http://lists.openstreetmap.org/listinfo/dev


Re: [OSM-dev] New database server

2013-05-21 Thread Florian Lohoff
On Mon, May 20, 2013 at 09:32:30PM -0400, Jason Remillard wrote:
 Hi,
 
 schema). This would allow the site to scale more incrementally, and
 potentially scale to larger loads than putting all of our eggs into
 two monster servers. For the money we are planning on spending on the big
 server, we could get several of these smaller edge servers
 with flash disks and a less expensive redundant write/history database
 server. As we need to scale, we can do it in 3,000 dollar increments
 rather than 40,000 dollar increments. Having a server with 29 disks does
 not seem like a good situation. Just a thought.

Without being involved in the server issues, my guess is that the problem
here is not the number of API calls but the size of the working set.

We are talking about a multi-terabyte working set. Grabbing data out of
this working set is a very tedious task. You might want to grab 10 bytes
at the front, 20 in the middle and 60 at the end. For this you need to
walk through multiple gigabytes of indexes and move your disk heads
something like 1,750 times.

Leaving SSDs aside, the number of spindles determines the concurrency you
can get: the more heads, the more concurrent accesses.

SSDs will help accelerate indexes, but today they are not a cost-effective
solution for the full database, whether you count in € or $.
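Florian's point about spindles can be made concrete with a rough model. It assumes about 8 ms per random access on a spinning disk and that each spindle seeks independently of the others; both are illustrative assumptions, not measurements of the OSM servers. The 1,750 seeks and the 29 disks come from the messages in this thread.

```python
# Back-of-envelope model of random-read capacity on spinning disks.
# ASSUMPTION: ~8 ms per random access; an illustrative figure only,
# not a measurement of any OSM database server.
ASSUMED_ACCESS_MS = 8.0


def random_reads_per_second(spindles: int, access_ms: float = ASSUMED_ACCESS_MS) -> float:
    """Aggregate random reads/s if every spindle seeks independently (best case)."""
    if spindles < 1:
        raise ValueError("need at least one spindle")
    return spindles * 1000.0 / access_ms


def single_query_seconds(seeks: int, access_ms: float = ASSUMED_ACCESS_MS) -> float:
    """Latency of one query whose seeks happen strictly one after another."""
    return seeks * access_ms / 1000.0


def queries_per_second(spindles: int, seeks_per_query: int,
                       access_ms: float = ASSUMED_ACCESS_MS) -> float:
    """Sustained throughput when many such queries run concurrently across spindles."""
    return random_reads_per_second(spindles, access_ms) / seeks_per_query


if __name__ == "__main__":
    seeks = 1750  # seek count from the message above
    for disks in (1, 29):  # 29 disks as mentioned in the quoted proposal
        print(f"{disks:>2} disk(s): {random_reads_per_second(disks):6.0f} reads/s, "
              f"{queries_per_second(disks, seeks):.2f} queries/s")
    print(f"one query with sequential seeks: {single_query_seconds(seeks):.1f} s")
```

Under these assumptions one disk manages about 125 random reads per second, so a single 1,750-seek query takes about 14 s. Twenty-nine spindles raise aggregate throughput to about 3,625 reads/s (roughly two such queries per second) without making any individual query faster, which is the sense in which spindle count buys concurrency.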

Flo
-- 
Florian Lohoff f...@zz.de




Re: [OSM-dev] New database server

2013-05-21 Thread Andy Allan
On 21 May 2013 02:32, Jason Remillard remillard.ja...@gmail.com wrote:

 The server that we are planning on purchasing is a monster. Very
 complicated and expensive. I am concerned that this might not be the
 best way to go.

Indeed, it might not be the best way to go, and any thoughts and
brainpower applied to the problem are always very welcome.

Of course, at OWG we *do* think it's the best approach, given all the
trade-offs involved, and it's a decision that hasn't been undertaken
lightly.

 We have a Google Summer of Code proposal to write an edge proxy server
 for the OSM API. I don't know if the project will be accepted, but it

It's worth bearing in mind that we don't have any places on this
year's GSoC, and even if OSGeo decided to go for this project on our
behalf, it could be months before we even have any idea whether or not
the implementation is feasible. I don't mean to be negative, but
weighing up the here and now requirements against a hypothetical
alternative at some point in the future is one of these trade-offs
that we routinely have to make at OWG.

 For the money we are planning on spending on the big
 server, we could get several of these smaller edge servers
 with flash disks and a less expensive redundant write/history database
 server.

Well, we could certainly spend the money on small edge servers, but
it's not clear to me why you think that would make the central server
less expensive. I think this proposal may be worthwhile but it's
somewhat orthogonal to the goals of the new server.

At the moment we have two osm-database-class machines, the older of
which (smaug) is no longer capable of handling the full load on its
own, but is still useful as a database-level read slave. The newer
machine (ramoth) can handle the load entirely on its own, but is
approaching the limits of dealing with the full read+write load.

When it comes to the master database, we need certain characteristics:
A) To be able to handle the write load (and the associated reads
involved in checking constraints etc)
B) To be able to store the entire database
C) To be more than one of these machines, for failover

Smaug most likely doesn't fulfil A, and so currently we don't really
fulfil C. So we need a new machine that can do A+B, and these are
unfortunately expensive. In order to last more than 6 months, the new
machine also needs plenty of space (B) on fast disks (A) which is
where most of the money goes.

Having map-call-proxies, as you discuss, doesn't solve any of A, B or
C for the master database. Sharing out the read-only load is a good
idea, but it's not clear to me whether it is better done with
postgres-level database replication (as we have been doing),
proxy-level replication (as per this GSoC idea), or even just
examining logs and ban-hammering people scraping the map call (my
personal favourite!).
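The last of those options, spotting clients that hammer the map call, might start with a simple log scan like the one below. The log format (client address first, then a quoted request line, loosely like a common web-server access log) and the threshold are assumptions for illustration; the real OSM logs and any blocking policy may look quite different.

```python
import re
from collections import Counter
from typing import Dict, Iterable

# ASSUMED log format, loosely modelled on common access logs:
#   203.0.113.5 - - [21/May/2013:10:00:00 +0000] "GET /api/0.6/map?bbox=... HTTP/1.1" 200 1234
REQUEST_RE = re.compile(r'^(?P<ip>\S+) .*?"(?P<method>[A-Z]+) (?P<path>\S+) HTTP/[\d.]+"')

# HYPOTHETICAL threshold: map calls per client within the period the log covers.
MAP_CALL_THRESHOLD = 1000


def heavy_map_callers(lines: Iterable[str], threshold: int = MAP_CALL_THRESHOLD) -> Dict[str, int]:
    """Return {client_ip: map_call_count} for clients at or above `threshold`."""
    counts = Counter()
    for line in lines:
        m = REQUEST_RE.match(line)
        if m is None:
            continue  # not a request line we understand
        path_only = m.group("path").split("?", 1)[0]
        if m.group("method") == "GET" and path_only == "/api/0.6/map":
            counts[m.group("ip")] += 1
    return {ip: n for ip, n in counts.items() if n >= threshold}


if __name__ == "__main__":
    sample = [
        '203.0.113.5 - - [21/May/2013:10:00:00 +0000] "GET /api/0.6/map?bbox=0,0,0.1,0.1 HTTP/1.1" 200 10',
        '203.0.113.5 - - [21/May/2013:10:00:01 +0000] "GET /api/0.6/map?bbox=0,0.1,0.1,0.2 HTTP/1.1" 200 10',
        '198.51.100.7 - - [21/May/2013:10:00:02 +0000] "PUT /api/0.6/node/1 HTTP/1.1" 200 10',
    ]
    print(heavy_map_callers(sample, threshold=2))
```

A scan like this only identifies candidates; deciding what counts as abusive scraping, and how to act on it, stays a policy question.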

 As we need to scale

It's best in these conversations to be precise about what we mean by
"to scale". Scaling read-requests is only one aspect, and we have a
variety of feasible (and active) options. Long-term, we may[1] need to
work around the need for all these machines to store the entire
database (B), and that's Hard. We may[2] also need to figure out how
to solve A, and that's Hard too.

Like I said at the start, thoughts and brainpower are always welcome!

Cheers,
Andy

[1] If we grow OSM faster than Moore's law, otherwise: happy days
[2] If db-write activity outpaces disk-io and/or network bandwidth
increases, otherwise: happy days



Re: [OSM-dev] New database server

2013-05-21 Thread Serge Wroclawski
Jason,

You're not at all wrong about the issues with the server design.

This is something that's been well known and understood for several years:

   As the project grows, the cost of scaling on a single system will
not scale accordingly.

What I mean by that is that buying a single machine with linearly more
capacity does not come at a linear cost.

So if you are growing, it makes more economic (and technical) sense
to scale out, rather than building up.

What would this mean in the context of OSM?

It might mean something like moving the GPX data off the main
database. Or maybe having historical data on a slower database than
the current data.

It also includes things like aggressive caching and using tiled map
calls (something that Ian and I worked on, and Ian has a new
implementation of).
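One way "tiled map calls" could work: instead of accepting an arbitrary bounding box, the client asks for fixed grid tiles, so identical requests from different users map to the same cache key. This sketch uses the standard Web Mercator "slippy map" tile numbering; the zoom level and the cache-key format are hypothetical and are not taken from Ian's or Serge's implementation.

```python
import math
from typing import List, Tuple

MAX_LAT = 85.0511287798  # latitude limit of the Web Mercator tile grid


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Standard slippy-map tile numbers: x grows eastwards, y grows southwards."""
    n = 2 ** zoom
    lat_rad = math.radians(max(-MAX_LAT, min(MAX_LAT, lat)))
    x = math.floor((lon + 180.0) / 360.0 * n)
    y = math.floor((1.0 - math.asinh(math.tan(lat_rad)) / math.pi) / 2.0 * n)
    # Clamp so points on the east or south edge (lon = 180, lat = -MAX_LAT) stay on the grid.
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)


def tiles_for_bbox(min_lon: float, min_lat: float, max_lon: float, max_lat: float,
                   zoom: int) -> List[Tuple[int, int]]:
    """Every tile at `zoom` that intersects the bounding box."""
    x0, y0 = lonlat_to_tile(min_lon, max_lat, zoom)  # north-west corner
    x1, y1 = lonlat_to_tile(max_lon, min_lat, zoom)  # south-east corner
    return [(x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)]


def cache_key(zoom: int, x: int, y: int) -> str:
    """HYPOTHETICAL cache key for one tiled map call."""
    return f"map/{zoom}/{x}/{y}"


if __name__ == "__main__":
    # A small bbox around lon/lat (0, 0) at an illustrative low zoom.
    for x, y in tiles_for_bbox(-1.0, -1.0, 1.0, 1.0, zoom=1):
        print(cache_key(1, x, y))
```

Two users whose bounding boxes cover the same tiles now request identical URLs, so a cache can answer the second request; the trade-off is that the server may return somewhat more data than the client strictly asked for.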

And there's room for more optimizations even then, but just these
would make an impact.

So why doesn't this happen? Frankly, because I think the project
doesn't have anyone who can act in the kind of technical leadership
role this would require.

Making these kinds of changes would require modifying (and testing)
the rails port, as well as possibly modifying cgimap (depending on
which calls were affected), and the database, and setting up the new
hardware, and coordinating with whatever hosting situation that would
be in, etc.

It's not something anyone can do, with the possible exception of the
sys-admins (who are both extremely overworked and volunteer).

This is why the org needs a structural change, to give someone the
authority and resources to oversee projects like this.

Without this, the OWG is stuck ordering more hardware.

- Serge



Re: [OSM-dev] New database server

2013-05-21 Thread Richard Fairhurst
Serge Wroclawski wrote:
 So why doesn't this happen? Frankly, because I think the 
 project doesn't have anyone who can act in the kind of 
 technical leadership role this would require.

Define "can".

The project has plenty of people capable of doing this.

But IME the main barrier between "capable of doing" and "actually doing" is
the amount of shit you have to suffer from armchair experts whenever you try
to actually do anything in OSM.

Richard







Re: [OSM-dev] New database server

2013-05-21 Thread Frederik Ramm

Hi,

On 05/21/13 16:08, Serge Wroclawski wrote:

 So why doesn't this happen? Frankly, because I think the project
 doesn't have anyone who can act in the kind of technical leadership
 role this would require.

Knee-jerk call for authority? Never worked well.

 It's not something anyone can do, with the possible exception of the
 sys-admins (who are both extremely overworked and volunteer).

It is not something that someone can do *on their own*, and I think that
is not so bad. To make a change in OSM you need to work together with
others, make an argument for your vision over a longer time, get buy-in
from a larger group of people, and eventually things will move. I
wouldn't want to sacrifice that careful, evolutionary process in
exchange for a visionary leader where we all just do what he says.

 This is why the org needs a structural change, to give someone the
 authority and resources to oversee projects like this.

I don't think we want or need authority figures who are somehow exempt
from having to *convince* us that their idea is good.

Bye
Frederik

--
Frederik Ramm  ##  eMail frede...@remote.org  ##  N49°00'09 E008°23'33



Re: [OSM-dev] New database server

2013-05-21 Thread Simon Poole

I normally stay out of the tech bike-shedding discussions; however, I do
want to point out:

- we are aeons away from requiring and running cutting/bleeding edge
hardware (and having to pay for such)

- in the grand scheme of things we are not spending a lot of money on
hardware (on the one hand our sys admins and the OWG are very frugal and
on the other see the first point)

- the amount of money we spend is a lot of money for the foundation, at
least relative to our other spending, however it is extremely unlikely
that we could get away with spending less, regardless of implementation
(distributed, 3rd party cloud etc etc etc).

- our current setup is fairly straightforward; fancier schemes are very
likely to be more error-prone, with the associated costs (manpower)

All this said, I would recommend that anybody who actually wants to help
should participate in the OWG and help with the other tech tasks that we
have in abundance.

Simon




Re: [OSM-dev] New database server

2013-05-21 Thread Ian Dees
Can someone add more information about when and how the OWG meets? The wiki
page is pretty bare on how one might help:

http://www.osmfoundation.org/wiki/Operations_Working_Group




Re: [OSM-dev] New database server

2013-05-21 Thread Paweł Paprota

On 05/21/2013 04:08 PM, Serge Wroclawski wrote:

 This is why the org needs a structural change, to give someone the
 authority and resources to oversee projects like this.

 Without this, the OWG is stuck ordering more hardware.

Reluctant +1 from me, though I am sure this is not a popular view (we
don't want authority and all that)... I can only speak from my own
experience with what I was/am trying to do with OWL/history tab/whatever
you call it - in itself not a trivial task, but the challenges of the
overall architecture (both logical and physical) are at least at the
same level of complexity (if not above!).

And what I can tell you is that people in OSM (I mean admins here) are
very supportive and open to changes. What I love about this project is
that I built something and had (and still have) the opportunity to get
it into production. Sadly, I simply don't have enough time to finish it
to the extent that it is acceptable (to me, not to mention
others/admins) to consider it for production use.

So I think your statement is 100% true, but I think the main problem is
that there are no people and/or no money to do this. Imagine you were
the person who would need/want to do this work. It most definitely *is*
a full-time job for at least a few people - a service like OSM doesn't
run on rainbows and good wishes. And *changing* how it runs, i.e.
changing the fundamental architecture or introducing some kind of load
balancing or whatever, is an *enormous* task.

In my opinion it is clear that this kind of work will not be done by
volunteers - there is just too much interconnected stuff that needs to
be handled properly, not to mention operations around all of it.

How to handle that - I have no idea... other than paying people - that
would be the obvious (and IMHO surefire) solution. Of course the other
question is whether there even are people (ideally among the current
admins?) who are willing to sacrifice (part of) their professional
careers to work on OSM for some time - not sure. But I think at the
stage and level of complexity OSM has reached, there is only one
solution - as you said - structural change. I would put it more bluntly:
the right people + money for their time is the only way forward.

Paweł




Re: [OSM-dev] New database server

2013-05-21 Thread Andy Allan
On 21 May 2013 18:50, Paweł Paprota ppa...@fastmail.fm wrote:

 And what I can tell you is that people in OSM (I mean admins here) are very
 supportive and open to changes.

I'm glad to hear that! Many people say they have a different
experience, but we try to be helpful.

 Sadly, I
 simply don't have enough time to finish it to the extent that it is
 acceptable (to me, not to mention others/admins) to consider it for
 production use.

I think many of our software projects have similar issues, and one of
the key aspects is to try to get more than one volunteer to develop
each one.

I'm currently working on improving the documentation for the rails
port (with a view to making it easier for new developers to get
started). When I finish with that, do you think that your OWL project
could benefit from something similar? Or are there other things that
you think are holding back other developers from getting started?

Cheers,
Andy



Re: [OSM-dev] New database server

2013-05-21 Thread Jason Remillard
Hi Andy,

Thank you for the detailed reply.

I have no issues with how the money is being handled, nor spending
this kind of money on hardware.

OSM growth is pretty amazing right now. Over the lifetime of this
server we should be planning for at least 15x/20x more traffic. If
this server can handle 15x/20x more traffic than the current site
generates, then we are good, case closed. If we can't get to 15x/20x
traffic levels without large architectural changes, we should start
steering ourselves into a new architecture now. Standard proxy servers
will not work well with our API; it will be painful if we are not
proactive about it. It seems like some kind of geographically
distributed cached edge server will eventually be needed. The OWG
obviously knows all of this.
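As a rough check on the "15x/20x" figure, the implied compound annual growth rate depends on how long the server is expected to last. The sketch below assumes a five-year service life (an assumption; the thread does not state one) and compares the result with hardware capacity doubling every two years, a common Moore's-law-style rule of thumb and the comparison Andy's footnote [1] alludes to earlier in the thread.

```python
def implied_annual_growth(total_factor: float, years: float) -> float:
    """Compound annual growth factor that multiplies up to total_factor over `years`."""
    if total_factor <= 0 or years <= 0:
        raise ValueError("total_factor and years must be positive")
    return total_factor ** (1.0 / years)


def moore_annual_growth(doubling_years: float = 2.0) -> float:
    """Annual factor for capacity that doubles every `doubling_years` (assumed 2)."""
    return 2.0 ** (1.0 / doubling_years)


if __name__ == "__main__":
    LIFETIME_YEARS = 5  # ASSUMPTION: server lifetime is not stated in the thread
    hw = moore_annual_growth()
    for factor in (15, 20):
        g = implied_annual_growth(factor, LIFETIME_YEARS)
        print(f"{factor}x over {LIFETIME_YEARS} years = {g:.2f}x per year "
              f"(vs ~{hw:.2f}x per year for hardware doubling every 2 years)")
```

With a five-year life, 15x and 20x work out to roughly 72% and 82% growth per year, well above the roughly 41% a year implied by a two-year doubling; a longer assumed lifetime narrows the gap.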

Really the bottom line of my email is as follows:

The Google Summer of Code project (if it moves forward) will not be
that hard to integrate into the current server infrastructure, and it
has a reasonable chance of being stood up by this fall. I just wanted
to ensure that the OWG was aware of the project and considered it in
the resource planning for our new servers. Concretely, this means we
have the option of configuring the new database server for only
history and write requests rather than all of the API.

Thanks
Jason.


[OSM-dev] New database server

2013-05-20 Thread Jason Remillard
Hi,

I could not find a discussion on the new database server.

http://wiki.openstreetmap.org/wiki/New_server_and_fund_raising_drive_2013

The server that we are planning on purchasing is a monster. Very
complicated and expensive. I am concerned that this might not be the
best way to go.

We have a Google Summer of Code proposal to write an edge proxy server
for the OSM API. I don't know if the project will be accepted, but it
has got me thinking about the approach and our funding drive. The idea
is that each front-facing server has a local snapshot copy of the OSM
database to service all of the read-only calls. These edge servers
could be geographically distributed. It would just leave the central
database server to deal with write requests, history requests, and
diffs (anything that can't be handled with a snapshot database
schema). This would allow the site to scale more incrementally, and
potentially scale to larger loads than putting all of our eggs into
two monster servers. For the money we are planning on spending on the
big server, we could get several of these smaller edge servers
with flash disks and a less expensive redundant write/history database
server. As we need to scale, we can do it in 3,000 dollar increments
rather than 40,000 dollar increments. Having a server with 29 disks does
not seem like a good situation. Just a thought.
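The proposal amounts to a routing rule in front of the API: read-only requests that a current-data snapshot can answer go to an edge server, and everything else goes to the central database. A minimal sketch of such a rule follows, using OSM API 0.6-style paths; the pattern list is simplified and not exhaustive, and the "edge"/"central" labels are hypothetical names, not part of any existing OSM component.

```python
import re
from urllib.parse import urlsplit

# Simplified, non-exhaustive OSM API 0.6-style patterns (illustrative only).
CURRENT_ELEMENT = re.compile(r"^/api/0\.6/(node|way|relation)/\d+(/full)?$")
OLD_VERSION = re.compile(r"^/api/0\.6/(node|way|relation)/\d+/(\d+|history)$")


def route(method: str, url: str) -> str:
    """Return "edge" for snapshot-answerable reads, "central" for everything else."""
    path = urlsplit(url).path
    if method.upper() != "GET":
        return "central"  # all writes go to the master database
    if OLD_VERSION.match(path):
        return "central"  # history and old versions are not in a current-data snapshot
    if path.startswith("/api/0.6/changeset"):
        return "central"  # changeset metadata and diffs
    if path == "/api/0.6/map" or CURRENT_ELEMENT.match(path):
        return "edge"     # current data only: safe to serve from a snapshot
    return "central"      # when unsure, fall back to the authoritative server


if __name__ == "__main__":
    for method, url in [("GET", "/api/0.6/map?bbox=-0.1,51.5,0.0,51.6"),
                        ("GET", "/api/0.6/node/123/history"),
                        ("PUT", "/api/0.6/node/123")]:
        print(method, url, "->", route(method, url))
```

Because edge snapshots lag the master by however long replication takes, a client that has just uploaded an edit may briefly read stale data from an edge server; sending unrecognised paths to the central server keeps that failure mode conservative.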

Thanks
Jason.
