Thanks for the response; this looks quite helpful. I will write and run some
code this weekend and see how far I can get, with the goals of quantifying
performance and identifying further improvements that are needed for my
project. I will contact you off-list about a skype call.
--
David Winslow
OpenGeo - http://opengeo.org
On Feb 17, 2011 4:54 AM, "Peter Neubauer" <[email protected]>
wrote:
> David,
> as Craig mentioned, we are in the process of profiling insertion into
> Neo4j Spatial. From the volume side of things, Germany.osm consists of
> roughly 60M points and 8M ways (a common pattern: in other datasets too,
> the way count is roughly 10% of the node count).
>
> From the file sizes, planet.osm is 10 times bigger than Germany
> (Germans are CRAZY mappers :), so in theory it should fit into one
> Neo4j instance.
>
> Regarding the file size of the database, there are two issues here.
> The first is that OSM datasets contain a lot of strings, which makes
> the Neo4j string store file grow quite heavily. We are working right
> now on getting ShortString support into Neo4j, which for OSM should
> reduce the string store size by up to 80% on disk. The second is the
> indexes. I am playing around with exact indexes (like BerkeleyDB as a
> K/V store) instead of Lucene, to match the indexing speed to Neo4j
> itself. I will report back on that, since IMHO the insertion speed
> for germany.osm is not good enough yet.
>
> So, for the moment, I would recommend that you look at the features of
> Neo4j Spatial, and report back or fork anything that is missing, so we
> can get to a feature-fit, while we slowly start to go into
> optimization mode.
>
> WDYT?
>
> Cheers,
>
> /peter neubauer
>
> GTalk: neubauer.peter
> Skype peter.neubauer
> Phone +46 704 106975
> LinkedIn http://www.linkedin.com/in/neubauer
> Twitter http://twitter.com/peterneubauer
>
> http://www.neo4j.org - Your high performance graph database.
> http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
>
>
>
> On Thu, Feb 17, 2011 at 9:30 AM, Craig Taverner <[email protected]> wrote:
>> Hi David,
>>
>> Great to hear you're interested in using neo4j for the OSM model. I also
>> think it is a great match. However, you are right to assume there are a
>> few missing pieces. Two shortcomings that are relevant to your questions
>> below are:
>>
>> - *Scalability*. We only recently tried to load very large OSM files,
>>   starting with Germany. There are currently two problems with this:
>>   the graph model takes up too much disk space, and load performance
>>   degrades. The problem here is that the OSM file, despite being XML, is
>>   to some extent just a sequential dump of a number of postgis tables,
>>   the first being the point nodes, and only later the ways with foreign
>>   key references to the nodes. So we need an independent way to look up
>>   the node ids when loading the ways, and currently use a lucene index
>>   for this. The index works, but like all tree structures degrades in
>>   performance as the total index size increases. Peter has been
>>   investigating this, and is in the process of evaluating two options:
>>   - Switching off the batch-inserter. I refactored the OSMImporter to
>>     allow for importing with the normal GraphDatabaseService instead of
>>     the batch inserter, and Peter is trying this out for the performance
>>     of larger loads and incremental loads.
>>   - Using an index other than lucene. Peter is currently evaluating the
>>     BDB database for its exact match index, which might perform better
>>     than lucene for the node-id lookup.
>> - *Changesets*. We do not yet properly support changesets. In fact, the
>>   current code loads the OSM XML into a structure that still has some
>>   residual resemblances to the XML. For example, we store the user, uid
>>   and changeset as properties of the nodes and ways they were attributes
>>   of in the XML. I have started refactoring this, and the plan is to
>>   make a two-phase improvement:
>>   - Firstly, structure users and changesets as a tree, with nodes and
>>     ways related to the changeset in the tree structure. This allows for
>>     analysis of the graph from the perspective of users and changesets.
>>     It also reduces the total disk space used because the user, uid and
>>     changeset id are not duplicated in properties as they are today. I
>>     have already done part of this work on my computer, but not pushed
>>     it. I see database size reductions down to nearly 60% of previous,
>>     but I have not completed the new tree, so the size will go up again
>>     somewhat.
>>   - Secondly, once we have the changeset tree in place we can work on
>>     applying changes to the graph. As you requested in your email, we
>>     want to be able to apply the daily updates to an existing full OSM
>>     model.
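
To make the node-id lookup from the Scalability point concrete, here is a
minimal sketch in Python. It is not the OSMImporter code: the sample XML,
the load_osm function and the in-memory dict are all assumptions made for
illustration, with the dict standing in for whatever exact-match index
(lucene, BDB, ...) ends up being used. It only shows why the nodes must be
indexed by id before the ways can be resolved.

```python
import xml.etree.ElementTree as ET

# Tiny made-up OSM extract: nodes come first, ways reference them by id.
SAMPLE_OSM = """<osm version="0.6">
  <node id="1" lat="52.50" lon="13.40" user="alice" uid="7" changeset="100"/>
  <node id="2" lat="52.51" lon="13.41" user="alice" uid="7" changeset="100"/>
  <node id="3" lat="52.52" lon="13.42" user="bob" uid="9" changeset="101"/>
  <way id="10" user="bob" uid="9" changeset="101">
    <nd ref="1"/>
    <nd ref="2"/>
    <nd ref="3"/>
    <tag k="highway" v="residential"/>
  </way>
</osm>"""


def load_osm(xml_text):
    """Resolve each way's node references through an exact-match index."""
    root = ET.fromstring(xml_text)
    node_index = {}  # OSM node id -> (lat, lon); hypothetical stand-in index
    ways = {}        # OSM way id -> list of resolved (lat, lon) points
    for elem in root:
        if elem.tag == "node":
            # Every node is indexed by its OSM id as it is read.
            node_index[int(elem.get("id"))] = (
                float(elem.get("lat")),
                float(elem.get("lon")),
            )
        elif elem.tag == "way":
            # Ways appear after all nodes, so each ref is one index lookup.
            refs = [int(nd.get("ref")) for nd in elem.findall("nd")]
            ways[int(elem.get("id"))] = [node_index[ref] for ref in refs]
    return node_index, ways


nodes, ways = load_osm(SAMPLE_OSM)
```

A dict lookup costs roughly the same however many ids are stored, which is
the property an exact-match index is hoped to bring to large imports. A
real planet import would of course need an on-disk index rather than a
Python dict.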
>>
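
The first phase of the changeset refactoring can be sketched the same way.
This is only an illustration of the idea, not the actual graph model: the
record layout and the build_changeset_tree function are hypothetical, and
nested dicts stand in for graph nodes and relationships. The point is that
user, uid and changeset are stored once in the tree instead of on every
node and way.

```python
# Flat records as in the OSM XML: every element repeats user, uid, changeset.
ELEMENTS = [
    {"id": 1, "type": "node", "user": "alice", "uid": 7, "changeset": 100},
    {"id": 2, "type": "node", "user": "alice", "uid": 7, "changeset": 100},
    {"id": 3, "type": "node", "user": "bob", "uid": 9, "changeset": 101},
    {"id": 10, "type": "way", "user": "bob", "uid": 9, "changeset": 101},
]


def build_changeset_tree(elements):
    """Group elements as user -> changeset -> [(type, id)]."""
    users = {}
    for element in elements:
        # One entry per user, created the first time the uid is seen.
        user = users.setdefault(
            element["uid"], {"name": element["user"], "changesets": {}}
        )
        # One entry per changeset under that user.
        members = user["changesets"].setdefault(element["changeset"], [])
        # The element itself only needs its type and id here.
        members.append((element["type"], element["id"]))
    return users


tree = build_changeset_tree(ELEMENTS)
```

With this shape, "everything bob touched in changeset 101" is a walk down
the tree (tree[9]["changesets"][101]) rather than a scan over the
properties of every node and way.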
>> So, we have definitely thought about your specific requirements, but due
>> to other priorities have not made much progress in completing these. I
>> certainly welcome your feedback, and even help, in completing this work.
>> I suggest we take a skype call to discuss this further.
>>
>> Regards, Craig
>>
>> On Thu, Feb 17, 2011 at 4:28 AM, David Winslow <[email protected]> wrote:
>>
>>> Hi all,
>>>
>>> My organization (OpenGeo) is investigating options for generating and
>>> hosting map tiles based on OpenStreetMap data on Amazon AWS. We are
>>> currently using OSM's osm2pgsql tool with a PostGIS database, GeoServer
>>> with SLD styles to render the data, and GeoWebCache to dice up the map
>>> into tiles and serve them from a filesystem cache. I'm interested in
>>> investigating neo4j-spatial as an alternative to Postgres since the
>>> graph model seems to fit OSM's data more cleanly than an RDBMS. To be
>>> clear, investigating neo4j is just a side project for me at present.
>>> I've played with neo4j-spatial before, and I plan on getting my hands a
>>> bit dirty this weekend, but for now I have a few questions about it.
>>>
>>> 1) Has anyone attempted a full OSM planet import using neo4j-spatial?
>>> Any tips on ensuring it goes smoothly (how much disk it is likely to
>>> require, whether the full planet dump will fit in a neo4j 1.2 database,
>>> etc)?
>>> 2) Is there any information available about neo4j performance on EC2?
>>> 3) The rendering process divides up the OSM data into several classes
>>> which are styled differently (roads/rivers/buildings/etc). I am aware
>>> that neo4j-spatial can index sublayers based on property filters, but
>>> when I last checked, the filter syntax used wasn't as flexible as I
>>> need for the stylesheet I'm using. For my investigation this weekend I
>>> am thinking of replacing the existing filter system with one based on
>>> CQL[1] to serialize filters. Does that seem like a bad idea?
>>> 4) Is there any support for applying OSM's daily or minutely patches?
>>> (From a look at the code, I think the answer is no, so if not - how
>>> tough would it be to add? Are there any design docs or notes written up
>>> about implementing that feature?)
>>>
>>> [1] CQL - http://docs.codehaus.org/display/GEOTOOLS/ECQL+Parser+Design
>>>
>>> Thanks in advance.
>>>
>>> --
>>> David Winslow
>>> OpenGeo - http://opengeo.org/
>>> _______________________________________________
>>> Neo4j mailing list
>>> [email protected]
>>> https://lists.neo4j.org/mailman/listinfo/user