Re: [Neo4j] Are graphs ok for lots of Event data
Of course the graph can be used for processing event data; whether that works for your case depends on the details. We have used it for this, and I can discuss a few points.

The event stream is obviously just a linear chain and can be modeled as such in the graph (e.g. with NEXT relationships between event nodes). However, this does not bring much advantage over the original flat file, which already has an implicit next (the next line, assuming time ordering). You could instead use a TimeLineIndex to manage the order, and then you would have an advantage over unordered original data. Durations between events can be new nodes with START and END relationships to the individual events, and the time difference optionally added as a property to the duration node.

One nice thing about the graph is that you can keep adding data and structure as you go, sometimes much later. So the server, number of items processed, etc. from your question can be added later, at your convenience.

When grouping events together and getting statistics, some things can be maintained incrementally, like max/min/count/total, but percentiles are not so trivial. Consider the case where you want statistics for each hour of events. If you have an hour node connected to all event nodes in that hour, you can update the max/min/count/total values as new event data enters the database. But a percentile can only be calculated once all events in the hour have arrived. This can be handled at the application level. ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
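As an aside, the incremental-versus-deferred distinction above can be sketched outside the database; this is a plain illustration of the bookkeeping (class and method names are mine, not a Neo4j API):

```python
# Sketch: max/min/count/total fold in per event as it arrives,
# but a percentile needs the full set of values for the hour.
class HourStats:
    def __init__(self):
        self.count = 0
        self.total = 0.0
        self.min = None
        self.max = None
        self.values = []  # retained only because percentile needs them

    def add_event(self, value):
        # Incremental aggregates: cheap to update on arrival
        self.count += 1
        self.total += value
        self.min = value if self.min is None else min(self.min, value)
        self.max = value if self.max is None else max(self.max, value)
        self.values.append(value)

    def percentile(self, p):
        # Only meaningful once all events for the hour have arrived
        ordered = sorted(self.values)
        idx = min(int(p / 100.0 * len(ordered)), len(ordered) - 1)
        return ordered[idx]
```

In a graph, the incremental fields could live as properties on the hour node, while the percentile pass would run once the hour is closed.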
Re: [Neo4j] path finding using OSM ways
We do indeed have twice the node count (and twice the relationship count). This is a necessary side effect of the fact that an OSM node can participate in more than one way (at intersections, as well as shared edges of polygons, etc.). In addition, with shared edges the direction can be reversed from one way to the other, so we need a completely separate set of nodes and relationships to model one way versus the other.

We have considered a compacted version of the graph where we only use the extra nodes and relationships when they are needed, but the code to decide when they are needed, or to convert the subgraph to the expanded version when needed (i.e. when a new joined way is loaded), would be much more complex, and therefore susceptible to bugs. We chose a cleaner, simpler code base over a more complex, but more compact, graph. Now we also want to model historical changes. It appears that the use of multiple nodes/relationships will also allow us to model this, so it is a good thing (tm) :-)

For routing, I would create a set of relationships directly connecting all nodes that are intersection points, ignoring all the nodes along the way. We can add edge weights to these new relationships for the distance traveled, or other appropriate weighting factors (type of road, possible speed, hindrances, etc.). This graph would be ideal for routing calculations. The main OSM graph is not ideal for routing, but is designed to be a true and accurate reflection of the original OSM data and topology stored in the OpenStreetMap database. With Neo4j we can do both :-)

These routing relationships have not been added to the current OSM model in neo4j-spatial, but would be relatively trivial to add (if we ignore advanced concepts like turning restrictions). They could be added by the OSMImporter code that identifies intersections, with only a few lines of extra code (I think ;-)

On 12/6/11, danielb danielbercht...@gmail.com wrote: craig.taverner wrote ... 
- Create a way-point node for these ... Hi together, I wonder why to add extra nodes to the graph (if I understand Craig correctly)? Wouldn't you then end up with twice the node count (way-point nodes and the OSM nodes themselves, because you have to query the OSM id (or any other identification value of the end node) on every expand, and lat/lon if you don't have precomputed edge weights)? I would just connect the OSM nodes directly with new edges to form a routing subgraph. Best Regards, Daniel -- View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/Neo4j-path-finding-using-OSM-ways-tp3004328p3564688.html Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
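The routing subgraph described in the thread above (direct, weighted relationships between intersection nodes, skipping the points along the way) can be sketched in plain code; this shows only the graph-collapsing logic, with a dict standing in for the database and an illustrative per-segment weight function:

```python
# Sketch: collapse chains of way-points into direct, weighted
# edges between intersection nodes (here: nodes whose degree != 2).
def build_routing_graph(adjacency, weight):
    """adjacency: node -> list of neighbour nodes (the full way-point chain).
    weight(a, b): cost of the segment between adjacent nodes a and b.
    Returns intersection-to-intersection edges with summed weights."""
    intersections = {n for n, nbrs in adjacency.items() if len(nbrs) != 2}
    edges = []
    for start in intersections:
        for first in adjacency[start]:
            prev, cur, cost = start, first, weight(start, first)
            # walk along the degree-2 chain until the next intersection
            while cur not in intersections:
                nxt = [n for n in adjacency[cur] if n != prev][0]
                cost += weight(cur, nxt)
                prev, cur = cur, nxt
            edges.append((start, cur, cost))
    return edges
```

In Neo4j terms, each returned edge would become one new routing relationship carrying the accumulated weight as a property; turning restrictions and one-way streets are deliberately ignored here, as in the mail above.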
Re: [Neo4j] Feedback requested: Major wish list item for Neo4J
I definitely second this suggestion. We have recently been working on a binary store for dense data that we would like to access as if it were properties of nodes. Right now we have properties that are references to files on disk, and then handle the binary data ourselves, but this does not benefit from any transactional advantages. Rick's suggestion of a pluggable store would suit us very well, because I presume Neo4j would specify the interface/API to implement such a store in a way that could be handled atomically within transactions, and then we could satisfy that with our own store.

On Wed, Dec 7, 2011 at 3:43 PM, Rick Bullotta rick.bullo...@thingworx.com wrote: One area where I would love to see the Neo4J team focus some energy is the efficient storage and retrieval of blob/large text properties. Similar to the indexing strategy in Neo4J, it would be nice if this were pluggable (and it could depend on some other data store more optimized for blob/clob properties). The keys for this to be successful are:
- Transacted
- Does not store these properties in memory except when accessed (and then, perhaps offer a getPropertyAsStream method and a setPropertyFromStream method for optimal performance)
- Transparent - should just work
Nice to haves, but not at all required in the first iteration:
- Pluggable (store in Neo4J native, filesystem, EC2 simple storage, etc.)
Addition of these capabilities would move Neo4J into a dramatically expanded realm of potential applications, some of which are quite mind blowing, both in the social realm and in the enterprise realm. Feedback welcomed!
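For illustration only, the stream-based property API Rick proposes might have roughly this shape. Every name here is hypothetical (this is not a Neo4j API), and a real store would need to hook into Neo4j's transaction machinery rather than a plain dict:

```python
# Hypothetical sketch of a pluggable blob-property store.
# Streams in and out, so large values need never sit fully in the heap
# (this toy version buffers them, a real one would spool to disk).
import io

class BlobPropertyStore:
    def __init__(self):
        self._blobs = {}  # (node_id, key) -> bytes

    def set_property_from_stream(self, node_id, key, stream):
        # Consume the stream; a real store would write within a transaction
        self._blobs[(node_id, key)] = stream.read()

    def get_property_as_stream(self, node_id, key):
        # Hand back a stream so callers can read incrementally
        return io.BytesIO(self._blobs[(node_id, key)])

store = BlobPropertyStore()
store.set_property_from_stream(1, "thumbnail", io.BytesIO(b"...binary..."))
data = store.get_property_as_stream(1, "thumbnail").read()
```

The point of the sketch is the calling convention — stream in, stream out, keyed by node and property name — which is what would let a filesystem or S3 backend be swapped in behind the same interface.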
Re: [Neo4j] OSMImporter: Is there a way to do incremental imports?
There was only a method ending in 'WithCheck', or something like that, lying unused in the code from last year. Nothing more than that, except for thinking about it, which is why I wrote the previous mail.

On Dec 2, 2011 12:50 PM, Peter Neubauer pe...@neubauer.se wrote: Not sure, Craig, do you have the code somewhere? /peter

On Tue, Nov 22, 2011 at 4:17 PM, grimace macegh...@gmail.com wrote: thanks for the response(s)! The hardware I'm testing on is not the best, with only 4G of RAM, so I'm limited, but this seems the best opportunity for me to learn this... that being said... "For incremental imports, stitching osm files together, we re-activate the old code that tests the lucene index before adding nodes and relations. There might be some subtle edge cases to consider, but a set of tests with overlapping and non-overlapping osm files should flush them out." I'd love to play with this. Is the old code there for me to re-enable in testing? Or can you point me to where this might be put in? Thx, Greg -- View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/OSMImporter-Is-there-a-way-to-do-incremental-imports-tp3526941p3527995.html
Re: [Neo4j] possibility to merge some neo4j databases?
There are two approaches I can think of:
- Use a better index for mapping ids. Lucene is too slow. In-memory hashtables are memory bound. Peter has been investigating alternative dbs like bdb. I tried, but did not finish, a hashmap of cached arrays, and Chris wrote his big data import project on github, which is a hashmap of cached hashmaps. Many promising solutions, but none yet complete. All target the general case of id mapping.
- For this specific case, merging small databases, I had an idea a couple of years ago which I still think will work: bulk appending entire databases, offsetting all internal ids by the current max id. I remember the reason Johan did not like this idea was that it suffered from the same flaws as the batch inserter: locking the entire db, no rollback, and risk of entire db corruption. For people happy with the batch inserter, perhaps this is still an option, but it is unlikely to get prioritized by the Neo team because of the corruption risks. It would, however, perform spectacularly well, since the id map is a trivial function.

Personally I hope someone completes Chris's persistent hashmap or a similar solution. Id maps are a recurring theme and would be very valuable.

On Nov 29, 2011 12:07 PM, osallou olivier.sal...@gmail.com wrote: Hi, I need to batch insert millions of data items in neo4j. It is quite difficult to keep it all in a Map to get node ids, so it needs frequent lookups in the index to get node ids for relationships, and performance is quite low. Is there any way to build several neo4j databases (independently) and then merge them? (I could build many small dbs in parallel) Thanks Olivier -- View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/possibility-to-merge-some-neo4j-databases-tp3544694p3544694.html
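The id-offsetting trick from the thread above can be shown with plain dictionaries. This sketch covers only the id arithmetic — none of the store-level locking, rollback, or corruption concerns mentioned in the mail:

```python
# Sketch: append db2's nodes and relationships into db1 by shifting
# every internal id in db2 by (db1's current max node id + 1).
def merge(db1, db2):
    """Each db: {'nodes': {id: props}, 'rels': [(from_id, to_id, type)]}."""
    offset = max(db1['nodes']) + 1 if db1['nodes'] else 0
    merged_nodes = dict(db1['nodes'])
    for node_id, props in db2['nodes'].items():
        merged_nodes[node_id + offset] = props
    merged_rels = list(db1['rels'])
    for a, b, rel_type in db2['rels']:
        # the id map is a trivial function: new_id = old_id + offset
        merged_rels.append((a + offset, b + offset, rel_type))
    return {'nodes': merged_nodes, 'rels': merged_rels}
```

Because the mapping is pure arithmetic, no index lookup is needed per relationship, which is why the mail expects this to perform so well compared with Lucene-backed id maps.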
Re: [Neo4j] Contributors section in the manual
What is the sort order? Date of first commit, number of lines, commits, packages?

On Nov 21, 2011 2:35 PM, Peter Neubauer peter.neuba...@neotechnology.com wrote: Everyone, I have started to put some random people in, see http://docs.neo4j.org/chunked/snapshot/contributors.html . Any ideas what more info to provide here, or how to make this nicer? Cheers, /peter neubauer GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - NOSQL for the Enterprise. http://startupbootcamp.org/ - Öresund - Innovation happens HERE.

On Sun, Nov 13, 2011 at 10:42 AM, Peter Neubauer peter.neuba...@neotechnology.com wrote: To start with, the manual is for the direct codebase that is part of the distribution. The next step is to include sections and pointers to other stable related projects and drivers. Does that sound reasonable?

On Nov 13, 2011 1:36 AM, Nigel Small ni...@nigelsmall.name wrote: Are you looking for info on associated projects like py2neo, or direct contributions to the main code base? On a side note, I've been getting quite a few hits to my blog post on pagination in Neo4j. The bits I wrote for that are all Python/py2neo again, but that or something similar might be worth including somewhere on the Neo site, as it appears to be a reasonably sought-after topic. Cheers *Nigel Small* Phone: +44 7814 638 246 Blog: http://nigelsmall.name/ GTalk: ni...@nigelsmall.name MSN: nasm...@live.co.uk Skype: technige Twitter: @technige https://twitter.com/#!/technige LinkedIn: http://uk.linkedin.com/in/nigelsmall

On 12 November 2011 20:40, Peter Neubauer peter.neuba...@neotechnology.com wrote: Hi guys, I would love to add a section on contributors to the Neo4j Manual, in http://docs.neo4j.org/chunked/snapshot/community.html so that all of you who participate in the process can be found in there. 
Do you have any suggestions on how to present this, that is - what info, links, and maybe short presentation snippets and pictures? A graph of components, or simply a table? Thoughts? Cheers, /peter neubauer
Re: [Neo4j] OSMImporter: Is there a way to do incremental imports?
I did some initial work on incremental imports back in 2010, but stopped due to some complications:
- We needed to mix lucene reads and writes during the import (reading to check if the node already exists, so we don't import it twice), and this performs very badly in the batch inserter. We decided to first code a non-batch insert mode before re-starting the incremental import work. Peter and I did code a non-batch importer in early 2011, but never went back to complete the incremental import.
- We wanted to support both the case of importing multiple OSM files that could be stitched together by resolving overlaps, and the case of applying changesets to the existing OSM model. This increased the complexity of the work just enough to ensure it got dropped. In early 2011 we also added support for changesets in the model (but only as a data structure, not in terms of importing changesets), so we are one step closer to this too.

Since we now have non-batch importing and changeset data structures, the opportunity is there to re-start the incremental import and changeset importing. It should not be too hard.

For incremental imports, stitching osm files together, we re-activate the old code that tests the lucene index before adding nodes and relations. There might be some subtle edge cases to consider, but a set of tests with overlapping and non-overlapping osm files should flush them out.

For applying changesets, more thinking is still required. Do we want to support history in the model, or only the latest version? Should we verify that only newer changesets are applied, and in the right order, or rely on the user to get it right? I can say that we did some thinking this summer on the data structures required to support a complete change history. This relies on the fact that we already support multiple possible ways on the same nodes, so we can also, in principle, support multiple possible 'versions' of ways on the same nodes. 
More thinking is required, but I have a suspicion that we should actually go ahead and do this properly with full history, because that might be the only way to make sure the user never messes things up by importing in the wrong order.

On Tue, Nov 22, 2011 at 9:58 AM, Peter Neubauer peter.neuba...@neotechnology.com wrote: Gregory, incremental loads (and thus, restarts of OSM imports) are a feature we want to add later on, but it's not in there yet. This would also mean we could stitch in other areas on demand, and support submitting changesets back to OSM, or at least capture them, so you as an OSM based app can contribute to OSM automagically. I know it's much to ask, but help here would be greatly appreciated. I hope to lab with Michael Hunger on import of data into OSM (and others) this Friday and hope to get somewhere :) Cheers, /peter neubauer

On Tue, Nov 22, 2011 at 7:15 AM, grimace macegh...@gmail.com wrote: I've been playing with OSMImporter; tried batch and native java. I've had mixed success trying to import the planet, but since it's of considerable size, the job usually blows up or grinds to a halt about half way. I think the most I've made it to is 651M nodes, and that's not even the ways or relations. I just don't know enough about it and thought I would ask before I try to dive into it, but what would I have to do so that I could restart the job (where it left off) when it blows up? -- View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/OSMImporter-Is-there-a-way-to-do-incremental-imports-tp3526941p3526941.html
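The 'check the index before adding' approach discussed in this thread can be sketched as follows, with a plain dict standing in for the Lucene index (all names illustrative):

```python
# Sketch: idempotent import -- look each OSM id up in an index first,
# and only create a graph node when it is genuinely new, so overlapping
# files stitch together instead of duplicating nodes.
def import_nodes(index, graph, osm_nodes):
    """index: osm_id -> graph node id; graph: node id -> properties;
    osm_nodes: iterable of (osm_id, properties). Returns nodes created."""
    created = 0
    for osm_id, props in osm_nodes:
        if osm_id in index:        # already imported from an earlier file
            continue
        node_id = len(graph)       # stand-in for the db's next node id
        graph[node_id] = props
        index[osm_id] = osm_id and node_id or node_id
        index[osm_id] = node_id
        created += 1
    return created
```

The mail's performance caveat is exactly this read inside the write loop: with Lucene in batch-insert mode, each `osm_id in index` probe is expensive, which is why the non-batch importer had to come first.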
Re: [Neo4j] osm_import.rb
Hi, Sorry for a late contribution to this discussion. I will try to make a few comments to cover the various mails above.

Firstly, the neo4j-spatial.rb gem at version 0.0.8 on RubyGems works with Neo4j-Spatial 0.6, which does include the non-batch inserter code, so in principle it should work for you. However, there is a need to change one line of code in the Ruby to make it use the normal graph API instead of the batch inserter. I will commit this change later, but for now you would change line 118 of osm.rb (see https://github.com/craigtaverner/neo4j-spatial.rb/blob/master/lib/neo4j/spatial/osm.rb#L118) to instead look like:

#@importer.import_file batch_inserter, @osm_path
@importer.import_file normal_database, @osm_path, false, 5000

(basically you replace 'batch_inserter' with 'normal_database' and add the two extra parameters 'false, 5000').

Looking at the errors you are getting, I see they are, as you suspected, related to out of date instructions. I will try to get round to updating the instructions soon, but in the meantime:
- For using the Ruby gem, you should use the osm_import command (added automatically to your path when you install the gem). So you can replace the command 'jruby -S examples/osm_import.rb' with just 'osm_import'.
- When using the code directly from github, there is a jar missing in the lib/neo4j/spatial/jars directory. This is neo4j-spatial-0.6-SNAPSHOT.jar, which can be downloaded and copied into that directory manually. The direct link to this file on the m2.neo4j.org site is http://m2.neo4j.org/org/neo4j/neo4j-spatial/0.6-SNAPSHOT/neo4j-spatial-0.6-SNAPSHOT.jar

Your last comment about 'includePoints' is just a setting for whether or not to treat all OSM points as individual geometries. The default is false because you normally do not want to be able to search for all points on a long road, but for the road itself. I recommend leaving this as false, unless you have a specific need. 
Regards, Craig

On Thu, Nov 10, 2011 at 2:51 PM, grimace macegh...@gmail.com wrote: I ended up trying again with just java (but still running with batchInserter), adjusting my memory settings and max heap; it's currently working on the americas.osm file from cloudmade - http://downloads.cloudmade.com/americas#downloads_breadcrumbs. The file is about 99 GB when assembled. I'm running on ubuntu 11.10, Core 2 Duo 2.Ghz with 4G RAM (not very fast, but what I have available right now), Java heap -Xmx=3072M, config settings:

neostore.nodestore.db.mapped_memory=1000M
neostore.relationshipstore.db.mapped_memory=300M
neostore.propertystore.db.mapped_memory=400M
neostore.propertystore.db.strings.mapped_memory=800M
neostore.propertystore.db.arrays.mapped_memory=100M

My code is essentially from the test suite that you suggested, but I am using the batchImporter instead. I'm about 1/3 of the way through and don't want to interrupt the process, but when it's done I'll try it without the batch importer. It runs at about 4500 nodes/second. Is that reasonable? I haven't looked at performance numbers from anyone else. Would the non-batch performance be better? Is it better to 'includePoints' or not? One question I had was: once I get this imported via this method (neo4j embedded), is it possible to move the imported db to a neo4j server? I'm hoping it is. If so, what would that process be? -- View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/osm-import-rb-tp3493463p3496760.html
Re: [Neo4j] Neo4j low-level data storage
I think Daniel's questions are very relevant, and not just to OSM. Any large graph (of which OSM is simply a good example) will be affected by fragmentation, and that can affect performance. I was recently hit by poor performance of GIS queries (not OSM) related to fragmentation of the index tree. I will describe that problem below, but first let me describe my view on Daniel's question.

It is true that if parts of the graph that are geographically close are also close on disk, the load time for bounding box queries will be faster. However, this is not a problem that is easy to solve in a generic way, because it requires knowledge of the domain. I can see two ways to create a less fragmented graph:
- Have a de-fragmenting algorithm that re-organizes an existing graph according to some rules. This does not exist in neo4j (yet?), but is probably easier to generalize, since it should be possible to first analyse the connectedness of the graph, and then defragment based on that. This means a reasonable solution might be possible without knowing the domain.
- Be sure to load domain-specific data in the order you wish to query it. In other words, create a graph that is already de-fragmented.

This second approach is the route I have started following (at least I've taken one or two tiny baby-steps in that direction, but plan for more). In the case of the OSM model produced by the OSMImporter in Neo4j-Spatial, we do not do much here. We are importing the data in the order it was created in the original postgres database (i.e. in the order it was originally added to OpenStreetMap). However, since the XML format puts ways after all nodes, we actually also store all ways after all nodes, which means that loading any particular way completely from the database requires hitting disk at at least two very different locations: the location of the way node and the interconnects between the nodes, and the location(s) of the original location nodes. 
This multiple hit will occur on the nodes, relationships and properties tables in a similar way. So I can also answer a question Daniel asked about the ids: Neo4j nodes, relationships and properties have their own id spaces, so you can have node 1, relationship 1 and property 1.

Let's consider a real example: a street made of 5 points, added early to OSM (so low ids in both postgres and in neo4j). The OSM file will have these nodes near the top, but the way that connects them together will be near the bottom of the file. In Postgres the nodes and ways are in different tables, and will both be near the top. In neo4j both osm-ways and osm-nodes are neo4j-nodes (in the same 'table'). The osm-nodes will have low ids, but the way will have a high id. Also, we use proxy nodes to connect osm-ways to osm-nodes, and these are created together with the way. So we will have 5 nodes with low ids, and 8 nodes with high ids (5 proxy nodes, 1 way node, 1 geometry node and 1 tags node). If the way was big and/or edited multiple times, we could get even higher fragmentation.

Personally I think that fragmenting one geometry into a few specific locations is not a big problem for the neo4j caches. However, when we are talking about a result set or traversal of thousands or hundreds of thousands of geometries, then doubling or tripling the number of disk hits due to fragmentation can definitely have a big impact.

How can this fragmentation situation be improved? One idea is to load the data in two passes. The current loader tries to optimize OSM import speed, which is difficult already (and slower than in an rdbms due to the increased explicit structure), and so we have a single-pass loader, with a lucene index for reconnecting ways to nodes. 
However, I think we could change this to a two-pass loader, where the first pass reads and indexes the point nodes into a unique id-index (for fast postgres-id lookup), and the second pass connects the ways, storing both the nodes and the ways to the database at the same time, in contiguous disk space. This would improve query performance, and if we make a good unique-id index faster than lucene, we will actually also improve import speed .. :-)

Now, all of the above does not answer the original question regarding bounding box queries. All we will have done with this is improve the load time for complete OSM geometries (by reducing geometry fragmentation). But what about the index itself? We are storing the index as part of the graph. Today, Neo4j-Spatial uses an RTree index that is created at the end of the load in OSMImporter. This means we load the complete OSM file, and then we index it. This is a good idea because it will store the entire RTree in contiguous disk space. There is one issue with RTree node splitting that will cause slight fragmentation, but I think it is not too serious. Now when performing bounding box queries, the main work done by the RTree will hit the minimum amount of disk space, until
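The two-pass loading order described above can be sketched in miniature, with dicts and lists standing in for the id-index and the store; the point is that each way lands next to its own nodes in append order (all names illustrative):

```python
# Sketch: two-pass import for better disk locality.
# Pass 1 only indexes the point nodes; pass 2 writes each way
# together with its nodes, so a way and its geometry are adjacent
# in the store's append order (a stand-in for contiguous disk space).
def two_pass_import(osm_points, osm_ways):
    """osm_points: iterable of (osm_id, lat, lon);
    osm_ways: iterable of (way_id, [osm_id, ...])."""
    point_index = {}                      # pass 1: osm id -> coordinates
    for osm_id, lat, lon in osm_points:
        point_index[osm_id] = (lat, lon)
    store = []                            # append order ~= disk order
    for way_id, point_ids in osm_ways:    # pass 2: way + its nodes together
        store.append(('way', way_id))
        for pid in point_ids:
            store.append(('node', pid, point_index[pid]))
    return store
```

Compare the single-pass order (all nodes first, all ways at the end), where loading one way means seeking to both ends of the store.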
Re: [Neo4j] Neo4j in GIS Applications
Hi all, I am certainly behind on my emails, but I did just answer a related question about OSM and fragmentation, and I think that might have answered some of Daniel's questions. But I can say a little more about OSM and Neo4j here, specifically about the issue of joins in postgres. Let me start by describing where I think postgres might be faster than neo4j, and then move on to where neo4j is faster than postgres.

Importing OSM data into postgres will be faster than into neo4j, because the foreign keys are simple integer references between tables, indexed using postgres' high-performance indexes. In Neo4j the relationships are much more detailed, explicit, bi-directional references, taking more disk space (but no index space). The disk write time is longer (more data written), but the advantages of not having an index make it worthwhile.

So that leads naturally to where neo4j is faster. The reason there is no index on the foreign key is that there is no need for one. Each relationship contains the id of the node it points to (and points from), and that id is directly mapped to the location on disk of the node itself. So this is more like an array lookup, because all nodes are the same size on disk. So the 'join' you perform when traversing from one osm-node to another is extremely fast, and more importantly it is not affected by database size. It is O(1) in performance! Fantastic! In an rdbms, the need for an index on the foreign key means you are building a tree structure to get the join down from O(N) to O(log N) or something better, but never as good as O(1).

In neo4j-spatial, if you perform a bounding box query, you are traversing an RTree, which does not exist in postgres, but does exist in PostGIS. In both Neo4j-Spatial and PostGIS you are working with a tree index that will slow things down if there is a lot of data, and currently the postgis rtree is better optimized than the neo4j-spatial rtree. 
But if you are performing more graph-like processing, for example proximity searches or routing analysis, then you will get the full O(1) benefits of the graph database, and no way can postgres match that :-) OK. Lots of hype, but I get enthusiastic sometimes. Take anything I say with a pinch of salt: believe the parts that make sense to you, and try some tests otherwise. It would be great to hear your experiences with modeling OSM in neo4j versus postgres. Regards, Craig

On Tue, Oct 4, 2011 at 7:18 PM, Andreas Kollegger andreas.kolleg...@neotechnology.com wrote: Hi Daniel, If you haven't yet, you should check out the work done in the Neo4j Spatial project - https://github.com/neo4j/spatial - which has fairly comprehensive support for GIS. Data locality, as you mention, is exactly a big advantage of using a graph for geospatial data. Take a look at the Neo4j Spatial project and let us know what you think. Best, Andreas

On Tue, Oct 4, 2011 at 9:58 AM, danielb danielbercht...@gmail.com wrote: Hello everyone, I am going to write my master's thesis about the suitability of graph databases in GIS applications (at least I hope so^^). The database has to provide topological queries, network analysis and the ability to store a large amount of map data for viewing - all based on OSM data of Germany ( 100M nodes). Most likely I will compare Neo4j to PostGIS. As a starting point, I want to know why you would recommend Neo4j for the job? What are the main advantages of a graph database compared to an (object-)relational database in the GIS environment? The main focus and goal of this work is to show a performance improvement over relational databases. In a student project (an OSM navigation system) we worked with relational (SQLite) and object-oriented (Perst) databases on netbook hardware and embedded systems. 
The relational database approach showed us two problems: if you transfer the OSM model directly into tables then you have a lot of joins, which slows everything down (and lots of redundancy when using different tables for each zoom level). The other way is to store as much as possible in one big (sparse) table, but this would also have some performance issues, I guess, and from a design perspective it is not a nice solution. The object-oriented database also suffered from many random reads when loading a bounding box, and in addition we could not say how the data was stored in detail. The performance indeed increased after caching occurred, or with the use of SSD hardware. You can also store everything in RAM (money does the job), but for now you have to assume that all of the data has to be read from a slow disk the first time. Can Neo4j be configured to read, for example, a bounding box of OSM data from disk in an efficient way (data locality)? Maybe you also have some suggestions on what I should look at in this work, and what can be improved in Neo4j to get better results. I also would appreciate related papers.
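To make the O(1) point from the reply above concrete: with fixed-size records, a node id maps to a disk offset by plain arithmetic, whereas an index-backed join must walk a tree. A toy comparison (the record size is an illustrative figure, not Neo4j's actual layout):

```python
# Sketch: fixed-size records make id -> disk offset pure arithmetic,
# so following a relationship costs the same at any database size.
NODE_RECORD_SIZE = 9  # bytes per node record (illustrative figure only)

def node_offset(node_id):
    # O(1): no index consulted, just multiply
    return node_id * NODE_RECORD_SIZE

# Contrast: an rdbms foreign-key join descends an index tree.
import bisect

def tree_lookup(sorted_keys, key):
    # O(log N): cost grows with the number of keys
    i = bisect.bisect_left(sorted_keys, key)
    return i if i < len(sorted_keys) and sorted_keys[i] == key else None
```

The first lookup is unaffected by how many nodes exist; the second gets slower (logarithmically) as the table grows, which is the asymmetry the mail describes.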
Re: [Neo4j] Problem Installing Spatial (Beginner)
Sorry for such a late response, I missed this mail. I must first point out that it seems you are trying to use Neo4j-Spatial in the standalone server version of Neo4j. That is possible, but not well supported. We have only exposed a few of the functions in the server, and do not test it regularly. The main way we are using neo4j-spatial at the moment is in the embedded version of neo4j. This is where the maven instructions come in, because they assume you are writing a Java application that will embed the database. If you are writing a Java application and can start using maven, then everything should be easy to get working. However, since I am relatively sure you are using neo4j-server, I think you are getting into deep water. We need to improve our support for the neo4j server before I can recommend you try it. The next release, 0.7, is focusing on geoprocessing features, and we hope to expose this in neo4j-server in 0.8. Hopefully then things will be much easier for you.

On Tue, Sep 27, 2011 at 5:24 PM, handloomweaver a...@atomised.coop wrote: Hi, I wonder if someone would be so kind as to help. I'm new to Neo4j and was trying to install Neo4j-Spatial to try its GIS features out. I need to be clear that I have no experience of Java or Maven, so I'm struggling a bit. I want to install Neo4j Spatial once, somewhere on my 4GB MacBook Pro. I have no problem downloading the Neo4j Java binary and starting it. But I'm confused about the Spatial library. Looking at the Github page, it says to either use Maven or copy a zip file into a folder in Neo4j. Is the zip file the Github repository contents or something else? I've tried the Maven way (mvn install) described on GitHub, but I'm firstly confused about if/where Neo4j is being installed (does it install it first? where?) and in any case the install fails. It seems to be a JVM heap memory problem. Why is it failing? How can I make it not fail? Is it a config file somewhere needing tweaked? 
http://handloomweaver.s3.amazonaws.com/Terminal_Output.txt http://handloomweaver.s3.amazonaws.com/surefire-reports.zip I'm really keen to use Neo4J spatial but the barrier to entry for the less technical GIS developer is proving too high for me! I'd SO appreciate some help/pointers. I apologise that I am posting such a NOOB question on your forum but I've exhausted Google searches. Thanks -- View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/Problem-Installing-Spatial-Beginner-tp3372924p3372924.html Sent from the Neo4j Community Discussions mailing list archive at Nabble.com. ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Spatial query with property filter
I can elaborate a little on what Peter says. The DynamicLayer support is indeed the only way to do what you want right now, but I think it is actually quite a good fit for your use case. When defining a dynamic layer you are actually just defining a 'returnable evaluator', which will be applied to the nodes during the RTree spatial search. This means that the primary search is spatial, but for each leaf node (geometry) the dynamic layer query is applied as a filter. If you use CQL for the query, then all geometries are converted into JTS geometry classes for the filter (which adds a little overhead, so if the spatial query is not your limiting factor, this can affect performance). If you use JSON for the query, it is applied directly to the graph as a pattern match. So JSON should be faster, but does also require that you know the structure of the graph, which the CQL approach does not. Peter's pointer to the TestDynamicLayers class is the best place to start for seeing how to use both CQL and JSON filter syntaxes. On Mon, Aug 29, 2011 at 11:59 AM, Peter Neubauer peter.neuba...@neotechnology.com wrote: Hi there, well, spatial querying is not something that can be easily stuck into an iterator. If you want more than casual querying, I think you need to use the GeoTools APIs, we provide support for CQL as a query language there, see https://github.com/neo4j/spatial/blob/master/src/test/java/org/neo4j/gis/spatial/TestDynamicLayers.java#L60 for some examples. Basically, you define a dynamic layer with a CQL query, which will return the subset of the full layer (e.g. a SimplePointLayer) that matches that query. Would that help? Cheers, /peter neubauer GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - Your high performance graph database. http://startupbootcamp.org/ - Öresund - Innovation happens HERE. 
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party. On Mon, Aug 29, 2011 at 1:37 AM, faffi obscurredbyclo...@gmail.com wrote: Hey guys, I'm seeing some kind of disconnect between the spatial and the regular graph traversing query. I can't find a way of executing a spatial query like in SimplePointLayer but also providing something like a ReturnEvaluator. My use case is essentially for all nodes within a 10km radius, return all with name foo. Do I actually have to iterate through all the nodes returned by the query in a list and individually check them? Thanks, faffi -- View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/Spatial-query-with-property-filter-tp3291410p3291410.html Sent from the Neo4j Community Discussions mailing list archive at Nabble.com. ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
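The two-stage behaviour Craig describes (primary search is spatial, then the dynamic-layer condition is applied as a per-hit filter) can be sketched without the Neo4j Spatial API. Everything below is a hypothetical stand-in for illustration: `Rec` plays the role of an indexed geometry node, the bounding-box scan stands in for the RTree search, and the name check stands in for the 'returnable evaluator':

```java
import java.util.ArrayList;
import java.util.List;

public class TwoStageQuerySketch {
    // Hypothetical stand-in for a spatially indexed record; not a Neo4j class.
    static class Rec {
        final double x, y; final String name;
        Rec(double x, double y, String name) { this.x = x; this.y = y; this.name = name; }
    }

    // Stage 1: spatial search (a simple bounding-box scan standing in for the
    // RTree). Stage 2: the attribute filter, applied to each spatial hit the
    // way a dynamic layer's 'returnable evaluator' is.
    static List<Rec> search(List<Rec> all, double cx, double cy, double half, String name) {
        List<Rec> hits = new ArrayList<>();
        for (Rec r : all) {
            boolean spatialHit = Math.abs(r.x - cx) <= half && Math.abs(r.y - cy) <= half;
            if (spatialHit && name.equals(r.name)) hits.add(r);
        }
        return hits;
    }

    public static void main(String[] args) {
        List<Rec> all = new ArrayList<>();
        all.add(new Rec(0.01, 0.01, "foo"));
        all.add(new Rec(0.02, 0.00, "bar"));  // inside the box, wrong name
        all.add(new Rec(5.00, 5.00, "foo"));  // right name, outside the box
        System.out.println(search(all, 0, 0, 0.1, "foo").size()); // prints 1
    }
}
```

The point of the ordering is that the cheap, indexed spatial test prunes the candidate set before the per-geometry filter (CQL or JSON) is evaluated, which is why the filter cost only matters when the spatial query is not the limiting factor.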
Re: [Neo4j] neo4j spatial and postgis
Or if you want a command line import, try the ruby gem 'neo4j-spatial.rb'. Once installed you can type: osm_import file.shp On Aug 13, 2011 10:33 AM, Andreas Wilhelm a...@kabelbw.de wrote: Hi, with the pgsql2shp tool you can dump your postgis db in a shapefile and you should be able to import it in Neo4j Spatial in the following way: String shpPath = SHP_DIR + File.separator + layerName; ShapefileImporter importer = new ShapefileImporter(graphDb(), new NullListener(), commitInterval); importer.importFile(shpPath, layerName); Best Regards Andreas On 12.08.2011 11:10, chen zhao wrote: Hi, I'm very interested in neo4j spatial, but I do not know how to import the spatial data. My data are stored in postgis. I read the documents http://wiki.neo4j.org/content/Spatial_Data_Storage and http://wiki.neo4j.org/content/Importing_and_Exporting_Spatial_Data, but I still do not know how to import data from postgis or import shapefiles. Could you provide some detailed information? Please advise. zhao ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Neo4j Spatial and gtype property
Yes. If you have performed a search and now have SpatialDatabaseRecord results, then that is the best method to use. On Thu, Jul 28, 2011 at 6:03 AM, Christopher Schmidt fakod...@googlemail.com wrote: So best is to use SpatialDatabaseRecord.getGeometry()? Christopher On Wed, Jul 27, 2011 at 10:50 PM, Craig Taverner cr...@amanzi.com wrote: Actually we do allow multiple geometry types in the same layer, but some actions, like export to shapely, will fail. We even test for this in TestDynamicLayers. You can use the gtype if you want, but it is specific to some GeometryEncoders, and might change in future releases. It would be better to get the layer's geometry encoder and use that. On Jul 27, 2011 6:04 PM, Peter Neubauer peter.neuba...@neotechnology.com wrote: Christopher, What do you mean by allowing to use? Yes, these properties are used to store the Geometry Type for a Layer and for geometry nodes. Sadly, you cannot have more than one Geometry in Layers due to the limitations of e.g. the GeoTools stack. Cheers, /peter neubauer GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - Your high performance graph database. http://startupbootcamp.org/ - Öresund - Innovation happens HERE. http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party. On Wed, Jul 27, 2011 at 4:07 AM, Christopher Schmidt fakod...@googlemail.com wrote: Hi all, is it allowed to use the gtype-property to get the geometry type numbers? 
(Which are defined in org.neo4j.gis.spatial.Constants) -- Christopher twitter: @fakod blog: http://blog.fakod.eu ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user -- Christopher twitter: @fakod blog: http://blog.fakod.eu ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] How often are Spatial snapshots published?
Interesting that if you look at the github 'blame' for that file (see https://github.com/neo4j/neo4j-spatial/blame/master/src/main/java/org/neo4j/gis/spatial/SpatialTopologyUtils.java), you find that all the findClosestEdges methods were added in October 2010. So if Nolan has a version older than that, then something weird is going on. He must have the very first version from September 2010, which is not compatible with any recent Neo4j, GeoTools or uDig. When I look at m2.neo4j.org I can see that the latest 0.6-SNAPSHOT is from May. So we do have a problem, but not one that takes us back to last September. Nolan, perhaps your pom.xml refers to an older neo4j-spatial? You should use 0.6-SNAPSHOT. And we will change that again soon (to 0.7) since we are making changes to the geoprocessing and indexing. On Fri, Jul 22, 2011 at 10:04 AM, Anders Nawroth and...@neotechnology.com wrote: Hi! The deployment seems to be broken at the moment, I'll look into that ASAP. /anders 2011-07-22 09:28, Peter Neubauer wrote: Nolan, safest is to build it yourself from GitHub, I will check the deployment. Is that ok for now? /peter On Fri, Jul 22, 2011 at 3:57 AM, Nolan Darilek no...@thewordnerd.info wrote: I'm looking at the Spatial sources from Git, and am seeing lots of versions of SpatialTopologyUtils.findClosestEdges that don't appear to be in the snapshot I'm downloading. For instance, public static ArrayList<PointResult> findClosestEdges(Point point, Layer layer) { doesn't appear to be in the snapshot build I have -- that or my local cache is broken. Are these snapshots rebuilt regularly? Thanks. ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
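For reference, the dependency fix Craig suggests would look roughly like the fragment below. The 0.6-SNAPSHOT version and the m2.neo4j.org repository host come from this thread; the exact repository id, groupId and artifactId are assumptions and should be checked against the neo4j-spatial POM on GitHub:

```xml
<!-- Hypothetical pom.xml fragment: point at the snapshot repository and
     the 0.6-SNAPSHOT version mentioned in the thread. -->
<repositories>
  <repository>
    <id>neo4j-snapshots</id>
    <url>http://m2.neo4j.org</url>
    <snapshots><enabled>true</enabled></snapshots>
  </repository>
</repositories>
<dependencies>
  <dependency>
    <groupId>org.neo4j</groupId>
    <artifactId>neo4j-spatial</artifactId>
    <version>0.6-SNAPSHOT</version>
  </dependency>
</dependencies>
```

With snapshots, Maven's local cache can also hold a stale build; running mvn with the -U flag forces it to re-check the remote repository for updated snapshots.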
Re: [Neo4j] How to create a graph database out of a huge dataset?
I'm not sure it's such a good idea to call tx.success() on every iteration of the loop. I suggest calling it only at the commit, and after the loop (ie. move it two lines down). Also I think a commit size of 50k is a little large. You're probably not going to see much improvement past 10k. In fact I generally only use 1k myself (but I hear 10k is popular too :-) On Sun, Jul 17, 2011 at 8:53 PM, st3ven st3...@web.de wrote: Hi, thanks for your fast answer. Right now I'm using lucene for 6M authors, but my whole dataset consists of nearly 25M authors. Can I use lucene there also, because I think it is getting really slow to check if a user already exists. How can I change my heap memory settings and my memory-map settings, since I'm using the transactional mode? Because I think with 25M authors I will get an OutOfMemory exception. Here is the code that I have written so far: import java.io.BufferedReader; import java.io.FileReader; import java.io.IOException; import org.neo4j.graphdb.GraphDatabaseService; import org.neo4j.graphdb.Node; import org.neo4j.graphdb.Relationship; import org.neo4j.graphdb.Transaction; import org.neo4j.graphdb.index.Index; import org.neo4j.graphdb.index.IndexHits; import org.neo4j.graphdb.index.IndexManager; import org.neo4j.kernel.EmbeddedGraphDatabase; public class WikiGraphRegUser { /** * @param args */ public static void main(String[] args) throws IOException { BufferedReader bf = new BufferedReader(new FileReader("E:/wiki0.csv")); WikiGraphRegUser wgru = new WikiGraphRegUser(); wgru.createGraphDatabase(bf); } private String articleName = ""; private GraphDatabaseService db; private IndexManager index; private Index<Node> authorList; private int transactionCounter = 0; private Node article; private boolean isFirstAuthor = false; private Node author; private Relationship relationship; private int node; private void createGraphDatabase(BufferedReader bf) { db = new EmbeddedGraphDatabase("target/db"); index = db.index(); authorList = 
index.forNodes("Author"); String zeile; Transaction tx = db.beginTx(); try { // reads lines of CSV-file while ((zeile = bf.readLine()) != null) { if (transactionCounter++ % 5 == 0) { tx.success(); tx.finish(); tx = db.beginTx(); } // String[] looks like this: Article%;% Timestamp%;% Author String[] artikelinfo = zeile.split("%;% "); if (artikelinfo.length != 3) { System.out.println("ERROR: check CSV"); for (int i = 0; i < artikelinfo.length; i++) { System.out.println(artikelinfo[i]); } return; } if (articleName == "") { // create Article and connect with ReferenceNode article = createArticle(artikelinfo[0], db.getReferenceNode(), MyRelationshipTypes.ARTICLE); articleName = artikelinfo[0]; isFirstAuthor = true; } else if (!articleName.equals(artikelinfo[0])) { // create Article and connect with ReferenceNode article = createArticle(artikelinfo[0], db.getReferenceNode(), MyRelationshipTypes.ARTICLE); articleName = artikelinfo[0]; isFirstAuthor = true; } // checks if author already exists IndexHits<Node> hits = authorList.get("Author", artikelinfo[2]); // if new author if (hits.size() == 0) { if (isFirstAuthor) { // creates author and connects him with an article author = createAndConnectNode(artikelinfo[2], article, MyRelationshipTypes.WROTE, artikelinfo[1]); isFirstAuthor = false; } else { author
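The batching pattern Craig suggests (tx.success() once per committed batch, and once more after the loop, rather than on every iteration) can be sketched with a stub. `BatchCommitSketch` and its nested `Transaction` are hypothetical stand-ins for org.neo4j.graphdb.Transaction, written so the commit count can be observed without a database:

```java
import java.util.concurrent.atomic.AtomicInteger;

public class BatchCommitSketch {
    // Stub, not the real Neo4j class: finish() stands in for the commit.
    static class Transaction {
        static final AtomicInteger commits = new AtomicInteger();
        void success() { }
        void finish() { commits.incrementAndGet(); }
    }
    static Transaction beginTx() { return new Transaction(); }

    // Returns the number of commits performed for the given record count.
    static int importRecords(int totalRecords, int batchSize) {
        Transaction.commits.set(0);
        Transaction tx = beginTx();
        try {
            for (int i = 1; i <= totalRecords; i++) {
                // ... create nodes/relationships for record i here ...
                if (i % batchSize == 0) {   // a full batch: mark success, commit
                    tx.success();
                    tx.finish();
                    tx = beginTx();
                }
            }
            tx.success();                   // mark the final, partial batch
        } finally {
            tx.finish();                    // commit it
        }
        return Transaction.commits.get();
    }

    public static void main(String[] args) {
        // 25500 records at batch size 1000 -> 25 full batches + 1 final commit
        System.out.println(importRecords(25500, 1000)); // prints 26
    }
}
```

The key difference from the code in the mail above is that success() is associated with completing a batch (and the loop), not with reading a record, and the final partial batch is still committed after the loop ends.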
Re: [Neo4j] Neo4j Spatial - Keep OSM imports - Use in GeoServer
I am travelling at the moment, so cannot give a long answer, but can suggest you look at the wiki page for neo4j in uDig, because there we have made some updates concerning which jars to use, and that will probably help you get this working. On Jul 12, 2011 10:59 AM, Robin Cura robin.c...@gmail.com wrote: Hi, First of all, thanks a lot to both of you for your answers, I have only been able to try this yesterday, and it released me from lots of troubles. I succeeded in editing the Neo4jTestCase.java file in Netbeans, as you suggested. I've had trouble installing the latest JRuby release (needed for neo4j-spatial) on my Ubuntu, so I'll do this later, but it's really a good thing to know considering the simplicity of use. Creating those databases made me realize another problem. In fact, I followed the tutorial about using a neo4j db in Geoserver, and it appears that my neo4j plugin for Geoserver doesn't work, as I always get this error when trying to create a new store linking to my neo4j database. My database is a folder named db1 (and db2 for the other one), located in my ~/ folder. In Geoserver, I create a new store and make it link to file:/home/administrateur/db1/neostore.id But each time, I get this error: Error connecting to Store. There was an error trying to connect to store neo4jstore. Do you want to save it anyway? Original exception error: Could not acquire data access 'neo4jstore' I tried with my 2 databases, and same problem. It seems those 2 dbs aren't the problem, as I've been able to open/visualise them in Gephi (using the neo4j import plugin). My guess is that my neo4j-spatial plugin for Geoserver isn't working properly. The main problem is that, since the tutorial was written, neo4j changed. 
In the tutorial, we have to place some files in the geoserver/WEB-INF/lib/ folder : - json-simple-1.1.jar -- No problem, this file is still used - geronimo-jta_1.1_spec-1.1.1.jar -- Same, this is still the version used in neo4j - neo4j-kernel-1.2-1.2.M04.jar -- Replaced this one with my current neo4j kernel jar, neo4j-kernel-1.4.jar - neo4j-index-1.2-1.2.M04.jar - neo4j-spatial.jar -- Replaced this one with the latest build returned by using sudo mvn clean package : neo4j-spatial-0.6-SNAPSHOT.jar My problem is that there is no neo4j-index file in the latest neo4j releases anymore. There are some neo4j-lucene-index files, but 1.4 doesn't seem to use neo4j-index anymore. When I only put neo4j-lucene-index.jar, Geoserver doesn't propose any option to create a Store from Neo4j databases. So, what I did is use the neo4j-index-1.3-1.3.M01.jar file from a previous release of Neo4j : Geoserver proposes to create a Store from a Neo4j db, but I got the error message quoted above. Any idea how I could make this work? What is the file that replaces neo4j-index in Neo4j 1.4? I attach one of my databases, archived, so that one of you with a working neo4j plugin in Geoserver could test it and confirm the problem isn't with the DB. Thanks, Robin Cura 2011/7/9 Craig Taverner cr...@amanzi.com Another option is to run the main method of the OSMImport class, which expects command line arguments for database location and OSM file, and will simply import a file once. This is not tested often, so there is a risk things have changed, but it is worth a try. Another, even easier, option in my opinion is the JRuby gem, neo4j-spatial.rb. See http://rubygems.org/gems/neo4j-spatial To get this running, just install JRuby from http://jruby.org, and then install the gem with jruby -S gem install neo4j-spatial and then you will have new console commands like 'import_layer'. If you run 'import_layer mydata.osm', it will import it to a new database, which you can use. 
See the github page for more information: https://github.com/craigtaverner/neo4j-spatial.rb On Thu, Jul 7, 2011 at 10:47 AM, Peter Neubauer peter.neuba...@neotechnology.com wrote: Robin, the database is deleted after each run in Neo4jTestCase.java, @Override @After protected void tearDown() throws Exception { shutdownDatabase(true); super.tearDown(); } if you change to shutdownDatabase(false), the database will not be deleted. In this case, make sure to run just that test in order not to write several tests to the same DB for clarity. mvn test -Dtest=TestDynamicLayers Does that work for you? Cheers, /peter neubauer GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - Your high performance graph database. http://startupbootcamp.org/ - Öresund - Innovation happens HERE. http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party. On Tue, Jul 5, 2011 at 6:07 PM, Robin Cura robin.c...@gmail.com wrote: Hello, First of all, I don't know anything in java, and I'm trying to figure out if neo4j could
Re: [Neo4j] Neo4j Spatial - Keep OSM imports
Another option is to run the main method of the OSMImport class, which expects command line arguments for database location and OSM file, and will simply import a file once. This is not tested often, so there is a risk things have changed, but it is worth a try. Another, even easier, option in my opinion is the JRuby gem, neo4j-spatial.rb. See http://rubygems.org/gems/neo4j-spatial To get this running, just install JRuby from http://jruby.org, and then install the gem with jruby -S gem install neo4j-spatial and then you will have new console commands like 'import_layer'. If you run 'import_layer mydata.osm', it will import it to a new database, which you can use. See the github page for more information: https://github.com/craigtaverner/neo4j-spatial.rb On Thu, Jul 7, 2011 at 10:47 AM, Peter Neubauer peter.neuba...@neotechnology.com wrote: Robin, the database is deleted after each run in Neo4jTestCase.java, @Override @After protected void tearDown() throws Exception { shutdownDatabase(true); super.tearDown(); } if you change to shutdownDatabase(false), the database will not be deleted. In this case, make sure to run just that test in order not to write several tests to the same DB for clarity. mvn test -Dtest=TestDynamicLayers Does that work for you? Cheers, /peter neubauer GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - Your high performance graph database. http://startupbootcamp.org/ - Öresund - Innovation happens HERE. http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party. On Tue, Jul 5, 2011 at 6:07 PM, Robin Cura robin.c...@gmail.com wrote: Hello, First of all, I don't know anything in java, and I'm trying to figure out if neo4j could be useful for my projects. If it is, I will of course learn a bit of java so that I can use neo4j in a decent way for my needs. I'd like to use a neo4j spatial database together with GeoServer. 
For this, I'm following the tutorial here : http://wiki.neo4j.org/content/Neo4j_Spatial_in_GeoServer But this paragraph is blocking me : - One option for the database location is a database created using the unit tests in Neo4j Spatial. The rest of this wiki assumes that you ran the TestDynamicLayers unit test which loads an OSM dataset for the city of Malmö in Sweden, and then creates a number of Dynamic Layers (or views) on this data, which we can publish in GeoServer. - If you do use the unit test for the sample database, then the location of the database will be in the target/var/neo4j-db directory of the Neo4j Source code. My problem is that I cannot keep those neo4j spatial databases created with the tests : When I run TestDynamicLayers, it builds databases (in target/var/neo4j-db), but as soon as a database is successfully loaded, it deletes it and starts importing another database, and so on. My poor understanding of java doesn't help a lot, I tried to edit the .java in Netbeans + Maven, but so far it doesn't work, all the directories created during the tests are deleted when the test ends. Any idea how I could keep those databases? Thanks, Robin ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] neo4j spatial bounding box vs. lat/lon
Hi Boris, I can see the new update method here: https://github.com/neo4j/neo4j-spatial/blob/master/src/main/java/org/neo4j/gis/spatial/server/plugin/SpatialPlugin.java#L138 And the commit for it is here: https://github.com/neo4j/neo4j-spatial/commit/22eaf91957a6265ef1e6923b5da572b75383b83e Hope that helps. Let me know if this works. The REST method is entirely untested, but does wrap code that is tested, so I'm relatively optimistic :-) Regards, Craig On Wed, Jul 6, 2011 at 1:51 AM, Boris Kizelshteyn bo...@popcha.com wrote: Hi Craig, This is awesome! Where is the update method? I can't find the code on github. Thanks! On Sat, Jul 2, 2011 at 6:00 PM, Craig Taverner cr...@amanzi.com wrote: As I understand it, Andreas is working on the much more complex problem of updating OSM geometries. That is more complex because it involves restructuring the connected graph. The case Boris has is much simpler, just modifying the WKT or WKB in the editable layer. In the Java API this is simply to call the GeometryEncoder.encodeGeometry() method, which will modify the geometry in place (ie. replace the old geometry with a new one). However, I do not think it is that simple on the REST interface. I can check, but think we will need a new method for updating geometries. Internally it is trivial to code. So I just added a quick method, called updateGeometryFromWKT, which requires the geometry (in WKT), the existing geometry node-id, and the layer. Give it a try. On Sat, Jul 2, 2011 at 5:10 PM, Peter Neubauer neubauer.pe...@gmail.com wrote: Actually, Andreas Wilhelm is working right now on updating geometries. Sent from my phone. On Jul 2, 2011 5:00 PM, Boris Kizelshteyn bo...@popcha.com wrote: Wow that's great! I'll try it out asap. This leads to my next question: how do I update the geometry in a layer, rather than add new? What I am thinking of doing is having a multipoint geometery associated with each of my user nodes which will represent their location history. 
My plan is to add the geometry to a world layer and then associate the returned node with the user. How do I then add new points to that connected node? Can I just edit the wkt and assume the index will update? Or do you have a better suggestion for doing this? I would rather avoid having each point be a separate node as I am tracking gps data and getting lots of coordinates, it would be many thousands of nodes per user. Many thanks! On Sat, Jul 2, 2011 at 6:48 AM, Craig Taverner cr...@amanzi.com wrote: Hi Boris, Ah! You are using the REST API. That changes a lot, since Neo4j Spatial is only recently exposed in REST and we do not expose most of the capabilities I have discussed in this thread, or indeed in my other answer today. I did recently add some REST methods that might work for you, specifically the addEditableLayer, which makes a WKB layer, and the addGeometryWKTToLayer, for adding any kind of Geometry (including LineString) to the layer. However, these were only added recently, and I have no experience using them myself, so consider this very much prototype code. From your other question today, can I assume you are having trouble making sense of the data coming back? So we need a better way to return the results in WKT instead of WKB? One option would be to enhance the addEditableLayer method to allow the creation of WKT layers instead of WKB layers, so the internal representation is more internet friendly. I've just added untested support for setting the format to WKT for the internal representation of the editable layer in the REST interface. This is untested (outside of my usual unit tests, that is), and is only in the trunk of neo4j-spatial, but you are welcome to try it out and see what happens. Regards, Craig On Fri, Jul 1, 2011 at 5:29 PM, Boris Kizelshteyn bo...@popcha.com wrote: Hi Craig, Thanks so much for this reply. It is very insightful. Is it possible for me to implement the LineString geometries and lookups using REST? Many thanks! 
On Wed, Jun 8, 2011 at 4:58 PM, Craig Taverner cr...@amanzi.com wrote: OK. I understand much better what you want now. Your person nodes are not geographic objects, they are persons that can be at many positions and indeed move around. However, the 'path' that they take is a geographic object and can be placed on the map and analysed geographically. So the question I have is how do you store the path the person takes? Is this a bunch of position nodes connected back to that person? Or perhaps a chain of position-(next)-position-(next)-position, etc?
Re: [Neo4j] GSoC 2011 Neo4j Geoprocessing | Weekly Report #6
Hi Andreas, Sounds like good progress overall. It is only a week to the mid-terms, so it would be good to do a general code overview and see if this can be integrated with trunk. Shall we plan for a review and test integration in the middle of next week? Regards, Craig On Sat, Jul 2, 2011 at 10:25 AM, Andreas Wilhelm a...@kabelbw.de wrote: Hi, This week I had a little blocker with deleting some subgraph nodes and relations. For that I made a separate test to identify the problem and try to find a solution. Apart from that I integrated an additional spatial type function to get the distance between geometry nodes and updated the already existing spatial type functions to the new API. Best Regards Andreas Wilhelm ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] reify links with other neo4j databases located on different distributed servers
As far as I know there is no internal support for transparent traversals across shards. Generally people are doing that in the application layer. However, I think there might be a middle ground of sorts. If we modify the relationship expander, I could imagine that relationships between shards could be modified to return the node on the other shard. This would make the traversal return nodes across shards, but since I've not tried this myself, I am uncertain if there are other consequences. On Sat, Jul 2, 2011 at 4:03 AM, Aliabbas Petiwala aliabba...@gmail.com wrote: Hi, I cannot figure out how my application logic can reify links with other neo4j databases located on different distributed servers. Hence, how can I make the traversals and graph algorithms transparent to the location of the different databases? -- Aliabbas Petiwala M.Tech CSE ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] wkb value in node created by addGeometryWKTToLayer
Hi Boris, You do not need to read the property yourself from the node, rather use the GeometryEncoder for this, it converts from the internal spatial storage to the Geometry class, which you can work with. If you call geom.toString() you will get a nice printable version (in WKT). Using the GeometryEncoder is a particularly good idea because we support many internal storage formats, not just the WKB you found. If you have point data only, you should consider using the SimplePointLayer (created with SpatialDatabaseService.createSimplePointLayer()), which will store the Point as two properties, for latitude and longitude. Back to your main question: WKB and WKT are two different formats for representing spatial data. We support both with the WKBGeometryEncoder and WKTGeometryEncoder classes, but in both cases we convert from that format to JTS Geometry class for performing spatial operations on. Internally these classes use the WKBReader/WKBWriter (and WKT versions of this) for performing the conversions. If you want to convert between WKB and WKT yourself, you should just use the JTS code directly. But as I said before, I do not think you need to do this. If you are getting your nodes from a search using the index, something like search.getResults().get(0).getGeometry().toString() will return the WKT version. Regards, Craig On Sat, Jul 2, 2011 at 1:04 AM, Boris Kizelshteyn bo...@popcha.com wrote: Craig or anyone who can answer this: what does the wkb value represent here. I know its the well known bytes, but how do I get back to wkt? I thought it was a byte array, but I can't seem to get my original values back. 
From the values in the test case I have: POINT(15.2 60.1) wkb: [0,0,0,0,2,0,0,0,2,64,46,51,51,51,51,51,51,64,78,25,-103,-103,-103,-103,-102,64,46,-103,-103,-103,-103,-103,-102,64,78,12,-52,-52,-52,-52,-51] ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
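For the curious, the WKB layout can be decoded by hand, though as Craig says the practical route is the GeometryEncoder (which uses JTS's WKBReader internally): the first byte is the byte order (0 = big-endian/XDR), then a 4-byte geometry type, then the coordinates as 8-byte IEEE doubles. A minimal sketch, with `WkbSketch` a hypothetical name; decoded this way, the byte array quoted above appears to be a two-point LineString (type 2) rather than a point:

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

public class WkbSketch {
    // Decodes the coordinate list of a WKB Point or LineString.
    // Only handles the simple 2D cases needed for this illustration.
    public static double[] decodeCoords(byte[] wkb) {
        ByteBuffer buf = ByteBuffer.wrap(wkb);
        buf.order(wkb[0] == 0 ? ByteOrder.BIG_ENDIAN : ByteOrder.LITTLE_ENDIAN);
        buf.get();                       // byte-order marker
        int type = buf.getInt();         // 1 = Point, 2 = LineString, ...
        int numPoints = (type == 1) ? 1 : buf.getInt();
        double[] coords = new double[numPoints * 2];
        for (int i = 0; i < coords.length; i++) coords[i] = buf.getDouble();
        return coords;
    }

    public static void main(String[] args) {
        byte[] wkb = {0,0,0,0,2,0,0,0,2,64,46,51,51,51,51,51,51,64,78,25,
                -103,-103,-103,-103,-102,64,46,-103,-103,-103,-103,-103,-102,
                64,78,12,-52,-52,-52,-52,-51};
        for (double d : decodeCoords(wkb)) System.out.print(d + " ");
        System.out.println();
    }
}
```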
Re: [Neo4j] neo4j spatial bounding box vs. lat/lon
Hi Boris, Ah! You are using the REST API. That changes a lot, since Neo4j Spatial is only recently exposed in REST and we do not expose most of the capabilities I have discussed in this thread, or indeed in my other answer today. I did recently add some REST methods that might work for you, specifically the addEditableLayer, which makes a WKB layer, and the addGeometryWKTToLayer, for adding any kind of Geometry (including LineString) to the layer. However, these were only added recently, and I have no experience using them myself, so consider this very much prototype code. From your other question today, can I assume you are having trouble making sense of the data coming back? So we need a better way to return the results in WKT instead of WKB? One option would be to enhance the addEditableLayer method to allow the creation of WKT layers instead of WKB layers, so the internal representation is more internet friendly. I've just added untested support for setting the format to WKT for the internal representation of the editable layer in the REST interface. This is untested (outside of my usual unit tests, that is), and is only in the trunk of neo4j-spatial, but you are welcome to try it out and see what happens. Regards, Craig On Fri, Jul 1, 2011 at 5:29 PM, Boris Kizelshteyn bo...@popcha.com wrote: Hi Craig, Thanks so much for this reply. It is very insightful. Is it possible for me to implement the LineString geometries and lookups using REST? Many thanks! On Wed, Jun 8, 2011 at 4:58 PM, Craig Taverner cr...@amanzi.com wrote: OK. I understand much better what you want now. Your person nodes are not geographic objects, they are persons that can be at many positions and indeed move around. However, the 'path' that they take is a geographic object and can be placed on the map and analysed geographically. So the question I have is how do you store the path the person takes? Is this a bunch of position nodes connected back to that person? 
Or perhaps a chain of position-(next)-position-(next)-position, etc? However you have stored this in the graph, you can express this as a geographic object by implementing the GeometryEncoder interface. See, for example, the 6 lines of code it takes to traverse a chain of NEXT locations and produce a LineString geometry in the SimpleGraphEncoder at https://github.com/neo4j/neo4j-spatial/blob/master/src/main/java/org/neo4j/gis/spatial/encoders/SimpleGraphEncoder.java#L82 If you do this, you can create a layer that uses your own geometry encoder (or the SimpleGraphEncoder I referenced above, if you use the same graph structure) and your own domain model will be expressed as LineString geometries and you can perform spatial operations on them. Alternatively, if your data is more static in nature, and you are analysing only what the person did in the past, and the graph will therefore not change, perhaps you do not care to store the locations in the graph, and you can just import them as a LineString directly into a standard layer. Whatever route you take, the final action you want to perform is to find points near the LineString (the path the person took). I do not think the bounding box is the right approach for that either. You need to try, for example, the method findClosestEdges in the utilities class at https://github.com/neo4j/neo4j-spatial/blob/master/src/main/java/org/neo4j/gis/spatial/SpatialTopologyUtils.java#L115 This method can find the part of the person's path that is closest to the point of interest. There are also many other geographic operations you might be interested in trying, once you have a better feel for the types of queries you want to ask.
Regards, Craig On Wed, Jun 8, 2011 at 2:17 AM, Boris Kizelshteyn bo...@popcha.com wrote: Thanks for the detailed response! Here is what I'm trying to do and I'm still not sure how to accomplish it: 1. I have a node which is a person 2. I have geo data as that person moves around the world 3. I use the geodata to create a bounding box of where that person has been today 4. I want to say, was this person A near location X today? 5. I do this by seeing if location X is in A's bounding box. From looking at what you suggest doing, it's not clear how I assign the node person A to a layer? Is it that the bounding box is now in the layer and not in the node? The issue then becomes, how do I associate the two, as the RTree relationship seems to establish itself on the bounding box between the node and the layer. Many thanks for your patience as I learn this challenging material
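The thread above describes how a GeometryEncoder turns a chain of NEXT-connected position nodes into a LineString. As a rough illustration of that traversal, here is a self-contained Java sketch in which a plain linked PositionNode class stands in for Neo4j nodes and NEXT relationships (the class and method names are invented for this example; this is not the actual SimpleGraphEncoder code):

```java
import java.util.ArrayList;
import java.util.List;

// Conceptual sketch of what a geometry encoder does: walk a NEXT-linked
// chain of position nodes and emit a WKT LINESTRING. PositionNode is a
// stand-in for a Neo4j node, not real Neo4j API.
public class LineStringSketch {

    static class PositionNode {
        final double lon, lat;
        PositionNode next; // plays the role of a NEXT relationship
        PositionNode(double lon, double lat) { this.lon = lon; this.lat = lat; }
    }

    // Traverse the chain from its head and build a WKT LINESTRING string.
    static String toWkt(PositionNode head) {
        List<String> coords = new ArrayList<>();
        for (PositionNode n = head; n != null; n = n.next) {
            coords.add(n.lon + " " + n.lat);
        }
        return "LINESTRING (" + String.join(", ", coords) + ")";
    }

    public static void main(String[] args) {
        PositionNode a = new PositionNode(12.9, 56.0);
        PositionNode b = new PositionNode(12.95, 56.05);
        PositionNode c = new PositionNode(13.0, 56.1);
        a.next = b;
        b.next = c;
        System.out.println(toWkt(a)); // prints LINESTRING (12.9 56.0, 12.95 56.05, 13.0 56.1)
    }
}
```

A real encoder would do this walk with the Neo4j traversal API and hand the coordinates to a geometry factory, but the shape of the logic is the same: follow the chain, collect coordinates, emit one geometry.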
Re: [Neo4j] neo4j spatial bounding box vs. lat/lon
As I understand it, Andreas is working on the much more complex problem of updating OSM geometries. That is more complex because it involves restructuring the connected graph. The case Boris has is much simpler, just modifying the WKT or WKB in the editable layer. In the Java API this is simply a call to the GeometryEncoder.encodeGeometry() method, which will modify the geometry in place (ie. replace the old geometry with a new one). However, I do not think it is that simple on the REST interface. I can check, but I think we will need a new method for updating geometries. Internally it is trivial to code. So I just added a quick method, called updateGeometryFromWKT, which requires the geometry (in WKT), the existing geometry node-id, and the layer. Give it a try. On Sat, Jul 2, 2011 at 5:10 PM, Peter Neubauer neubauer.pe...@gmail.com wrote: Actually, Andreas Wilhelm is working right now on updating geometries. Sent from my phone. On Jul 2, 2011 5:00 PM, Boris Kizelshteyn bo...@popcha.com wrote: Wow that's great! I'll try it out asap. This leads to my next question: how do I update the geometry in a layer, rather than add new? What I am thinking of doing is having a multipoint geometry associated with each of my user nodes which will represent their location history. My plan is to add the geometry to a world layer and then associate the returned node with the user. How do I then add new points to that connected node? Can I just edit the WKT and assume the index will update? Or do you have a better suggestion for doing this? I would rather avoid having each point be a separate node as I am tracking GPS data and getting lots of coordinates, it would be many thousands of nodes per user. Many thanks! On Sat, Jul 2, 2011 at 6:48 AM, Craig Taverner cr...@amanzi.com wrote: Hi Boris, Ah! You are using the REST API.
[Neo4j] Cypher error in neo4j-spatial
Hi, Recent builds of Neo4j-Spatial no longer like Peter's new bounding box query. Peter is on vacation, and I am not familiar with the code (nor Cypher), so I thought I would just dump the error message here for now in case someone can give me a quick pointer. The line of code is: Query query = parser.parse( start n=(layer1,'bbox:[15.0, 16.0, 56.0, 57.0]') match (n) -[r] - (x) return n.bbox, r:TYPE, x.layer?, x.bbox? ); The error is: org.neo4j.cypher.SyntaxError: string matching regex `\z' expected but `:' found at org.neo4j.cypher.parser.CypherParser.parse(CypherParser.scala:75) at org.neo4j.cypher.javacompat.CypherParser.parse(CypherParser.java:39) at org.neo4j.gis.spatial.IndexProviderTest.testNodeIndex(IndexProviderTest.java:91) Regards, Craig
Re: [Neo4j] traversing densely populated nodes
This topic has come up before, and the domain level solutions are usually very similar, like Norbert's category/proxy nodes (to group by type/direction) and Niels' TimeLineIndex (BTree). I wonder whether we can build a generic user-level solution that can also be wrapped to appear as an internal database solution? For example, consider Niels's solution of the TimeLine index. In this case we group all the nodes based on a consistent hash. Usually the timeline would use a timestamp, but really any reasonably variable property can do, even the node-id itself. Then we have a BTree between the dense nodes and the root node (the node with too many relationships). How about this crazy idea: create an API that mimics the normal node.getRelationship*() API, but internally traverses the entire tree? And also for creating the relationships? So for most code we just do the usual node.createRelationshipTo(node,type,direction) and node.traverse(...), but internally we actually traverse the b-tree. This would solve the performance bottleneck being observed while keeping the 'illusion' of directly connected relationships. The solution would be implemented mostly in the application space, so will not need any changes to the core database. I see this as being the same kind of solution as the auto-indexing. We set up some initial configuration that results in certain structures being created on demand. With auto-indexing we are talking about mostly automatically adding lucene indexes. With this idea we are talking about automatically replacing direct relationships with b-trees to resolve a specific performance issue. And when the relationship density is very low, if the b-tree is auto-balancing, it could just be a direct relationship anyway.
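As a rough sketch of the idea above, assuming nothing about Neo4j internals: the invented BucketedNode class below keeps the familiar createRelationshipTo/getRelationships shape while internally spreading relationships over hash-chosen buckets, a stand-in for the b-tree layer:

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Toy sketch (not Neo4j API): instead of attaching millions of
// relationships directly to one node, spread them over bucket nodes
// chosen by a hash, while the accessor keeps the illusion of a flat
// relationship list. All names here are invented for illustration.
public class BucketedNode {
    static final int BUCKETS = 16;

    // bucket index -> (relationship type -> target node ids)
    final List<Map<String, List<Long>>> buckets = new ArrayList<>();

    BucketedNode() {
        for (int i = 0; i < BUCKETS; i++) buckets.add(new HashMap<>());
    }

    // Mimics node.createRelationshipTo(...): route through a hash bucket.
    void createRelationshipTo(long targetId, String type) {
        int b = (Long.hashCode(targetId) & 0x7fffffff) % BUCKETS;
        buckets.get(b).computeIfAbsent(type, t -> new ArrayList<>()).add(targetId);
    }

    // Mimics node.getRelationships(type): internally walks every bucket,
    // but the caller sees one flat list, as if directly connected.
    List<Long> getRelationships(String type) {
        List<Long> out = new ArrayList<>();
        for (Map<String, List<Long>> bucket : buckets) {
            out.addAll(bucket.getOrDefault(type, Collections.emptyList()));
        }
        return out;
    }
}
```

In a real implementation each bucket would be an intermediate node in the graph (or a b-tree block), but the caller-facing API would look unchanged, which is exactly the 'illusion' described above.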
On Wed, Jun 29, 2011 at 6:56 PM, Agelos Pikoulas agelos.pikou...@gmail.com wrote: My problem pattern is exactly the same as Niels's: A dense-node has millions of relations of a certain direction and type, and only a few (sparse) relations of a different direction and type. The traversing is usually following only those sparse relationships on those dense-nodes. Now, even when traversing on these sparse relations, neo4j becomes extremely slow on a clearly non-linear order (the CS big-O). Some tests I ran (email me if you want the code) reveal that even the number of those dense-nodes in the database greatly influences the results. I just reported to Michael the runs with the latest M05 snapshot, which are not very positive... I have suggested an (auto) indexing of relationship types / direction that is used by traversing frameworks, but I ain't no graphdb-engine expert :-( A' Message: 5 Date: Wed, 29 Jun 2011 18:19:10 +0200 From: Niels Hoogeveen pd_aficion...@hotmail.com Subject: Re: [Neo4j] traversing densely populated nodes To: user@lists.neo4j.org Message-ID: col110-w326b152552b8f7fbe1312d8b...@phx.gbl Content-Type: text/plain; charset=iso-8859-1 Michael, The issue I am referring to does not pertain to traversing many relations at once, but the impact many relationships of one type have on relationships of another type on the same node. Example: A topic class has 2 million outgoing relationships of type HAS_INSTANCE and has 3 outgoing relationships of type SUB_CLASS_OF. Fetching the 3 relations of type SUB_CLASS_OF takes very long, I presume due to the presence of the 2 million other relationships. I have no need to ever fetch the HAS_INSTANCE relationships from the topic node. That relation is always traversed from the other direction. I do want to know the class of a topic instance, leading to the topic class, but have no real interest ever to traverse all topic instances from the topic class (at least not directly;
I do want to know the most recent addition, and that's what I use the timeline index for). Niels From: michael.hun...@neotechnology.com Date: Wed, 29 Jun 2011 17:50:08 +0200 To: user@lists.neo4j.org Subject: Re: [Neo4j] traversing densely populated nodes I think this is the same problem that Agelos is facing; we are currently evaluating options to improve the performance on those highly connected supernodes. A traditional option is really to split them into groups or even kind of shard their relationships to a second layer. We're looking into storage improvement options as well as modifications to retrieval of that many relationships at once. Cheers Michael
Re: [Neo4j] Database engine using Neo4j
Hi Kriti, I can comment on a few things, especially neo4j-spatial: - Neo4j is certainly good for social networks, and people have used it for that, but I personally do not have experience with that so I will not comment further (others can chip in where necessary). - Neo4j-Spatial is good for performing some spatial queries on your domain data. So you start by modeling your domain however you want, and then when you want to start using neo4j-spatial, just add all nodes that have spatial components (eg. location) to the spatial index and they will be available for querying. The SimplePointLayer class has support for querying by proximity, which sounds like what you want. You can also query with a filter on properties (so only nearby objects matching some other criteria). - I do my neo4j-spatial development in Eclipse, so there should be no issues for you using Eclipse. Just use m2eclipse, and add the dependency to your pom.xml. The current version of neo4j-spatial requires Neo4j 1.4, so if you are using an older Neo4j, you might need to make minor changes. - Neo4j is not optimized for storing BLOBs, so while it can store images as byte[], it is advisable to rather store a reference to the image (eg. URI), and store the image in another way (filesystem, other database, etc.) Regards, Craig On Wed, Jun 29, 2011 at 2:06 PM, kriti sharma kriti.0...@gmail.com wrote: Dear Users, I am developing a time capsule DB engine using Neo4j as a database. I intend to develop three scales (temporal, geo/spatial and egocentric/personal relationships) in the db structure. For the geolocation part, I would like to be able to query upon a location keyword and also some nearby places/photos/people that I have in my DB. Do you think neo4j spatial will be a good choice for such a spatial scheme? I have developed a timeline in the usual neo4j using the timeline feature. Can I simply integrate neo4j spatial in my existing code for neo4j in Eclipse?
I am retrieving data from twitter, flickr, facebook etc. so the format of the data may not be uniform. Therefore I found Neo4j to be an excellent option. Has some work been done in modelling a user's Facebook data (friends and networks) relationships in Neo4j? How should I go about storing images in the DB? Thanks Kriti
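SimplePointLayer's proximity querying mentioned above can be pictured as a distance filter. Here is a self-contained Java sketch using the haversine formula, with a linear scan for illustration where the real layer would use its spatial index (class and method names are invented, not neo4j-spatial code):

```java
import java.util.ArrayList;
import java.util.List;

// Illustration only: a "find points near (lat, lon)" query reduced to a
// great-circle distance filter. A spatial layer answers the same question
// via an RTree rather than scanning every point.
public class ProximitySketch {
    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance between two lat/lon points (haversine formula).
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    // Return the {lat, lon} points within maxKm of the query point.
    static List<double[]> findNearby(List<double[]> points,
                                     double lat, double lon, double maxKm) {
        List<double[]> hits = new ArrayList<>();
        for (double[] p : points)
            if (distanceKm(lat, lon, p[0], p[1]) <= maxKm) hits.add(p);
        return hits;
    }
}
```

Filtering on other properties (the "nearby objects matching some other criteria" case) would just add a predicate check inside the same loop.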
Re: [Neo4j] traversing densely populated nodes
In the amanzi-index I link all indexed nodes into the index tree, so traversals are straight up the tree. Of course this also means that there are at least as many relationships as indexed nodes. I was reviewing Michael's code for the relationship expander, and think that is a great idea, transparently using an index instead of the normal relationships API, and can imagine using the relationship expander to instead traverse the BTree to the final relationship to the leaf nodes. So if we imagine a BTree with perhaps 10 or 20 hops from the root to the leaf node, the relationship expander Michael described would complete all hops and return only the last relationship, giving the illusion of direct connections from root to leaf. This would certainly perform well, especially for cases where there are factors limiting the number of relationships we want returned. I think the request for type and direction is the first obvious case, but we could be even more explicit than that, if we pass constraints based on the BTree's consistent hash. On Thu, Jun 30, 2011 at 11:36 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: In theory the approach I described earlier could work, though there are some pitfalls to the current implementation that need ironing out before this can become a recommended approach. The choice of Timeline instead of Btree may actually be the wrong choice after all. I chose Timeline because of my familiarity with this particular class, but its implementation may actually not be all that suitable for this particular use case. This has to do with the fact that Timeline is not just a tree, but a list where entries with an interval of max. 1000 are stored in a Btree index. This works reasonably well for a Timeline, but makes the approach less ideal for storing dense relationships. The problem with the Timeline implementation is the ability to look up the tree root from a particular leaf.
In an ordinary Btree it would simply be a traversal from the leaf through the layers of block nodes to the tree root. In Timeline the traversal will be different. It first has to move through the Timeline list until it finds an entry that is stored in the Btree (which worst case takes 1000 hops), and then it has to traverse the Btree up to the tree root. To avoid this complicated traversal I ended up doing a lookup through Lucene of the timeline URI (which is stored in all timeline list entries). In fact I might as well have added the URI of the dense node as a property and done the lookup through Lucene without the Timeline; it just happens that I like the sort order of Timeline, making it a useful approach anyway. I will experiment using Btree directly (without Timeline) and see if that leads to a simpler and faster traversal from leaf to root node. There is one more issue before this can become production ready. Btree as it is implemented now is not thread safe (per the implementation's Javadocs), so it needs some love and attention to make it work properly. Niels Date: Thu, 30 Jun 2011 13:57:20 +0200 From: cr...@amanzi.com To: user@lists.neo4j.org Subject: Re: [Neo4j] traversing densely populated nodes
Re: [Neo4j] neo4j-graph-collections
I have previously used two solutions to deal with multiple types in btrees: - My first index in 2009 was a btree-like n-dim index using generics to support int[], long[], float[] and double[] (no strings). I used this for TimeLine (long[1]) and Location (double[2]). The knowledge about what type was used was in the code for constructing the index (whether a new index or accessing an existing index in the graph). - In December I started my amanzi-index (on github https://github.com/craigtaverner/amanzi-index) that is also btree-like, n-dimensional. But this time it can index multiple types in the same tree (so a float, int and string in the same tree, instead of being forced to have all properties of the same type). It is a re-write of the previous index to support Strings, and mixed types. This time it does save the type information in meta-data at the tree root. The idea of using a 'comparator' class for the types is similar, but simpler than the idea I implemented for amanzi-index, where I have mapper classes that describe not only how to compare types, but also how to map from values to index keys and back. This includes (to some extent) the concept of the lucene analyser, since the mapper can decide on custom distribution of, for example, strings and category indexes. For both of these indexes, you configure the index up front, and then only call index.add(node) to index a node. This will fit in well with the new auto-indexing ideas in neo4j. On Wed, Jun 29, 2011 at 2:25 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: At this moment Btree only supports the primitive datatype long, while Rtree only supports the datatype double. For Btree it makes sense to at least support strings, floats, doubles and ints too.
Use cases for these data types are pretty obvious and are Btree backed in (almost) every RDBMS product around. I think the best solution would be to create Comparator objects wrapping these primitive data types and store the class name of the comparator in the root of the index tree. This allows users to create their own comparators for datatypes not covered yet. It would make sense that people would want to store BigInt and BigDecimal objects in a Btree too; others may want to store dates (instead of datetime), fractions, complex numbers or even more exotic data types. Niels From: sxk1...@hotmail.com To: user@lists.neo4j.org Date: Tue, 28 Jun 2011 22:43:24 -0700 Subject: Re: [Neo4j] neo4j-graph-collections I've read through this thread in more detail and have a few thoughts, when you talk about type I am assuming that you are referring to an interface that both (Btree, Rtree) can implement. For the data types I'd like to understand the use cases first before implementing the different data types; maybe we could store types of Object instead of Long or Double and implement comparators in a more meaningful fashion. Also I was wondering if unit tests would need to be extracted out of the spatial component and embedded inside the graph-collections component as well, or whether we'd potentially need to write brand new unit tests as well. Craig as I mentioned I'd love to help, let me know if it would be possible to fork a repo or to talk in more detail this week. Regards From: pd_aficion...@hotmail.com To: user@lists.neo4j.org Date: Wed, 29 Jun 2011 01:35:43 +0200 Subject: Re: [Neo4j] neo4j-graph-collections As to the issue of n-dim doubles, it would be interesting to consider creating a set of classes of type Orderable (supporting <, <=, >, >= operations), which we can use in both Rtree and Btree. Right now Btree only supports datatype Long. This should also become more generic. A first step we can take is to at least wrap the common datatypes in Orderable classes.
Niels
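To illustrate the 'mapper' idea described in this thread, here is an invented Java sketch in which each supported type registers a function mapping values onto long index keys, so one tree can order mixed types. This is not actual amanzi-index code, and the specific mappings (a fixed 0.001 resolution for doubles, two-character prefixes for strings) are arbitrary choices for the example:

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.ToLongFunction;

// Invented illustration: per-type mappers convert property values to long
// index keys, the kind of role the 'comparator'/'mapper' classes discussed
// above would play. The key space and resolutions are arbitrary.
public class MapperSketch {
    static final Map<Class<?>, ToLongFunction<Object>> MAPPERS = new HashMap<>();
    static {
        MAPPERS.put(Integer.class, v -> (Integer) v);
        MAPPERS.put(Long.class, v -> (Long) v);
        // Bucket doubles to a fixed resolution of 0.001 (arbitrary choice).
        MAPPERS.put(Double.class, v -> Math.round((Double) v * 1000));
        // Order strings by their first two characters (a crude category key).
        MAPPERS.put(String.class, v -> {
            String s = (String) v;
            long key = 0;
            for (int i = 0; i < 2 && i < s.length(); i++)
                key = key * 65536 + s.charAt(i);
            return key;
        });
    }

    // Look up the mapper for a value's type and produce its index key.
    static long keyFor(Object value) {
        ToLongFunction<Object> f = MAPPERS.get(value.getClass());
        if (f == null)
            throw new IllegalArgumentException("no mapper for " + value.getClass());
        return f.applyAsLong(value);
    }
}
```

Registering a new entry in the map is the sketch's equivalent of users supplying their own comparator class for types not covered yet (BigDecimal, dates, and so on).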
Re: [Neo4j] neo4j-graph-collections
It is technically possible, but it is a somewhat specialized index, not a normal BTree, so I think you would want both (mine and a classic btree). My index performs better for certain data patterns, is best with semi-ordered data and moderately even distributions (since it has no rebalancing), and requires the developer to pick a good starting 'resolution', which means they should know something about their data. Perhaps we just port some of the typing support into a btree in the collections project? On Wed, Jun 29, 2011 at 4:19 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: Craig, Would it be possible to merge your work on Amanzi with the work the Neo team has done on the Btree component that is now in neo4j-graph-collections, so we can eventually have one implementation that meets a broad variety of needs? Niels Date: Wed, 29 Jun 2011 15:34:47 +0200 From: cr...@amanzi.com To: user@lists.neo4j.org Subject: Re: [Neo4j] neo4j-graph-collections
Re: [Neo4j] neo4j-graph-collections
I think moving the RTree to the generic collections would not be too hard. I saw Saikat showed interest in doing this himself. Saikat, contact me off-list for further details on what I think could be done to make this port. On Wed, Jun 29, 2011 at 9:52 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: Peter, I totally agree. Having the Rtree index removed of spatial dependencies in graph-collections should be our first priority. Once that is done we can focus on the other issues. Which doesn't mean we should stop discussing future improvements like setting up comparators (or something to that extent) that can be reusable, but we shouldn't try to get that up before Rtree is in graph-collections. Niels From: peter.neuba...@neotechnology.com Date: Wed, 29 Jun 2011 21:10:15 +0200 To: user@lists.neo4j.org Subject: Re: [Neo4j] neo4j-graph-collections Craig, just gave you push access to the graph collections in case you want to do anything there. Also, IMHO it would be more important to isolate and split out the RTree component from Spatial than to optimize it - that could be done in the new place with targeted performance tests later? Cheers, /peter neubauer GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - Your high performance graph database. http://startupbootcamp.org/ - Öresund - Innovation happens HERE. http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party. On Wed, Jun 29, 2011 at 4:19 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: Craig, Would it be possible to merge your work on Amanzi with the work the Neo team has done on the Btree component that is now in neo4j-graph-collections, so we can eventually have one implementation that meets a broad variety of needs?
Niels Date: Wed, 29 Jun 2011 15:34:47 +0200 From: cr...@amanzi.com To: user@lists.neo4j.org Subject: Re: [Neo4j] neo4j-graph-collections I have previously used two solutions to deal with multiple types in btrees: - My first index in 2009 was a btree-like n-dim index using generics to support int[], long[], float[] and double[] (no strings). I used this for TimeLine (long[1]) and Location (double[2]). The knowledge about what type was used was in the code for constructing the index (whether a new index or accessing an existing index in the graph). - In December I started my amanzi-index (on githubhttps://github.com/craigtaverner/amanzi-index) that is also btree-like, n-dimensional. But this time it can index multiple types in the same tree (so a float, int and string in the same tree, instead of being forced to have all properties of the same type). It is a re-write of the previous index to support Strings, and mixed types. This time it does save the type information in meta-data at the tree root. The idea of using a 'comparator' class for the types is similar, but simpler than the idea I implemented for amanzi-index, where I have mapper classes that describe not only how to compare types, but also how to map from values to index keys and back. This includes (to some extent) the concept of the lucene analyser, since the mapper can decide on custom distribution of, for example, strings and category indexes. For both of these indexes, you configure the index up front, and then only call index.add(node) to index a node. This will fit in well with the new auto-indexing ideas in neo4j. On Wed, Jun 29, 2011 at 2:25 PM, Niels Hoogeveen pd_aficion...@hotmail.comwrote: At this moment Btree only supports the primitive datatype long, while Rtree only supports the datatype double. For Btree it makes sense to at least support strings, floats, doubles and ints too. 
Use cases for these data types are pretty obvious and are Btree backed in (almost) every RDBMS product around. I think the best solution would be to create Comparator objects wrapping these primitive data types and store the class name of the comparator in the root of the index tree. This allows users to create their own comparators for datatypes not covered yet. It would make sense that people would want to store BigInt and BigDecimal objects in a Btree too; others may want to store dates (instead of datetime), fractions, complex numbers or even more exotic data types. Niels From: sxk1...@hotmail.com To: user@lists.neo4j.org Date: Tue, 28 Jun 2011 22:43:24 -0700 Subject: Re: [Neo4j] neo4j-graph-collections I've read through this thread in more detail and have a few thoughts, when you talk about type I am assuming that you are referring to an interface that both
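Niels' idea above — store the comparator's class name in the index root and reinstantiate it by reflection when the index is reopened — can be sketched as follows. All class and method names here are illustrative assumptions, not the actual graph-collections API; in a real index the class name would live as a property on the root node.

```java
import java.util.Comparator;

// Sketch of a comparator stored by class name (as Niels suggests), so that
// users can plug in their own comparators for types the index does not
// natively support. Names are hypothetical, not the graph-collections API.
public class ComparatorRegistry {
    // A user-supplied comparator; any class with a public no-arg constructor works.
    public static class LongKeyComparator implements Comparator<Long> {
        @Override
        public int compare(Long a, Long b) {
            return Long.compare(a, b);
        }
    }

    // Simulates reading the class name back from the index root node
    // and instantiating the comparator reflectively.
    @SuppressWarnings("unchecked")
    public static <T> Comparator<T> load(String className) throws Exception {
        return (Comparator<T>) Class.forName(className)
                .getDeclaredConstructor().newInstance();
    }

    public static void main(String[] args) throws Exception {
        // In a real index this string would be a property on the root node.
        String stored = LongKeyComparator.class.getName();
        Comparator<Long> cmp = load(stored);
        System.out.println(cmp.compare(3L, 7L) < 0); // true
    }
}
```

The design choice here matches the mailing-list suggestion: the index itself stays type-agnostic, and all type knowledge lives in one pluggable class whose name is persisted alongside the tree.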
Re: [Neo4j] neo4j-graph-collections
The RTree in principle should be generalizable, but the current implementation in neo4j-spatial does make a few assumptions specific to spatial data, and makes use of spatial envelopes for the tree node bounding boxes. It is also specific to 2D. We could make a few improvements first, like generalizing to n-dimensions, replacing the recursive search with a traverser and generalizing the bounding boxes to be simple double-arrays. Then the only thing left would be to decide if it is ok for it to be based on n-dim doubles or should be generalized to more types. On Tue, Jun 28, 2011 at 11:14 PM, Saikat Kanjilal sxk1...@hotmail.com wrote: I would be interested in helping out with this, let me know next steps. Sent from my iPhone On Jun 28, 2011, at 8:49 AM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: A couple of weeks ago Peter Neubauer set up a repository for in-graph datastructures: https://github.com/peterneubauer/graph-collections. At this time of writing only the Btree/Timeline index is part of this component. In my opinion it would be interesting to move the Rtree parts of neo4j-spatial to neo4j-graph-collections too. I looked at the code but don't feel competent to separate out those classes that support generic Rtrees from those classes that are clearly spatial related. Is there any enthusiasm for such a project and if so, who is willing and able to do this? Niels ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
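The generalization Craig describes — replacing the 2D spatial Envelope with plain double arrays so the RTree can work in n dimensions — might look roughly like this. This is a sketch of the idea only, not code from neo4j-spatial or graph-collections.

```java
// Sketch of an n-dimensional bounding box backed by plain double arrays,
// as suggested for generalizing the RTree beyond 2D spatial envelopes.
public class NDimEnvelope {
    final double[] min, max;

    public NDimEnvelope(double[] min, double[] max) {
        this.min = min.clone();
        this.max = max.clone();
    }

    // Grow this box to include another, as done when inserting into a tree node.
    public void expandToInclude(NDimEnvelope other) {
        for (int i = 0; i < min.length; i++) {
            min[i] = Math.min(min[i], other.min[i]);
            max[i] = Math.max(max[i], other.max[i]);
        }
    }

    // The overlap test drives the search: descend only into children
    // whose boxes intersect the query box.
    public boolean intersects(NDimEnvelope other) {
        for (int i = 0; i < min.length; i++) {
            if (other.max[i] < min[i] || other.min[i] > max[i]) return false;
        }
        return true;
    }

    public static void main(String[] args) {
        NDimEnvelope a = new NDimEnvelope(new double[]{0, 0}, new double[]{1, 1});
        NDimEnvelope b = new NDimEnvelope(new double[]{2, 2}, new double[]{3, 3});
        System.out.println(a.intersects(b)); // false
        a.expandToInclude(b);
        System.out.println(a.intersects(b)); // true
    }
}
```

Nothing here is 2D-specific: the same two loops serve a timeline (1D), geography (2D), or any higher-dimensional key space, which is exactly why a double-array representation makes the index generic.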
Re: [Neo4j] cassandra + neo4j graph
Hi, I can comment on the spatial side. The neo4j-spatial library (https://github.com/neo4j/neo4j-spatial) provides some tools for doing spatial analysis on your data. I do not know exactly what you plan to do, but since you mention user and place locations, I guess you are likely to be asking the database for proximity searches (users near me, or places of interest near me), in which case the SimplePointLayer class (https://github.com/neo4j/neo4j-spatial/blob/master/src/main/java/org/neo4j/gis/spatial/SimplePointLayer.java) should provide you what you need. Read the code (linked above), it is simple. Or read the related blog post Neo4j Spatial, Part 1: Finding things close to other things (http://blog.neo4j.org/2011/03/neo4j-spatial-part1-finding-things.html). You also do not need to include neo4j-spatial from the beginning. Just model your graph in a way suiting your domain, and when you want to enable spatial searches, include neo4j-spatial dependencies in your pom and start using it. If you happen to conform to one of the expected spatial structures, you can add your nodes to the spatial index directly, otherwise implement a GeometryEncoder and things should work from there. What I think you might find interesting is that you can edit the search mechanism to filter on both spatial and domain specific characteristics in the same pass. There are various options for this, so we can discuss that later, should you wish. Regards, Craig On Mon, Jun 27, 2011 at 3:49 PM, Aliabbas Petiwala aliabba...@gmail.com wrote: thanks for the informative reply, to add more, the social networking website will be geo aware and some spatial info also needs to be stored like the coordinates of the user node or the coordinates of the location\place how can we add more also will neo4j alone + spatial suffice? can there be multiple masters for load balancing and how about splitting the graph in the design itself like designing in terms of multiple graphs which are mapped to a glue graph?
hats off for building such a pioneering technology! regards, Aliabbas On 6/26/11, Jim Webber j...@neotechnology.com wrote: Hi Aliabbas, It's difficult to make pronouncements about your solution design without knowing about it, but here are some heuristics that can help you to plan whether you go with a native Neo4j solution or mix it up with other stores. All of these are only ideas and you should test first to ensure they make sense in your domain. 1. Document/record size. If each node is likely to contain a lot of data (e.g. many megabytes) then you may choose to hold that outside of Neo4j (e.g. file system, KV store). Otherwise Neo4j. 2. Length of individual fields. If they're small enough to fit within our short-string parameters (optimised around post codes, telephone numbers etc) then you get a performance boost compared to longer strings (which live in a separate store file in Neo4j). If your individual fields are really really long (See above, many megabytes), then consider moving them outside Neo4j. If you can slice up your fields into shorter strings then you'll get a good performance and footprint boost. 3. Many machines. Neo4j has master/slave replication so write performance is asymptotically limited by the IO performance of the master (while reads scale horizontally, pretty much). The number of nodes you have is not a problem for Neo4j, so what is critical is whether a single master can handle the write load you want to throw at it. Since modern buses are fast, and since graph data structures are often less write-heavy than equivalents in other data stores*, I'd suggest that you might be well served by Neo4j here. But my overriding advice is to spike something with Neo4j and then, only if you find something that doesn't work in your context, to think about adding another data store. 
Jim * I'll be blogging about this shortly since it's a common enough misconception that 1000 writes in a relational/other NOSQL database implies 1000 writes in a graph, whereas often it's a single write, meaning graphs can be 1000 times better for the same workload. -- Aliabbas Petiwala M.Tech CSE
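The proximity searches discussed in this thread (users or places near a point) ultimately reduce to a great-circle distance test. SimplePointLayer answers such queries via a spatial index rather than scanning, but as a standalone sketch the underlying distance math (the haversine formula) looks like this; the class name and constants below are illustrative.

```java
// Standalone sketch of the great-circle distance test behind proximity
// searches. A spatial index avoids computing this pairwise over all points;
// this only illustrates the distance math itself.
public class Proximity {
    static final double EARTH_RADIUS_KM = 6371.0;

    // Haversine distance between two (lat, lon) pairs given in degrees.
    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }

    public static void main(String[] args) {
        // Malmö to Copenhagen, roughly 28 km apart
        double d = haversineKm(55.605, 13.003, 55.676, 12.568);
        System.out.println(d > 20 && d < 35); // true
    }
}
```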
Re: [Neo4j] Recent slowdown in imports with lucene
Sorry for the lack of details. I wrote the email late at night, as I am again. Anyway, the relevant code in github is OSMImporter.java (https://github.com/neo4j/neo4j-spatial/blob/master/src/main/java/org/neo4j/gis/spatial/osm/OSMImporter.java). When adding nodes to the graph, it also adds the osm-id to a lucene index. There is no index#remove call, only multiple index#add calls within the same transaction. In fact we call index.add and index.get for one index (osm changesets), while calling index.add on another (osm-nodes). The relevant lines of code are 812 for adding new OSM nodes to the graph, and 914 for finding changesets in a different index. I have not investigated for which version of neo4j the slowdown started, or if there is somehow some other cause. I will try to find time to do that later this week. But I thought I should ask on the list anyway in case anyone else has a similar problem, or if there are some obvious answers. On Sun, Jun 26, 2011 at 1:45 PM, Mattias Persson matt...@neotechnology.com wrote: Please elaborate on how you are using your index. Are you using Index#remove(entity,key) or Index#remove(entity) followed by get/query in the same tx? There was a recent change in transactional state implementation, where a full representation (in-memory lucene index) was needed for it to be able to return accurate results in some corner cases. That change could slow things down, but not that much though. I'll give some different scenarios a go and see if I can find some culprit for this. But again, a little more information would be useful, as always. 2011/6/26 Craig Taverner cr...@amanzi.com Hi, Has anyone noticed a slowdown of imports into neo4j with recent snapshots? Neo4j-spatial importing OSM data (which uses lucene to find matching nodes for ways) is suddenly running much slower than usual on non-batch imports.
For most of my medium sized test cases, I normally have surprisingly similar import times for batch inserter and non-batch inserter (EmbeddedGraphDatabase) versions of the OSM import, but in recent runs the normal API is now more than 10 times slower. Down to 70 nodes per second, which is insanely slow. Any idea if there is something in the recent snapshots for me to look into? Reproducing the problem requires simply running the TestOSMImport test cases in neo4j-spatial. I have only tried this on my laptop, so I have not ruled out that there is something local going on. Regards, Craig -- Mattias Persson, [matt...@neotechnology.com] Hacker, Neo Technology www.neotechnology.com
Re: [Neo4j] Recent slowdown in imports with lucene
Hi again, My apologies, but I have found the problem, and it is in the OSMImporter itself, nothing to do with Lucene or Neo4j. Peter made a commit in May (https://github.com/neo4j/neo4j-spatial/commit/b5e0f1d1a11ed9c8b2b8074f529362a1607a7643#src/main/java/org/neo4j/gis/spatial/osm/OSMImporter.java) which at first glance appears to be a cleanup of my code (removal of string literals), but it did have two meaningful changes I only saw on deeper inspection: - Addition of the map type: exact to the index creation (when I removed this, node creation improved from 70/s to 140/s) - User control over the commit size (previously I had hard-coded this to 5000 nodes per tx). There was a small but significant bug in the commit size, with the new user parameter not being used to initialize anything, with the consequence that every node was committed individually. Setting the block size back to 5000 increased the node creation rate over 100 times. That is a serious improvement. Sorry again for wasting space on the list. I'm glad this was a user error, though, not a neo4j issue :-) Regards, Craig On Mon, Jun 27, 2011 at 12:54 AM, Craig Taverner cr...@amanzi.com wrote: Sorry for the lack of details. I wrote the email late at night, as I am again. Anyway, the relevant code in github is OSMImporter.java (https://github.com/neo4j/neo4j-spatial/blob/master/src/main/java/org/neo4j/gis/spatial/osm/OSMImporter.java). When adding nodes to the graph, it also adds the osm-id to a lucene index. There is no index#remove call, only multiple index#add calls within the same transaction. In fact we call index.add and index.get for one index (osm changesets), while calling index.add on another (osm-nodes). The relevant lines of code are 812 for adding new OSM nodes to the graph, and 914 for finding changesets in a different index. I have not investigated for which version of neo4j the slowdown started, or if there is somehow some other cause.
I will try find time to do that later this week. But I thought I should ask on the list anyway in case anyone else has a similar problem, or if there are some obvious answers. On Sun, Jun 26, 2011 at 1:45 PM, Mattias Persson matt...@neotechnology.com wrote: Please elaborate on how you are using your index. Are you using Index#remove(entity,key) or Index#remove(entity) followed by get/query in the same tx? There was a recent change in transactional state implementation, where a full representation (in-memory lucene index) was needed for it to be able to return accurate results in some corner cases. That change could slow things down, but not that much though. I'll give some different scenarios a go and see if I can find some culprit for this. But again, a little more information would be useful, as always. 2011/6/26 Craig Taverner cr...@amanzi.com Hi, Has anyone noticed a slowdown of imports into neo4j with recent snapshots? Neo4j-spatial importing OSM data (which uses lucene to find matching nodes for ways) is suddenly running much slower than usual on non-batch imports. For most of my medium sized test cases, I normally have surprisingly similar import times for batch inserter and non-batch inserter (EmbeddedGraphDatabase) versions of the OSM import, but in recent runs the normal API is now more than 10 times slower. Down to 70 nodes per second, which is insanely slow. Any idea if there is something in the recent snapshots for me to look into? Reproducing the problem requires simply running the TestOSMImport test cases in neo4j-spatial. I have only tried this on my laptop, so I have not ruled out that there is something local going on. 
Regards, Craig -- Mattias Persson, [matt...@neotechnology.com] Hacker, Neo Technology www.neotechnology.com
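The commit-size bug diagnosed above (a batch parameter never applied, so every node got its own transaction) comes down to a standard batching pattern. This toy sketch shows the intended shape; the Tx class is a hypothetical stand-in, not Neo4j's Transaction API.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of the commit-every-N-operations pattern whose absence caused the
// 100x slowdown: without it, each node lands in its own transaction.
// Tx is a stand-in for a transaction handle, not the Neo4j API.
public class BatchedCommit {
    static class Tx {
        static int commits = 0;
        void success() {}
        void finish() { commits++; } // commit happens on finish after success
    }

    static int importAll(List<String> items, int commitSize) {
        Tx.commits = 0;
        Tx tx = new Tx();
        int inTx = 0;
        for (String item : items) {
            // createNode(item) would go here
            if (++inTx >= commitSize) {   // commit a whole block, not each node
                tx.success();
                tx.finish();
                tx = new Tx();
                inTx = 0;
            }
        }
        if (inTx > 0) {                   // commit the final partial block
            tx.success();
            tx.finish();
        }
        return Tx.commits;
    }

    public static void main(String[] args) {
        List<String> items = new ArrayList<>();
        for (int i = 0; i < 12000; i++) items.add("n" + i);
        System.out.println(importAll(items, 5000)); // 3 commits instead of 12000
    }
}
```

The bug in the thread was equivalent to `commitSize` silently being 1; restoring the 5000-node block turned 12000 commits into 3 in this model, which mirrors the reported 100x speedup.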
[Neo4j] Recent slowdown in imports with lucene
Hi, Has anyone noticed a slowdown of imports into neo4j with recent snapshots? Neo4j-spatial importing OSM data (which uses lucene to find matching nodes for ways) is suddenly running much slower than usual on non-batch imports. For most of my medium sized test cases, I normally have surprisingly similar import times for batch inserter and non-batch inserter (EmbeddedGraphDatabase) versions of the OSM import, but in recent runs the normal API is now more than 10 times slower. Down to 70 nodes per second, which is insanely slow. Any idea if there is something in the recent snapshots for me to look into? Reproducing the problem requires simply running the TestOSMImport test cases in neo4j-spatial. I have only tried this on my laptop, so I have not ruled out that there is something local going on. Regards, Craig
Re: [Neo4j] Neo4j -- Can it be embedded in Android?
I heard that Peter Neubauer made a port of neo4j to android a few years ago, but nothing has been done since, and no version since then would work. So my understanding is that it does not work on android, but that it is possible to make it work (with some work ;-). Peter is away, but I expect he would have a better answer than me. On Fri, Jun 24, 2011 at 1:33 PM, Sidharth Kshatriya sid.kshatr...@gmail.com wrote: Dear All, I have googled for this on the web and did not arrive at a satisfactory answer. *Question: Is it possible to run Neo4j on Android? * Thanks, Sidharth -- Sidharth Kshatriya www.sidk.info
Re: [Neo4j] Neo4j -- Can it be embedded in Android?
Personally what I would like to see would be a sub-graph approach, with the android device storing a sub-graph of the main database, and updating that asynchronously with the server. Seems like something that can be done in a domain specific way, but much harder to do generically. I wanted this for OSM, with the local OSM graph on the android device representing a local map supporting fast LBS services, and automatically updating from the main OSM graph on a big central server as the user travels. On Fri, Jun 24, 2011 at 2:56 PM, Rick Bullotta rick.bullo...@thingworx.com wrote: I think the limited capabilities of the Android device(s) (RAM, primarily) limit the usefulness of Neo4J versus alternatives since the datasets are usually small and simple in mobile apps. If we need any heavy-duty graph work for a mobile app, we'd do it on the server. -Original Message- From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Sidharth Kshatriya Sent: Friday, June 24, 2011 8:53 AM To: Neo4j user discussions Subject: Re: [Neo4j] Neo4j -- Can it be embedded in Android? Yes, I saw that on the mailing list archives too. I would have thought there would be some interest in using this on android -- but there seems to be no news about it since... On Fri, Jun 24, 2011 at 6:13 PM, Rick Bullotta rick.bullo...@thingworx.com wrote: I remember something like that, too. The main issue is probably the non-traditional file system that Android exposes. -Original Message- From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Craig Taverner Sent: Friday, June 24, 2011 8:37 AM To: Neo4j user discussions Subject: Re: [Neo4j] Neo4j -- Can it be embedded in Android? I heard that Peter Neubauer made a port of neo4j to android a few years ago, but that nothing has been done since and no version since then would work. So my understanding is that it does not work on android, but that it is possible to make it work (with some work ;-).
Peter is away, but I expect he would have a better answer than me. On Fri, Jun 24, 2011 at 1:33 PM, Sidharth Kshatriya sid.kshatr...@gmail.com wrote: Dear All, I have googled for this on the web and did not arrive at a satisfactory answer. *Question: Is it possible to run Neo4j on Android? * Thanks, Sidharth -- Sidharth Kshatriya www.sidk.info
Re: [Neo4j] neo4j-spatial roadmap/stability
Hi Christopher, Thanks for your interest in neo4j and neo4j-spatial. I will answer your questions and comments inline. I am working for the largest German speaking travel and holiday portal. Currently we are using a relatively simple MySQL based spatial distance functionality. We plan to enhance this by something which is capable of a flexible set of spatial queries. We will evaluate Neo4j-Spatial for that and benchmark it against PostGIS/PostGreSQL. This would be a very interesting application for neo4j-spatial. I'm sure we could support you in that. Obviously it is not as mature as PostGIS, but I think it is very suitable for flexible queries, especially if you plan to combine a complex domain model with spatial data, or expose a spatial element to existing domains. I found some Roadmap descriptions in the Neo4j Wiki ( http://wiki.neo4j.org/content/Neo4j_Spatial_Project_Plan), but I am not sure that these are still valid. Craig said (somewhere) that Neo4j Spatial is still alpha (I hope that this means that only the interfaces are still unstable). And I know that neo4j-spatial is an open source project where there is no Neo Technology responsibility. The project plan you found was unfortunately the original plan put down before neo4j-spatial really started, and represents the expectations for 2010. Most of these were met, and several other capabilities achieved in addition. I will edit the wiki to more accurately reflect the current status of the project. However, it is still true that it is in an alpha state. The API's are likely to change. Since last September we have viewed it as an alpha release, available for people to try out and provide feedback on. We believe it is capable of many useful tasks, and can be used for real applications. But it has not been in the 'wild' for long, and so there are probably remaining bugs and performance issues. 
In addition, as mentioned before, we will almost certainly change the API's a little as we receive more feedback and move the system forward. Already in 2011 there have been three new additions influencing the API: the SimplePointLayer for LBS and related capabilities, the beginnings of the REST API for inclusion in Neo4j-Server, and the Geoprocessing features. Can you drop a few words about the Spatial roadmap, its stability and planned licensing (all based on using it on a high volume web site)? I think we need Peter's opinion on the licensing. I believe it is currently the same as neo4j itself. The code comments state AGPL, and I am not sure if the recent decision to move the core to GPL is applicable to the spatial code. For the roadmap we will also update the wiki pages. Currently the efforts are to: - Improve the OSM model API (some basic API for exploring the OSM ways and nodes, already in place but needing some refinement) - Improve the REST API for spatial (we have some customers trying this out, and will make enhancements based on their feedback) - Integrate the spatial index into the new automatic indexing feature of Neo4j (some initial prototype of this is in place, and will be refined for the 1.5 release of Neo4j) - Improved Geoprocessing support, particularly on the OSM model. This involves a GSoC project and will be presented at FOSS4G in Denver this year. See http://2011.foss4g.org/sessions/geoprocessing-neo4j-spatial-and-osm Regards, Craig
Re: [Neo4j] More spatial questions
Hi Nolan, I think I can answer a few of your questions. Firstly, some background. The graph model of the OSM data is based largely on the XML-formatted OSM documents, and there you will find 'nodes', 'ways', 'relations' and 'tags' each as their own xml-tag, and as a consequence each will also have their own neo4j-node in the graph. Another point is that the geometry can be based on one or more nodes or ways, and so we always create another node for the geometry, and link it to the osm-node, way or relation that represents that geometry. What all this boils down to is that you cannot find the tags on the geometry node itself. You cannot even find the location on that node. If you want to use the graph model in a direct way, as you have been trying, you really do need to know how the OSM data is modeled. For example, for a LineString geometry, you would need to traverse from the geometry node to the way node and finally to the tags node (to get the tags). To get to the locations is even more complex. Rather than do that, I would suggest that you work with the OSM API we provided with the OSMLayer, OSMDataset and OSMGeometryEncoder classes. Then you do not need to know the graph model at all. For example, OSMDataset has a method for getting a Way object from a node, and the returned object can be queried for its nodes, geometry, etc. Currently we provide methods for returning neo4j-nodes as well as objects that make spatial sense. One minor issue here is the ambiguity inherent in the fact that both neo4j and OSM make use of the term 'node', but for different things. We have various solutions to this, sometimes replacing 'node' with 'point' and sometimes prefixing with 'osm'. The unit tests in TestsForDocs include some tests for the OSM API. My first goal is to find the nearest OSM node to a given lat, lon. My attempts seem to be made of fail thus far, however.
Here's my code: Most of the OSM dataset is converted into LineStrings, and what you really want to do is find the closest vertex of the closest LineString. We have a utility function 'findClosestEdges' in the SpatialTopologyUtils class for that. The unit tests in TestSpatialUtils, and the testSnapping() method in particular, show use of this. My thinking is that nodes should be represented as points, so I can't see why this fails. When I run this in a REPL, I do get a node back. So far so good. Next, I want to get the node's tags. So I run: The spatial search will return 'geometries', which are spatial objects. In neo4j-spatial every geometry is represented by a unique node, but it is not required that that node contain coordinates or tags. That is up to the GeometryEncoder. In the case of the OSM model, this information is elsewhere, because of the nature of the OSM graph, which is a highly interconnected network of points, most of which do not represent Point geometries, but are part of much more complex geometries (streets, regions, buildings, etc.). n.getSingleRelationship(OSMRelation.TAGS, Direction.INCOMING) The geometry node is not connected directly to the tags node. You need two steps to get there. But again, rather than figure out the graph yourself, use the API. In this case, instead of getting the geometry node from the SpatialDatabaseRecord, rather just get the properties using getPropertyNames and getProperty(String). This API works the same on all kinds of spatial data, and in the case of OSM data will return the TAGS, since those are interpreted as attributes of the geometries. n.getSingleRelationship(OSMRelationship.GEOM, Direction.INCOMING).getOtherNode(n).getPropertyKeys I see what appears to be a series of tags (oneway, name, etc.) Why are these being returned for OSMRelation.GEOM rather than OSMRelation.TAGS? These are not the tags. Now you have found the node representing an OSM 'Way'. 
This has a few properties on it that are relevant to the way, the name, whether the street is oneway or not, etc. Sometimes these are based on values in the tags, but they are not the tags themselves. This node is connected to the geometry node and the tags node, so you were half-way there (to the tags that is). You started at the geometry node, and stepped over to the way node, and one more step (this time with the TAGS relationship) would have got you to the tags. But again, I advise against trying to explore the OSM graph by itself. As you have already found, it is not completely trivial. What you should have done is access the attributes directly from the search results. Additionally, I see the property way_osm_id, which clearly isn't a tag. It would also seem to indicate that this query returned a way rather than a node like I'd hoped. This conclusion is further borne out by the tag names. So clearly I'm not getting the search correct. But beyond that, the way being returned by this search isn't close to the lat,lon I provided. What am I missing? The lat/long values are quite a bit deeper in the graph. In the case
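The two-hop structure described in this thread (geometry node to way node via GEOM, then way node to tags node via TAGS) can be illustrated with a toy model. Plain maps stand in for Neo4j nodes and relationships here; the relationship names follow the OSMRelation types mentioned above, but everything else is illustrative, not the neo4j-spatial API.

```java
import java.util.HashMap;
import java.util.Map;

// Toy model of the OSM graph shape: the geometry node carries no tags,
// so reaching them takes two hops (geometry -> way -> tags).
// Maps stand in for Neo4j nodes and typed relationships.
public class OsmShape {
    // node name -> (relationship type -> neighbour node name)
    static Map<String, Map<String, String>> rels = new HashMap<>();
    // node name -> properties
    static Map<String, Map<String, String>> props = new HashMap<>();

    static Map<String, String> tagsForGeometry(String geomNode) {
        String way = rels.get(geomNode).get("GEOM");   // hop 1: geometry -> way
        String tags = rels.get(way).get("TAGS");       // hop 2: way -> tags
        return props.get(tags);
    }

    public static void main(String[] args) {
        props.put("tags1", Map.of("name", "Main Street", "oneway", "yes"));
        rels.put("geom1", Map.of("GEOM", "way1"));
        rels.put("way1", Map.of("TAGS", "tags1"));
        System.out.println(tagsForGeometry("geom1").get("name")); // Main Street
    }
}
```

In the real graph the GEOM relationship points from the way to the geometry (incoming at the geometry node); the toy flattens direction for brevity, which is exactly the kind of detail the OSMDataset API hides from you.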
Re: [Neo4j] Auto Indexing for Neo4j
I am using only one relationship type in my index tree, and made traversal decisions based on properties of the tree nodes, but have considered an 'optimization' based on embedding the index keys into the relationship types, which I think is what you did. However, I am not convinced it will work well because I suspect there will be losses if the total number of relationship types gets very high. I think this is a separate issue to the total number of relationships, but might affect all traversers, since there must exist a hashmap of all relationship types. Still it is very cool what Peter says below, because if all these 'experiments' with in-graph indexes can get put behind the standard index API, then we can get much more testing of this approach, and hopefully learn what we need to make this a viable solution for wide use. On Wed, Jun 15, 2011 at 4:56 AM, Michael Hunger michael.hun...@neotechnology.com wrote: A problem with a probably dumb index in a graph that I created for an experiment was the performance of getAllRelationships on that machine (it was a very large graph with all nodes being indexed). It was a mapping from long values to nodes, my simplistic approach just chopped the long values into chunks of 3 digits and used those 3 digits as relationship-types (i.e. 1000 additional rel-types) to form a tree which pointed to the node in question at the end. Will have to investigate that further. On 14.06.2011 at 23:43, Peter Neubauer wrote: Craig, the autoindexing is one step in this direction. The other is to enable the Spatial and other in-graph indexes like the graph-collections (timeline etc) at all to be treated like normal index providers. When that is done (will talk to Mattias who is coming back from vacation tomorrow on that), we are in a position to think about more complex autoindex providers.
Also, the possibility to treat Neo4j Spatial and other graph structures as index providers, would hook into the index framework and expose things to higher level queries like Cypher and Gremlin, e.g. combining a spatial bounding box geometry search with a graph traversal for suitable properties that are less than 2 kilometers from the nearest school, sorting the results, returning only price and lat as columns, the 3 topmost hits. START geom = (index:spatial:'BBOX(the_geom, -90, 40, -60, 45)') MATCH (geom)--(fast), (fast)-[r, :NEAR]-(school) WHERE fast.rooms > 4 AND school.classes > 4 AND r.length < 2 RETURN fast.pic?, fast.lon?, fast.lat? SORT BY fast.price, fast.lat^ SLICE 3 So, I think the next step is to make in-graph indexing structures plug into the index framework, and then into autoindexing :) Cheers, /peter neubauer GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - Your high performance graph database. http://startupbootcamp.org/ - Öresund - Innovation happens HERE. http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party. On Tue, Jun 14, 2011 at 5:49 PM, Craig Taverner cr...@amanzi.com wrote: This is great news. Now I'm really curious about the next step, and that is allowing indexes other than lucene. For example, the RTree index in neo4j-spatial was never possible to wrap behind the normal index API, because that was designed only for properties of nodes (and relationships), but the RTree is based on something completely different (complete spatial geometries). However, the new auto-indexing feature implies that any node can be added to an index without the developer needing to know anything about the index API. Instead the index needs to know if the node is appropriate for indexing. This is suitable for both lucene and the RTree.
So what I'd like to see is that when configuring auto-indexing in the first place, instead of just specifying properties to index, specify some indexer implementation that can be created and run internally. For example, perhaps you pass the classname of some class that implements some necessary interface, and then that is instantiated, passed config properties, and used to index new or modified nodes. One method I could imagine this interface having would be a listener for change events to be evaluated for whether or not the index should be activated for a node change. For the lucene property index, this method would return true if the property exists on that node. For the RTree this method would return true if the node contained the meta-data required for neo4j-spatial to recognize it as a spatial type? Alternatively just an index method that does nothing when the nodes are not to be indexed, and indexes when necessary? So, are we now closer to having this kind of support? On Tue, Jun 14, 2011 at 11:30 PM, Chris
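Michael's chunking experiment from earlier in the thread — chop a long key into 3-digit chunks and use each chunk as a relationship type, forming a fixed-depth tree down to the target node — can be sketched in plain Java. Nested maps stand in for nodes connected by the chunk-named relationship types; this is an illustration of the idea, not his actual code.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of the long-key-to-node tree built from 3-digit chunks used as
// relationship types. Nested maps stand in for nodes plus typed relationships.
public class ChunkedKeyTree {
    final Map<String, Object> root = new HashMap<>();

    // Zero-pad to 18 digits so every key yields the same tree depth (6 chunks).
    static String[] chunks(long key) {
        String s = String.format("%018d", key);
        String[] out = new String[6];
        for (int i = 0; i < 6; i++) out[i] = s.substring(i * 3, i * 3 + 3);
        return out;
    }

    @SuppressWarnings("unchecked")
    void put(long key, Object value) {
        Map<String, Object> node = root;
        String[] cs = chunks(key);
        for (int i = 0; i < cs.length - 1; i++) {
            node = (Map<String, Object>) node.computeIfAbsent(
                    cs[i], k -> new HashMap<String, Object>());
        }
        node.put(cs[cs.length - 1], value); // leaf "relationship" to the target
    }

    @SuppressWarnings("unchecked")
    Object get(long key) {
        Map<String, Object> node = root;
        String[] cs = chunks(key);
        for (int i = 0; i < cs.length - 1; i++) {
            Object next = node.get(cs[i]);
            if (next == null) return null;
            node = (Map<String, Object>) next;
        }
        return node.get(cs[cs.length - 1]);
    }

    public static void main(String[] args) {
        ChunkedKeyTree t = new ChunkedKeyTree();
        t.put(123456789012L, "node-A");
        System.out.println(t.get(123456789012L)); // node-A
        System.out.println(t.get(42L)); // null
    }
}
```

The trade-off Craig raises applies directly: each 3-digit chunk adds up to 1000 distinct "relationship types", and a real graph database keeps a global registry of those types, which is where the suspected cost of this encoding lives.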
Re: [Neo4j] Slow Traversals on Nodes with too many Relationships
Could this also be related to the possibility that in order to determine relationship type and direction, the relationships need to be loaded from disk? If so, then having a large number of relationships on the same node would decrease performance, if the number was large enough to affect the disk IO caching. If this is the case, perhaps adding a proxy node for the incoming relationships would work around the problem? Of course then you have doubled the number of part nodes (two for each part: one part and one container proxy). On Wed, Jun 15, 2011 at 10:27 PM, Rick Bullotta rick.bullo...@thingworx.com wrote: I would respectfully disagree that it doesn't necessarily represent production usage, since in some cases, each query/traversal will be unique and isolated to a part of a subgraph, so in some cases a cold query may be the norm. -Original Message- From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Michael Hunger Sent: Wednesday, June 15, 2011 10:25 AM To: Neo4j user discussions Subject: Re: [Neo4j] Slow Traversals on Nodes with too many Relationships That is rather a case of warming up your caches. Determining the traversal speed from the first run is not a good benchmark as it doesn't represent production usage :) The same (warming up) is true for all kinds of benchmarks (except for startup performance benchmarks). Cheers Michael Am 15.06.2011 um 14:48 schrieb Agelos Pikoulas: I have a few Part nodes related with each other via HASPART relationships/edges (eg Part1---HASPART---Part2---HASPART---Part3 etc). TraversalDescription works fine, following each Part's outgoing HASPART relationship. Then I add a large number (say 100.000) of Container Nodes, where each Container has a CONTAINS relation to almost *every* Part node. Hence each Part node now has 100.000 incoming CONTAINS relationships from Container nodes, but only a few outgoing HASPART relationships to other Part nodes. 
Now my previous TraversalDescription runs extremely slowly (several seconds inside each IteratorPath.next() call). Note that I do define relationships(RT.HASPART, Direction.OUTGOING) on the TraversalDescription, but it seems it's not used by neo4j as a hint. Note that on a subsequent run of the same Traversal, it's very quick indeed. Is there any way to use Indexing on relationships for such a scenario, to boost things up? Ideally, the Traversal framework could use automatic/declarative indexing on Node Relationship types and/or direction to perform such traversals quicker. Regards ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Most Efficient way to query in my use cases
Another common thing to do in this case is to create a node for the purchase action. This node would be related to the purchaser (user), item (pen) and shop, and would contain data appropriate to the purchase (date/time, price, etc). Then traverse from the shop or the pen to all purchase actions that reference the other one (shop or pen). On Thu, Jun 16, 2011 at 4:48 AM, Jim Webber j...@neotechnology.com wrote: Hi Manav, I think there's a relationship missing here. Pen--SOLD_BY--shop That way it's easy to find all the pens that a shop sold, and whom they sold them to. In general, modelling your domain expressively does not come at an increased cost with Neo4j (caveat: you can still create write hotspots). Jim ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Slow Traversals on Nodes with too many Relationships
I understood that on Windows the memory mapped sizes needed to be included in the heap, since they are not allocated outside the heap as they are on Linux/Mac. So in this case he needs a larger heap (and make sure the memory mapped files are much smaller than the heap). The relevant part of the configuration settings doc says: When running Neo4j on Windows the size of the memory-mapped nioneo configurations need to be added to the heap size parameter. On Linux and Unix-systems memory mapped IO is not included in the heap size. I still think that the solution to this case is to group the different relationship types into separate sub-graphs, so that the performance of traversing HAS_ONE is not affected by the number of relationships of CONTAINS. Of course traversing the CONTAINS will still be slow without increasing the cache, as you suggest. On Thu, Jun 16, 2011 at 12:07 AM, Michael Hunger michael.hun...@neotechnology.com wrote: Agelos, sorry, didn't want to sound that way. 512M RAM is not very much for larger graphs. Neo4j has to cache nodes and relationships in the heap, as well as your own data structures. The memory mapped files for the datastores are kept outside the heap. Normally with your 4G I'd suggest using about 1.5G for heap and 1.5G for the memory mapped files. http://wiki.neo4j.org/content/Configuration_Settings Do you have a small test-case available that creates your graph and runs your traversal? Then I could have a look at that and also do some profiling to determine the issues for this slowdown. The indexing doesn't help as it also has to hit caches or disk. The graph traversal is normally a very efficient operation that shouldn't experience this bad performance. Cheers Michael P.S. I just use my mail client for handling the mailing list and it works fine for me. Imho Gmail groups threads automatically. 
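Michael's suggested 1.5G heap / 1.5G memory-mapped-files split would look something like the following fragment of neo4j.properties (the per-store split below is an illustrative assumption, not a recommendation from this thread; the setting names are from the 1.x Configuration Settings wiki page linked above):

```properties
# Memory-mapped store files: kept outside the heap on Linux/Unix, but
# counted inside -Xmx on Windows (see the quoted docs above).
# Total here is ~1.5G, to pair with a -Xmx1500m heap on a 4G machine.
neostore.nodestore.db.mapped_memory=100M
neostore.relationshipstore.db.mapped_memory=800M
neostore.propertystore.db.mapped_memory=400M
neostore.propertystore.db.strings.mapped_memory=200M
```

On Windows, per the quoted documentation, these sizes must fit inside the heap, so the -Xmx value would need to cover both the heap working set and the mapped files.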
Am 15.06.2011 um 17:40 schrieb Agelos Pikoulas: I have to respectfully agree with Rick Bullotta. I was suspecting the big-O is not linear for this case. To verify, I added x4 Container nodes (400.000) and their appropriate Relationships, and it is now *unbelievably* slow: It does not take x4 more, but it takes more than 30-40 seconds for each next(). Mind you, 100K nodes = ~2secs for each next()!!! And to make matters worse, the subsequent runs weren't fast either - they actually took more time than the first (1st TotalTraversalTime= 389936ms, 2nd TotalTraversalTime= 443948ms). The whole setup is running on Eclipse 3.6, with -Xmx512m on JavaVM, Windows2003 VMWare machine with 4GB, running on a fast 2nd gen SSD (OCZ Vertex 2). The neo4J data resides on this SSD. The 100.000 nodes data files were ~250MB, the 400.000 one is ~1GB. I wonder what would happen if the Container nodes were a few million (which will be my case) - it will run forever. Could you please look into my suggestion - i.e. using 'smart' behind-the-scenes indexing on both *RelationshipType* and *Direction* that Traversals actually use, to boost things up? On another topic, how does one use this mailing list - I use it through gmail and I am utterly lost - is there a better client/UI to actually post/reply into threads? ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Auto Indexing for Neo4j
This is great news. Now I'm really curious about the next step, and that is allowing indexes other than lucene. For example, the RTree index in neo4j-spatial was never possible to wrap behind the normal index API, because that was designed only for properties of nodes (and relationships), but the RTree is based on something completely different (complete spatial geometries). However, the new auto-indexing feature implies that any node can be added to an index without the developer needing to know anything about the index API. Instead the index needs to know if the node is appropriate for indexing. This is suitable for both lucene and the RTree. So what I'd like to see is that when configuring auto-indexing in the first place, instead of just specifying properties to index, specify some indexer implementation that can be created and run internally. For example, perhaps you pass the classname of some class that implements some necessary interface, and then that is instantiated, passed config properties, and used to index new or modified nodes. One method I could imagine this interface having would be a listener for change events to be evaluated for whether or not the index should be activated for a node change. For the lucene property index, this method would return true if the property exists on that node. For the RTree this method would return true if the node contained the meta-data required for neo4j-spatial to recognize it as a spatial type? Alternatively just an index method that does nothing when the nodes are not to be indexed, and indexes when necessary? So, are we now closer to having this kind of support? On Tue, Jun 14, 2011 at 11:30 PM, Chris Gioran chris.gio...@neotechnology.com wrote: Good news everyone, A request that's often come up on the mailing list is a mechanism for automatically indexing properties of nodes and relationships. 
As of today's SNAPSHOT, auto-indexing is part of Neo4j which means nodes and relationships can now be indexed based on convention, requiring far less effort and code from the developer's point of view. Getting hold of an automatic index is straightforward: AutoIndexer<Node> nodeAutoIndexer = graphDb.index().getNodeAutoIndexer(); AutoIndex<Node> nodeAutoIndex = nodeAutoIndexer.getAutoIndex(); Once you've got an instance of AutoIndex, you can use it as a read-only Index<Node>. The AutoIndexer interface also supports runtime changes and enabling/disabling the auto indexing functionality. To support the new features, there are new Config options you can pass to the startup configuration map in EmbeddedGraphDatabase, the most important of which are: Config.NODE_AUTO_INDEXING (defaults to false) Config.RELATIONSHIP_AUTO_INDEXING (defaults to false) If set to true (independently of each other) these properties will enable auto indexing functionality and at the successful finish() of each transaction, all newly added properties on the primitives for which auto indexing is enabled will be added to a special AutoIndex (and deleted or changed properties will be updated accordingly too). There are options for fine-grained control to determine which properties are indexed, default behaviors and so forth. For example, by default all properties are indexed. If you want only properties name and age for Nodes and since and until for Relationships to be auto indexed, simply set the initial configuration as follows: Config.NODE_KEYS_INDEXABLE = "name, age"; Config.RELATIONSHIP_KEYS_INDEXABLE = "since, until"; For the semantics of the auto-indexing operations, constraints and more detailed examples, see the documentation available at http://docs.neo4j.org/chunked/1.4-SNAPSHOT/auto-indexing.html We're pretty excited about this feature since we think it'll make your lives as developers much more productive in a range of use-cases. 
If you're comfortable with using SNAPSHOT versions of Neo4j, please try it out and let us know what you think - we'd really value your feedback. If you're happier with using packaged milestones then this feature will be available from 1.4 M05 in a couple of weeks from now. ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
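Craig's proposal earlier in this thread - a pluggable indexer that is asked, per change event, whether a node is relevant to it - can be sketched without Neo4j at all. All names here are hypothetical (this is not the real Neo4j API); nodes are modelled as plain property maps:

```java
import java.util.*;

// Sketch of the pluggable auto-indexer idea from this thread: the framework
// asks each registered indexer whether a changed node is relevant, and only
// then hands the node over for indexing.
public class AutoIndexSketch {
    interface NodeIndexer {
        boolean interestedIn(Map<String, Object> node); // listener-style check
        void index(long nodeId, Map<String, Object> node);
    }

    // Lucene-like property indexer: interested if the property exists.
    static class PropertyIndexer implements NodeIndexer {
        final String key;
        final Map<Object, List<Long>> index = new HashMap<>();
        PropertyIndexer(String key) { this.key = key; }
        public boolean interestedIn(Map<String, Object> node) { return node.containsKey(key); }
        public void index(long id, Map<String, Object> node) {
            index.computeIfAbsent(node.get(key), k -> new ArrayList<>()).add(id);
        }
    }

    // RTree-like indexer: interested only in nodes carrying spatial meta-data.
    static class SpatialIndexer implements NodeIndexer {
        final List<Long> geometries = new ArrayList<>();
        public boolean interestedIn(Map<String, Object> node) { return node.containsKey("wkt"); }
        public void index(long id, Map<String, Object> node) { geometries.add(id); }
    }

    static void onNodeChanged(List<NodeIndexer> indexers, long id, Map<String, Object> node) {
        for (NodeIndexer ix : indexers)
            if (ix.interestedIn(node)) ix.index(id, node); // does nothing otherwise
    }

    public static void main(String[] args) {
        PropertyIndexer byName = new PropertyIndexer("name");
        SpatialIndexer spatial = new SpatialIndexer();
        List<NodeIndexer> indexers = List.of(byName, spatial);
        onNodeChanged(indexers, 1, Map.of("name", "Oslo", "wkt", "POINT(10.7 59.9)"));
        onNodeChanged(indexers, 2, Map.of("age", 42));
        System.out.println(byName.index.get("Oslo")); // node 1 was indexed by name
        System.out.println(spatial.geometries);       // node 1 was indexed spatially
    }
}
```

This is the "alternatively just an index method that does nothing when the nodes are not to be indexed" variant folded together with the listener variant: the interestedIn check is the listener, and onNodeChanged is what the transaction finish() would drive.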
Re: [Neo4j] Traversals versus Indexing
Think of your domain model graph as a kind of index. Traversing that should generally be faster than a generic index like lucene. Of course some things do not graph well, and you should use lucene for those. But if you can find something with a graph traversal, that is likely the way to go. Also you should think of structuring the graph to suit the queries you plan to perform. Then you will optimize the traversals. On Jun 13, 2011 11:33 AM, espeed ja...@jamesthornton.com wrote: It depends on the traversal you are running. -- View this message in context: http://neo4j-user-list.438527.n3.nabble.com/Neo4j-Traversals-versus-Indexing-tp3057515p3057538.html Sent from the Neo4J User List mailing list archive at Nabble.com. ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] neo4j-spatial
Hi Saikat, Yes, your explanation was clear, but I was busy with other work and failed to respond - my bad ;-) Anyway, your idea is nice. And I can think of a few ways to model this in the graph, but at the end of the day the most important thing to decide first is what queries are you going to perform? Do you want a creative map that, while not drawn to scale, can still be asked questions like 'how far from the roller-coaster to the closest lunch venue?'. That kind of question could make use of the graph and the spatial extensions to provide an answer and show the route on the creative map, even if it is not a real to-scale map. Is that what you want to see? You can try to contact me on Skype also. Regards, Craig On Thu, Jun 9, 2011 at 5:35 AM, Saikat Kanjilal sxk1...@hotmail.com wrote: Hi Craig, Following up on this thread, was this explanation clear? If so I'd like to talk more details. Regards From: sxk1...@hotmail.com To: user@lists.neo4j.org Subject: RE: [Neo4j] neo4j-spatial Date: Sun, 5 Jun 2011 20:15:27 -0700 Hey Craig, Thanks for responding, so to be clear a theme park can have its own map created by the graphic artists that work at the theme park company; this map is sometimes 2D or sometimes a 3D map that really has no notion of lat long coordinates or GPS. What I am proposing is that we have the ability to inject GPS coordinates into this creative map through some mechanism that understands what the GPS coordinates of each point in this creative map are. So that's where the google map comes in: the google or bing map would potentially have lat long coordinates of every point in a theme park, so now the challenge is how do we transfer that knowledge inside this 2D or 3D creative map so that we can run neo4j traversal algorithms inside a map that has been injected with GPS data. 
A theme park is just the beginning; imagine having the power to inject this information into any 2D or 3D map, that would be pretty amazing. In essence I am doing this so that the creative map itself can use neo4j and be highly interactive and meaningful. Let me know if that's still unclear and if so let's talk on Skype. Regards Date: Mon, 6 Jun 2011 01:13:08 +0200 From: cr...@amanzi.com To: user@lists.neo4j.org Subject: Re: [Neo4j] neo4j-spatial Hi Saikat, This sounds worth discussing further. I think I need to hear more about your use case. I do not know what the term 'creative map' means, and what traversals you are planning to do? When you talk about 'plotting points', do you mean you have a GPS and are moving inside a real theme park and want to see this inside google maps? Or are you just drawing a path on an interactive GIS? I think once I have some more understanding of what your use case is, what problem you are trying to solve, I am sure I will be able to give advice on how best to approach it, if it relates to anything else we are doing, or whether this is something you would need to put some coding time into :-) Regards, Craig On Sun, Jun 5, 2011 at 8:26 PM, Saikat Kanjilal sxk1...@hotmail.com wrote: Craig et al, I have an interesting use case that I've been thinking about and I was wondering if it would make a good candidate for inclusion inside neo4j-spatial. I've read through the wiki ( http://wiki.neo4j.org/content/Collaboration_on_Spatial_Projects) and was interested in using neo4j-spatial to take any creative 2D Map and geo-enable it. To explain in more detail, let's say you are at a certain latitude and longitude in a theme park inside a google map (or a bing map); now you want to have the ability to reference that same latitude and longitude inside a 2d or a 3d creative map of that theme park and then be able to plot these points and enable traversal algorithms inside the creative map. 
I was wondering if you guys are thinking about this use case; if not, I'd love to work on and discuss this in more detail to see whether this fits into the neo4j-spatial roadmap. Thoughts? ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] neo4j spatial bounding box vs. lat/lon
OK. I understand much better what you want now. Your person nodes are not geographic objects, they are persons that can be at many positions and indeed move around. However, the 'path' that they take is a geographic object and can be placed on the map and analysed geographically. So the question I have is how do you store the path the person takes? Is this a bunch of position nodes connected back to that person? Or perhaps a chain of position-(next)-position-(next)-position, etc? However you have stored this in the graph, you can express this as a geographic object by implementing the GeometryEncoder interface. See, for example, the 6 lines of code it takes to traverse a chain of NEXT locations and produce a LineString geometry in the SimpleGraphEncoder at https://github.com/neo4j/neo4j-spatial/blob/master/src/main/java/org/neo4j/gis/spatial/encoders/SimpleGraphEncoder.java#L82 If you do this, you can create a layer that uses your own geometry encoder (or the SimpleGraphEncoder I referenced above, if you use the same graph structure) and your own domain model will be expressed as LineString geometries and you can perform spatial operations on them. Alternatively, if your data is more static in nature, and you are analysing only what the person did in the past, and the graph will therefore not change, perhaps you do not care to store the locations in the graph, and you can just import them as a LineString directly into a standard layer. Whatever route you take, the final action you want to perform is to find points near the LineString (path the person took). I do not think the bounding box is the right approach for that either. 
You need to try, for example, the method findClosestEdges in the utilities class at https://github.com/neo4j/neo4j-spatial/blob/master/src/main/java/org/neo4j/gis/spatial/SpatialTopologyUtils.java#L115 This method can find the part of the person's path that is closest to the point of interest. There are also many other geographic operations you might be interested in trying, once you have a better feel for the types of queries you want to ask. Regards, Craig On Wed, Jun 8, 2011 at 2:17 AM, Boris Kizelshteyn bo...@popcha.com wrote: Thanks for the detailed response! Here is what I'm trying to do and I'm still not sure how to accomplish it: 1. I have a node which is a person 2. I have geo data as that person moves around the world 3. I use the geodata to create a bounding box of where that person has been today 4. I want to say, was this person A near location X today? 5. I do this by seeing if location X is in A's bounding box. From looking at what you suggest doing, it's not clear how I assign the node person A to a layer? Is it that the bounding box is now in the layer and not in the node? The issue then becomes, how do I associate the two, as the RTree relationship seems to establish itself on the bounding box between the node and the layer. Many thanks for your patience as I learn this challenging material. On Tue, Jun 7, 2011 at 4:13 PM, Craig Taverner cr...@amanzi.com wrote: I think you need to differentiate the bounding boxes of the data in the layer (stored in the database), and the bounding box of the search query. The search query is not stored in the database, and will not be seen as a node or nodes in the database. So if you want to search for data within some bounding box or polygon, then express that in the search query, and you do not need to care about how your nodes are stored in the database. 
So when you say you want to make a larger bounding box, I assume you are talking about the query itself. The REST API has the method findGeometriesInLayer, which takes minx, maxx, miny, maxy parameters and you can set those to whatever you want for your query. The REST API also exposes the CQL query language supported by GeoTools. This allows you to perform SQL-like queries on geometries and feature attributes. For example, you can search for all objects within a specific polygon (not just a rectangular bounding box), as well as conforming to certain attributes. See http://docs.geoserver.org/latest/en/user/tutorials/cql/cql_tutorial.html for some examples of CQL. However, our current CQL support is not fully integrated with the RTree index. This means that the CQL itself will not benefit from the index, but be a raw search. You can, however, still get the benefit of the index by passing in the bounding box separately. So, for example, you want to search for data in a polygon. Make the polygon object, get its bounding box and also the CQL query string. Then make a 'dynamic layer' using the CQL (which is a bit like making a prepared statement
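The chain-of-NEXT-positions idea from this thread can be sketched without Neo4j or JTS (class and method names here are hypothetical; the real implementation is the SimpleGraphEncoder linked above):

```java
import java.util.StringJoiner;

// Walk a chain of position records connected by "next" links and emit the
// path as a WKT LineString, mimicking what a GeometryEncoder does when it
// traverses NEXT relationships in the graph.
public class ChainToLineString {
    static class Position {
        final double lon, lat;
        Position next; // stands in for the NEXT relationship
        Position(double lon, double lat) { this.lon = lon; this.lat = lat; }
    }

    static String toWkt(Position start) {
        StringJoiner pts = new StringJoiner(", ", "LINESTRING (", ")");
        for (Position p = start; p != null; p = p.next)
            pts.add(p.lon + " " + p.lat);
        return pts.toString();
    }

    public static void main(String[] args) {
        Position a = new Position(13.0, 55.6);
        Position b = new Position(13.1, 55.7);
        Position c = new Position(13.2, 55.8);
        a.next = b;
        b.next = c;
        System.out.println(toWkt(a));
        // LINESTRING (13.0 55.6, 13.1 55.7, 13.2 55.8)
    }
}
```

Once the person's path is expressed as a LineString like this, the spatial operations discussed above (closest-edge queries, layer searches) can work on it as a single geometry rather than a bag of points.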
Re: [Neo4j] neo4j spatial bounding box vs. lat/lon
Hi, The bounding boxes are used by the RTree index, which is a typical way to index spatial data. For Point data, the lat/long and the bounding box are the same thing, but for other shapes (streets/LineString and Polygons), the bounding box is quite different to the actual geometry (which is not just a single lat/long, but a set of connected points forming a complex shape). The RTree does not differentiate between points and other geometries, because it cares only about the bounding box, and therefore we provide that even for something as simple as a Point. Does that answer the question? Regards, Craig On Tue, Jun 7, 2011 at 4:57 PM, Boris Kizelshteyn bo...@popcha.com wrote: Greetings! Perhaps someone using neo4j-spatial can answer this seemingly simple question. Nodes classified into layers have both lat/lon properties and bounding boxes; the bounding box seems to be required to establish the relationship between node and layer, however the node is not found if the lat/lon does not match the query. Can someone explain the relationship between these two properties on a node? Many thanks! ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] neo4j spatial bounding box vs. lat/lon
I think you need to differentiate the bounding boxes of the data in the layer (stored in the database), and the bounding box of the search query. The search query is not stored in the database, and will not be seen as a node or nodes in the database. So if you want to search for data within some bounding box or polygon, then express that in the search query, and you do not need to care about how your nodes are stored in the database. So when you say you want to make a larger bounding box, I assume you are talking about the query itself. The REST API has the method findGeometriesInLayer, which takes minx, maxx, miny, maxy parameters and you can set those to whatever you want for your query. The REST API also exposes the CQL query language supported by GeoTools. This allows you to perform SQL-like queries on geometries and feature attributes. For example, you can search for all objects within a specific polygon (not just a rectangular bounding box), as well as conforming to certain attributes. See http://docs.geoserver.org/latest/en/user/tutorials/cql/cql_tutorial.html for some examples of CQL. However, our current CQL support is not fully integrated with the RTree index. This means that the CQL itself will not benefit from the index, but be a raw search. You can, however, still get the benefit of the index by passing in the bounding box separately. So, for example, you want to search for data in a polygon. Make the polygon object, get its bounding box and also the CQL query string. Then make a 'dynamic layer' using the CQL (which is a bit like making a prepared statement). Then perform the same 'findGeometriesInLayer' method mentioned above, using the bounding box and the dynamic layer (containing the CQL). This has the effect of using the RTree index for a first approximate search, followed by pure CQL for the final mile. See examples of this in action in the Unit tests in the source code. 
https://github.com/neo4j/neo4j-spatial/blob/master/src/test/java/org/neo4j/gis/spatial/ServerPluginTest.java#L109 has examples of CQL queries on the REST API. On Tue, Jun 7, 2011 at 5:48 PM, Boris Kizelshteyn bo...@popcha.com wrote: Thanks! So it seems you are saying that the bounding box represents a single point and is the same as the lat/lon? What if I make the bounding box bigger? What I am trying to do is geo queries against a bounding box made of a set of points, rather than individual points. So the query is, find the nodes where the given point falls inside their bounding boxes. Can I do this with REST? Thanks! On Tue, Jun 7, 2011 at 11:34 AM, Craig Taverner cr...@amanzi.com wrote: Hi, The bounding boxes are used by the RTree index, which is a typical way to index spatial data. For Point data, the lat/long and the bounding box are the same thing, but for other shapes (streets/LineString and Polygons), the bounding box is quite different to the actual geometry (which is not just a single lat/long, but a set of connected points forming a complex shape). The RTree does not differentiate between points and other geometries, because it cares only about the bounding box, and therefore we provide that even for something as simple as a Point. Does that answer the question? Regards, Craig On Tue, Jun 7, 2011 at 4:57 PM, Boris Kizelshteyn bo...@popcha.com wrote: Greetings! Perhaps someone using neo4j-spatial can answer this seemingly simple question. Nodes classified into layers have both lat/lon properties and bounding boxes; the bounding box seems to be required to establish the relationship between node and layer, however the node is not found if the lat/lon does not match the query. Can someone explain the relationship between these two properties on a node? Many thanks! 
___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
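Boris's "was person A near location X today?" question reduces to building a bounding box from the day's positions and testing point containment. A minimal dependency-free sketch (note that a real solution would also need to handle paths crossing the antimeridian, which this does not):

```java
// Build a bounding box from a day's worth of (lon, lat) positions and test
// whether a point of interest falls inside it - the query from this thread.
public class BBox {
    double minX = Double.POSITIVE_INFINITY, minY = Double.POSITIVE_INFINITY;
    double maxX = Double.NEGATIVE_INFINITY, maxY = Double.NEGATIVE_INFINITY;

    void expandToInclude(double x, double y) {
        minX = Math.min(minX, x); maxX = Math.max(maxX, x);
        minY = Math.min(minY, y); maxY = Math.max(maxY, y);
    }

    boolean contains(double x, double y) {
        return x >= minX && x <= maxX && y >= minY && y <= maxY;
    }

    public static void main(String[] args) {
        BBox personA = new BBox();
        // positions person A visited today
        personA.expandToInclude(-73.99, 40.73);
        personA.expandToInclude(-73.95, 40.78);
        personA.expandToInclude(-73.97, 40.75);
        System.out.println(personA.contains(-73.96, 40.76)); // location X inside
        System.out.println(personA.contains(-74.10, 40.70)); // location far away
    }
}
```

As Craig points out above, a bounding box is a coarse filter: a point can be inside the box while still being far from the actual path, which is why the findClosestEdges approach on the LineString geometry gives a better answer for "near".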
Re: [Neo4j] Sample Linear Referencing Functions in Neo4j Spatial and GSoC
Done. Although now we have 20 lines of comments for 1 line of method code. Previously we had 4 lines of comments for one line of code. Whew! On Tue, Jun 7, 2011 at 11:02 AM, Peter Neubauer peter.neuba...@neotechnology.com wrote: Very cool. Maybe you could just doc the parameters more than pointing to the Oracle reference, so one can see it directly in the JavaDoc? Cheers, /peter neubauer GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - Your high performance graph database. http://startupbootcamp.org/ - Öresund - Innovation happens HERE. http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party. On Thu, Jun 2, 2011 at 2:13 PM, Craig Taverner cr...@amanzi.com wrote: Hi, Recently someone asked a question on StackOverflow, whether Neo4j Spatial was capable of one of the Oracle geoprocessing functions, SDO_LRS.LOCATE_PT specifically. Since this is related to the ongoing GSoC projects for Neo4j Spatial, I thought I would do a quick investigation. What I found was that the requested capabilities are available in JTS (which we include in Neo4j Spatial), but with very different names. The code to achieve this in JTS is 'new LengthIndexedLine(geometry).extractPoint(measure, offset)'. I have wrapped these in SpatialTopologyUtils.locatePoint(geometry, measure, offset), so that it is accessible together with some other spatial topology functions, and also looks more like the Oracle function. I pushed this to github, and think it can be included as a prototype for the discussions for the GSoC on Geoprocessing. Regards, Craig ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
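The linear-referencing behaviour being wrapped here (a length-indexed line, as in JTS's LengthIndexedLine) can be illustrated with a dependency-free sketch. The locatePoint name below mirrors the thread; this version works on a flat x/y polyline and ignores the offset parameter:

```java
// Linear referencing: walk a polyline and return the point lying at a given
// measure (distance along the line), the idea behind SDO_LRS.LOCATE_PT and
// JTS's LengthIndexedLine.extractPoint.
public class LinearRef {
    // coords = [x0, y0, x1, y1, ...]
    static double[] locatePoint(double[] coords, double measure) {
        double walked = 0;
        for (int i = 0; i + 3 < coords.length; i += 2) {
            double dx = coords[i + 2] - coords[i];
            double dy = coords[i + 3] - coords[i + 1];
            double seg = Math.hypot(dx, dy);
            if (walked + seg >= measure) {
                double t = (measure - walked) / seg; // fraction along this segment
                return new double[] { coords[i] + t * dx, coords[i + 1] + t * dy };
            }
            walked += seg;
        }
        // measure beyond the end of the line: clamp to the last point
        return new double[] { coords[coords.length - 2], coords[coords.length - 1] };
    }

    public static void main(String[] args) {
        double[] line = { 0, 0, 10, 0, 10, 10 }; // an L-shaped line of length 20
        double[] p = locatePoint(line, 15);
        System.out.println(p[0] + " " + p[1]); // 10.0 5.0
    }
}
```

The offset parameter in the real API would then displace this point perpendicular to the segment direction, which is the part JTS's extractPoint(measure, offset) adds on top.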
Re: [Neo4j] GSoC 2011 Neo4j Geoprocessing | Weekly Report #2
I suggest you code review them first. Especially since there are API changes. On Tue, Jun 7, 2011 at 10:11 AM, Peter Neubauer peter.neuba...@neotechnology.com wrote: Very nice Andreas! Do you consider it safe to pull these changes into the main repo? Cheers, /peter neubauer GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - Your high performance graph database. http://startupbootcamp.org/ - Öresund - Innovation happens HERE. http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party. On Sun, Jun 5, 2011 at 1:39 PM, Andreas Wilhelm a...@kabelbw.de wrote: Hi, This week I implemented update and search capability for spatial functions, and the following spatial functions with JUnit tests: ST_AsText, ST_AsKML, ST_AsGeoJSON, ST_AsBinary and ST_Reverse. Best Regards Andreas ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] [SoC] Re: GSoC 2011 Weekly report - OSM data mining and editing capabilities in uDig and Geotools
Hi Mirco, Sounds like progress. Some suggestions: - I do not think you need to change the code for neo4j and udig, but only for neo4j-spatial and udig-community/neo4j. It is OK to make clones of those so you have the code for review, but they are quite core, and you should not need to actually change them. - Focus on neo4j-spatial and udig-community/neo4j, which are the two projects you will certainly make changes to. All uDig GUI changes can be made in udig-community/neo4j. - You might even want to make a new udig plugin in a new git project, perhaps udig-community/osm, for the OSM editor work. The neo4j plugin would provide the communication layer for neo4j and any neo4j data sources, while the OSM plugin would provide OSM specific features, including the additional views and editors required to support a complete 'OSM Editor' capability. Regards, Craig On Sun, Jun 5, 2011 at 1:51 AM, Mirco Franzago mircofranz...@gmail.com wrote: Weekly report #2 ==What I did== - The main work was to set up the whole devel environment: eclipse + udig + neo4j. - I forked the repositories on github for my code: [0], [1] and [2] are respectively the repositories for udig, neo4j and neo4j-spatial. - The target was to have eclipse with the udig sdk taken from github, just as neo4j, to be able to commit the udig code and the neo4j code from the same environment. - I set up the apache maven tool and the e-git plugin to be able to use them directly from eclipse. - After these steps and some fighting against the jars to import, it was possible to execute udig with the neo4j plugins and to test the main functionalities. - I started the code analysis to understand where to put my hands next week :-) ==Next week plan== - Fix some last problems for a new git user with the commit command. - Finally start the real coding after the initial head-cracking problems. 
[0] https://github.com/mircofranzago/udig-platform [1] https://github.com/mircofranzago/neo4j [2] https://github.com/mircofranzago/neo4j-spatial 2011/5/31 Mirco Franzago mircofranz...@gmail.com Hi all, I am Mirco Franzago and I have started work on my Google Summer of Code 2011 project. I will update this thread weekly to let the community know about the work done and the work still to do. Last week I could not do much because I was very busy with my last exam before summer. Now I'm ready to start this new job. ___ SoC mailing list s...@lists.osgeo.org http://lists.osgeo.org/mailman/listinfo/soc
[Neo4j] Sample Linear Referencing Functions in Neo4j Spatial and GSoC
Hi, Recently someone asked a question on StackOverflow about whether Neo4j Spatial was capable of one of the Oracle geoprocessing functions, SDO_LRS.LOCATE_PT specifically. Since this is related to the ongoing GSoC projects for Neo4j Spatial, I thought I would do a quick investigation. What I found was that the requested capabilities are available in JTS (which we include in Neo4j Spatial), but under very different names. The code to achieve this in JTS is 'new LengthIndexedLine(geometry).extractPoint(measure, offset)'. I have wrapped this in SpatialTopologyUtils.locatePoint(geometry, measure, offset), so that it is accessible together with some other spatial topology functions, and also looks more like the Oracle function. I pushed this to github, and think it can be included as a prototype for the discussions for the GSoC on Geoprocessing. Regards, Craig ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
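As a side note for readers unfamiliar with linear referencing, here is a rough pure-Python sketch of what a locate-point function does (an illustration under my own assumptions, not the JTS or Neo4j Spatial implementation): walk the polyline until the accumulated length reaches the measure, then displace the interpolated point perpendicular to the direction of travel by the offset.

```python
import math

def locate_point(line, measure, offset=0.0):
    """Toy analogue of LengthIndexedLine.extractPoint(measure, offset).
    `line` is a list of (x, y) tuples; `measure` is a distance along the
    line; `offset` displaces the result to the left of travel direction."""
    remaining = measure
    for (x0, y0), (x1, y1) in zip(line, line[1:]):
        seg = math.hypot(x1 - x0, y1 - y0)
        if remaining <= seg:
            t = remaining / seg
            x, y = x0 + t * (x1 - x0), y0 + t * (y1 - y0)
            # unit normal pointing to the left of the direction of travel
            nx, ny = -(y1 - y0) / seg, (x1 - x0) / seg
            return (x + offset * nx, y + offset * ny)
        remaining -= seg
    return line[-1]  # measure beyond the line length: clamp to the end
```

A measure of 5.0 along a 10-unit east-pointing segment returns the midpoint; adding offset=1.0 shifts it one unit north (left of travel).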
Re: [Neo4j] path finding using OSM ways
Hi Bryce, Nice to see you back. The OSM data model in Neo4j-Spatial, created by the OSMImporter, is designed to mimic the complete contents of the XML files provided for OSM. As it is, this is not ideal for routing because it traces the complete set of nodes for the ways, while for routing you really want a graph that connects each waypoint by a single relationship. So, if I were to perform routing on top of the OSM model, I would actually build an overlay graph that just connects the waypoints. The current model has a vertex called a 'way', but that is not a way-point, because it represents the entire way (eg. a street). We would need to do the following: - Identify ways that are streets (as opposed to non-routing types like regions, buildings, lakes, etc.) - Identify the points that are intersections (way-points) - Create a way-point node for these - Add relationships between way-points if they are connected by streets in the OSM model - Weight the relationships by the length of the streets - Then apply the A* algorithm (which I have no experience with myself, but others in neo4j certainly do) I think everything but the last part would be very easy to add to the OSMImporter itself, so that the routing graph exists in any OSM model. Today it does not exist, and routing would be more difficult and expensive (since you would have to traverse a much more complex graph, unnecessarily). Regards, Craig On Tue, May 31, 2011 at 4:31 AM, bryce hendrix brycehend...@gmail.com wrote: I am finally getting back to experimenting with Neo4j. Because it has been a while since I last looked at it, I've forgotten just about everything. I want to start with something simple: is there any sample code which does A* path finding over OSM ways? Thanks, Bryce ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
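To make the overlay idea above concrete, here is a small Python sketch (my own toy illustration, not code from neo4j-spatial): collapse each way into edges that connect consecutive intersection nodes directly, then run a shortest-path search over the result. Hop counts stand in for real street lengths, and plain Dijkstra stands in for A* (A* would add a geographic distance heuristic).

```python
import heapq
from collections import defaultdict

def build_routing_graph(ways, intersections):
    """Collapse each way (an ordered list of node ids) into undirected
    edges connecting consecutive intersection nodes, weighted by the
    number of hops skipped (a stand-in for real street length)."""
    graph = defaultdict(list)
    for way in ways:
        last, dist = None, 0
        for node in way:
            dist += 1
            if node in intersections:
                if last is not None:
                    graph[last].append((node, dist))
                    graph[node].append((last, dist))
                last, dist = node, 0  # restart measuring from here
    return graph

def shortest_path(graph, start, goal):
    """Plain Dijkstra over the collapsed graph; returns total cost."""
    queue, seen = [(0, start)], set()
    while queue:
        cost, node = heapq.heappop(queue)
        if node == goal:
            return cost
        if node in seen:
            continue
        seen.add(node)
        for nxt, weight in graph[node]:
            if nxt not in seen:
                heapq.heappush(queue, (cost + weight, nxt))
    return None  # goal unreachable
```

With ways [1,2,3,4,5] and [3,6,7] and intersections {1, 3, 5, 7}, the intermediate nodes 2, 4 and 6 disappear from the routing graph, which is exactly the saving Craig describes.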
Re: [Neo4j] Embedded with webadmin
While HA is one option, with two processes 'sharing' a database, one being the server and the other the embedded app, there is another option, and that is to integrate the two apps. If your app is a web app that also runs in a container like Jetty or Winstone, perhaps you could run both the server and your app together in the same process? One obvious way of doing this is to write your app as a server extension within the neo4j-server extensions API. I suspect there are other ways to do this where your app is in control and simply accesses (and starts) the relevant code from neo4j-server, but I don't know how to do that. Could be interesting to find out. On Tue, May 24, 2011 at 11:39 PM, Adriano Henrique de Almeida adrianoalmei...@gmail.com wrote: Yep, the neo4j server is just a REST API over the neo4j database, so it's still stored on disk. So, all you need to do is point your java application to the neo4j db directory. Remember that you'll be unable to start both your app and the neo4j server at the same time, on the same database. For this situation, you'll need Neo4j HA. Regards 2011/5/24 Chris Baranowski pharcos...@gmail.com Hi all, I searched this mailing list some but couldn't find a definitive answer: is it possible to use the web admin with an embedded neo4j database? I'd like to run embedded in my project and also be able to administrate online. Thanks! Chris ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user -- Adriano Almeida Caelum | Ensino e Inovação www.caelum.com.br ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] finding nodes between two nodes
If you remove the depth=1, and specify the direction, you can get to the excluded dishes in one traversal: relationships = [{"type" => "answered", "direction" => "outgoing"}, {"type" => "excludes", "direction" => "outgoing"}] That will simplify the code a lot. It does not get to the safe dishes in one traversal, but at least it brings three traversals down to two. On Wed, May 18, 2011 at 3:11 PM, noppanit noppani...@gmail.com wrote: Customer Menu | | [rel:customer] | | | Joy / \ | [rel:answered] /\ [rel:dish] Nut Allergy / Pasta | / [rel:excludes] | /[rel:dish] | / Nut Salad---| So basically this is what I wanted to do and this is the shorter version of my graph. I want to get all the dishes that Joy is not allergic to. So, at the end I want to get Pasta, which Joy is not allergic to, because Joy answered the question that she is allergic to Nut, and Nut Salad contains Nut. The solution that I'm using right now is: traverse to get all the dishes from the menu node and store them in one array. In the second traversal I get all the dishes that Joy is allergic to, by traversing all the answered relationships from Joy and traversing again by excludes relationships to get all the dishes that Joy is allergic to. Then I take the difference of the two arrays to get all the dishes that Joy can eat. I was wondering whether I could get all the dishes that Joy could eat in just one traversal? And this is the code that I use to do that. I'm using Ruby and Neography.
def get_excluded_dishes_for_customer(customer_name)
  customer = neo.get_index('customersIndex', 'name', customer_name)
  answered_dishes = neo.traverse(customer, "nodes", {"relationships" => [{"type" => "answered", "direction" => "all"}], "depth" => 1})
  @array_of_answered_dishes = prepares_data(answered_dishes)
  @array_of_excluded_dishes = Array.new
  @array_of_answered_dishes.each do |answered_dish|
    a_dish = neo.get_index('fredsIndex', 'name', answered_dish[:text])
    excluded_dishes = neo.traverse(a_dish, "nodes", {"relationships" => [{"type" => "excludes", "direction" => "all"}], "depth" => 1})
    prepared_excluded_dishes = prepares_data(excluded_dishes)
    prepared_excluded_dishes.each do |text|
      @array_of_excluded_dishes << text[:text]
    end
  end
  # Only unique dishes
  @array_of_excluded_dishes = @array_of_excluded_dishes.uniq
  @all_dishes = prepares_data(get_all_dishes)
  @only_dish_names = Array.new
  @all_dishes.each do |text|
    @only_dish_names << text[:text]
  end
  @array_of_excluded_dishes = @only_dish_names - @array_of_excluded_dishes
  return @array_of_excluded_dishes
end
Thank you very much. -- View this message in context: http://neo4j-user-list.438527.n3.nabble.com/Neo4j-finding-nodes-between-two-nodes-tp2938387p2956858.html Sent from the Neo4J User List mailing list archive at Nabble.com. ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
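The set-difference logic itself is simple once the two traversals have run. Here is a minimal in-memory Python sketch of the same approach (a toy dict stands in for Neo4j/Neography, and the node and relationship names mirror the example above):

```python
def safe_dishes(graph, customer, menu):
    """Dishes the customer can eat = all dishes on the menu minus
    dishes excluded by any allergy the customer answered. `graph`
    maps (node, relationship_type) to a list of neighbour nodes."""
    all_dishes = set(graph.get((menu, "dish"), []))
    excluded = set()
    # Traversal 1: customer -answered-> allergy -excludes-> dish
    for allergy in graph.get((customer, "answered"), []):
        excluded.update(graph.get((allergy, "excludes"), []))
    # Traversal 2 result (menu dishes) minus the excluded set
    return all_dishes - excluded
```

With Joy answering "Nut Allergy", which excludes "Nut Salad", a menu of Pasta and Nut Salad leaves only Pasta, matching the expected result in the thread.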
Re: [Neo4j] Color suggestions for the Self-Relationship bike shed
What about a system config enabling/disabling loops? Then we could have option 1, but people who never use loops could still get the extra loop check by setting the system config option. On Tue, May 17, 2011 at 2:01 AM, Stephen Roos sr...@careerarcgroup.com wrote: We are not going to use loops, but would still vote for #1. Checking against loops seems more like a business logic responsibility that Neo4j clients should be responsible for. -Original Message- From: Tobias Ivarsson [mailto:tobias.ivars...@neotechnology.com] Sent: Monday, May 16, 2011 7:02 AM To: Neo user discussions Subject: Re: [Neo4j] Color suggestions for the Self-Relationship bike shed Does anyone NOT planning to use loops have an opinion on the matter? That would be very valuable input. Cheers, -- Tobias Ivarsson tobias.ivars...@neotechnology.com Hacker, Neo Technology www.neotechnology.com Cellphone: +46 706 534857 ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Timeline index
Very good points. But I must admit that there is a demand for automatic indexing. I personally am not using it, but I would like prepared indexes, indexes that can be configured up front and then just add the node. I see your point about this implying more schema (in the index preparation), but I do not see that as avoidable. I think (or hope) that for automatic indexes, the criteria for how a node qualifies for indexing would be defined by the developer, hopefully with code, so it can be very general and flexible. For example, I guess that whenever a node is added to the graph, an event is triggered to pass the node to any listeners that look for patterns to match. For performance I guess there should be some simple patterns like the existence of some property to index, but it would be good if the user can define the code to be called, so more complex cases can be considered, like exploring the local sub-graph and indexing based on some more complex criteria. Certainly the user will then have the power to hurt performance, but that is currently the case anyway :-) On Mon, May 9, 2011 at 8:07 PM, Niels Hoogeveen pd_aficion...@hotmail.comwrote: Automatic indexes could be a very nice feature, though personally I would very much like to maintain the ability to manually index nodes and relationships. There are situations where I store a different value in a property than I store in the index (string properties containing html tags, but indexes that store those same values with the html tags stripped). There are also situations where the indexed node is not the node that actually contains the property being indexed (eg. in quad-store layout, a value node contains the property, but the related node is used in the index). I can also conceive of indexes where there is not even a stored property value involved. 
Having an automatic index would certainly make things easier in some scenarios, but it's not easy to create an automatic indexing mechanism that works for all possible use cases. I am also a little bit concerned about such a feature, because it would result in schema-creep. One of the most powerful features I find in Neo4J is how storage and schema are completely independent. In fact the store can be used without any schema at all, while at the same time the store can be used to persist a schema if that is needed. One of the things I disliked about table based databases is the mixing of storage and schema. It is impossible to define an entity without defining a table, which immediately creates a schema entity. Having strict separation of storage and schema is one of the reasons NOSQL databases are so flexible. Such separation makes it possible to invent different types of schemata for different use cases. When I still used relational databases, I always ended up replicating the schema facility of the underlying database to add more meta information to the database. Being able to roll my own schema facility is therefore one of the key features that made Neo4J such an attractive option. If more schema facilities were eventually to creep into the kernel, those advantages would slowly dissipate. Date: Mon, 9 May 2011 18:34:10 +0200 From: cr...@amanzi.com To: user@lists.neo4j.org Subject: Re: [Neo4j] Timeline index +10 for both of Niels' responses. I think both external and in-graph indexes should be supported. The last time I talked to Mattias about this it sounded like the only really clean option for integrating them behind one API would be once automatic indexes are supported, because at that point indexes get configured up-front (like the BTree and RTree) and then simply used (behind the scenes in automated indexes).
I'm hoping automatic indexes are planned for 1.4, then all of this can come together :-) On Mon, May 9, 2011 at 3:14 PM, Niels Hoogeveen pd_aficion...@hotmail.comwrote: Rick, I am looking forward to the results of your investigation. I see a need for both external search mechanisms (Lucene, and possible Solr), as well as in-graph search mechanisms based on constrained traversals (eg. Timeline index based on a Btree and the Rtree index used in neo4j-spatial). Any progress in either direction is most welcome. From: rick.bullo...@thingworx.com To: matt...@neotechnology.com; user@lists.neo4j.org Date: Mon, 9 May 2011 03:57:13 -0700 Subject: Re: [Neo4j] Timeline index Niels/Mattias: we are also exploring a Solr implementation for the index framework. There are some potential benefits using Solr in a large graph/HA/distributed scenario that we are investigating. The tough part is the distributed transactioning, though. - Reply message - From: Mattias Persson matt...@neotechnology.com Date: Mon, May 9, 2011 6:14 am Subject: [Neo4j] Timeline index To: Neo4j user discussions user@lists.neo4j.org 2011/4/12 Niels Hoogeveen
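The listener pattern discussed in this thread, where the developer supplies code that decides whether and how a new node gets indexed, can be sketched in a few lines of Python. This is a hypothetical stand-in, not Neo4j's event or indexing API; the Store and AutoIndexer names are invented for illustration:

```python
class AutoIndexer:
    """Developer-defined automatic indexing: a predicate decides whether
    a node qualifies, and a key function decides where it lands."""

    def __init__(self, matches, key_of):
        self.matches = matches  # callable: node -> bool, does it qualify?
        self.key_of = key_of    # callable: node -> index key
        self.index = {}

    def on_node_created(self, node):
        # Called by the store for every new node (event-listener style).
        if self.matches(node):
            self.index.setdefault(self.key_of(node), []).append(node)


class Store:
    """Toy node store that notifies registered listeners on creation."""

    def __init__(self):
        self.listeners = []

    def register(self, listener):
        self.listeners.append(listener)

    def create_node(self, **props):
        node = dict(props)
        for listener in self.listeners:
            listener.on_node_created(node)
        return node
```

For a timeline-style index, the key function could bucket nodes by hour, e.g. `AutoIndexer(lambda n: "timestamp" in n, lambda n: n["timestamp"] // 3600)`; since the predicate is arbitrary code, it could equally explore the local sub-graph, as Craig suggests.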
Re: [Neo4j] Timeline index
I'm confident that given the history of neo4j, there will be no forcing of a schema :-) And I'm thinking of previous developments that added convenience and value, like jo4neo, neo4j.rb, even the meta-model. Useful, but no-one was ever forced or even pushed to use them. I hope the new automatic indexing will be likewise a convenient alternative to consider. On Mon, May 9, 2011 at 10:17 PM, Niels Hoogeveen pd_aficion...@hotmail.comwrote: Mattias/Craig, Of course I don't want to deny people the opportunity to have easy indexing features, as long as it remains optional and doesn't lead to schema-creep into the Neo4j kernel. Having configurable event handlers that allow for automatic indexing, while maintaining the possibility to manually maintain indices sounds like a reasonable solution. Over the last year I have dedicated many hours to create my own schema driven CMS in Neo4J, which makes me vigilant to make sure the Neo4j kernel remains as schema-less as possible. see also: http://lists.neo4j.org/pipermail/user/2011-May/008431.html Adding schema/type/class information to Neo4j is likely to be much in demand the bigger applications grow, and I applaud all developments in those directions, as long as they remain optional. The schema needs for my application may differ very much from the schema needs in other applications, making it important not to add too many assumptions in the neo4j kernel. Having property keys and relationship labels is, as far as I am concerned the right dose of schema at the kernel level. Date: Mon, 9 May 2011 20:50:56 +0200 From: matt...@neotechnology.com To: user@lists.neo4j.org Subject: Re: [Neo4j] Timeline index 2011/5/9 Niels Hoogeveen pd_aficion...@hotmail.com Automatic indexes could be a very nice feature, though personally I would very much like to maintain the ability to manually index nodes and relationships. 
There are situations where I store a different value in a property than I store in the index (string properties containing html tags, but indexes that store those same values with the html tags stripped). There are also situations where the indexed node is not the node that actually contains the property being indexed (eg. in quad-store layout, a value node contains the property, but the related node is used in the index). I can also conceive of indexes where there is not even a stored property value involved. Having an automatic index would certainly make things easier in some scenarios, but it's not easy to create an automatic indexing mechanism that works for all possible use cases. I am also a little bit concerned about such a feature, because it would result in schema-creep. One of the most powerful features I find in Neo4J is how storage and schema are completely independent. In fact the store can be used without any schema at all, while at the same time the store can be used to persist a schema if that is needed. One of the things I disliked about table based databases is the mixing of storage and schema. It is impossible to define an entity without defining a table, which immediately creates a schema entity. Having strict separation of storage and schema is one of the reasons NOSQL databases are so flexible. Such separation makes it possible to invent different types of schemata for different use cases. When I still used relational databases, I always ended up replicating the schema facility of the underlying database to add more meta information to the database. Being able to roll my own schema facility is therefore one of the key features that made Neo4J such an attractive option. If more schema facilities would eventually creep into the kernel, those advantages would slowly dissipate. These issues with automatic indexing are exactly those that I struggle with when I try to get my head around automatic indexing. 
At its core I don't like it, because it takes away control, but for 80% of the use cases I think it'd be useful. I don't think that neo4j will ever be strictly schematic in any way, although some inferred types could possibly be implemented in some way, via TransactionEventHandlers. A couple of months ago I played around with auto indexing as a lab project and ended up with the exact same solution that Craig just replied with. So I'd say that, or the middle way of preconfiguring indexes up front, would pretty much make most people happy IMHO. Date: Mon, 9 May 2011 18:34:10 +0200 From: cr...@amanzi.com To: user@lists.neo4j.org Subject: Re: [Neo4j] Timeline index +10 for both of Niels' responses. I think both external and in-graph indexes should be supported. The last time I talked to Mattias about this it sounded like the only really clean option for integrating them behind one API would be once automatic indexes are
Re: [Neo4j] First-class type property on relationships but not nodes; why?
Another view of things would be to say that ideally there should be no first class type on either relationships or nodes, since that is a domain specific concept (as Niels says he wants two types, but Rick wants one, and some object models type nodes by relating them to a separate node representing a class). Then the addition of a type to a relationship is, in my opinion, a performance optimization for many graph algorithms, since the traverser will perform well if it has 'first class' access to this information, instead of hitting the property store. I guess this is my take on Tobias' point that the type is a navigational feature. Now I wonder whether the traverser, and many known graph algorithms, could be made faster or easier to code if the nodes also had first class types? I don't know the answer to this, but assume that if it did really help, it would have been done already ;-) On Thu, May 5, 2011 at 3:52 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: The meta model component (though in need of some attention) already allows the typing of a node. An important difference between the typing in the meta model component and the suggestion made in this thread is the fact that a node according to the meta model can have more than one type, while the RelationshipType in kernel only allows one type per relationship. For my modeling needs, the ability to assign more than one type per node is essential. Adding a singular type at kernel level would only make things more confusing. I would go further than what Tobias says and would say RelationshipType is nothing but a name, just like various properties have names. Types would require much more information, like cardinality, source/target constraints etc. Those are all part of the meta model where they belong. From: tobias.ivars...@neotechnology.com Date: Thu, 5 May 2011 15:33:04 +0200 To: user@lists.neo4j.org Subject: Re: [Neo4j] First-class type property on relationships but not nodes; why?
The RelationshipType isn't a type. It is a navigational feature. I've slapped this link around for a few years now, every time this question has been brought up: http://lists.neo4j.org/pipermail/user/2008-October/000848.html The fact that RelationshipType is a navigational feature and not a type means that there is in fact already a corresponding thing for nodes: the Indexes. But I agree that there would be things we could gain by adding a Type concept to Nodes. Such as for example better automatic indexing. But I don't know what it would look like. And I want it to be clear that such a feature is very different from what RelationshipType is today. Cheers, Tobias On Thu, May 5, 2011 at 10:29 AM, Aseem Kishore aseem.kish...@gmail.com wrote: I've found it interesting that Neo4j has a mandatory type property on relationships, but not nodes. Just curious, what's the reasoning behind the design having this distinction? If you say you need to know what type of relationship these two nodes have, I would reply, don't you also need to know what type of nodes they are, as well? Similarly, if you say because there can be many different types of relationships, I would reply, there can also be many different types of nodes, and in both cases, there doesn't need to be. A perfect example is in the documentation/tutorial: movies and actors. Just the fact that we talk about the nodes in the database as movies and actors -- wouldn't it be helpful for the database to support that categorization first-class? To be precise, it's easy for us to add a type property to nodes ourselves (we do in our usage), but it's not a first-class property like relationships, where queries and traversals can easily and naturally specify the type or types they expect. Thanks! 
Aseem ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user -- Tobias Ivarsson tobias.ivars...@neotechnology.com Hacker, Neo Technology www.neotechnology.com Cellphone: +46 706 534857 ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Examples of multiple indices in use?
This is how we use it, for performance: since some data will be much denser than other data, and we don't want index lookups on the sparse data to be impacted by the dense data, we make separate indexes. On Thu, May 5, 2011 at 3:47 PM, Peter Hunsberger peter.hunsber...@gmail.com wrote: On Thu, May 5, 2011 at 4:16 AM, Mattias Persson matt...@neotechnology.com wrote: 2011/5/5 Aseem Kishore aseem.kish...@gmail.com Interesting. That's assuming a person and an organization can share the same name. Maybe an edge case in this example, but I can understand. Thanks. Hmm, no, not share the same name, but have the name property in common... would you really like to ask an index question for a name and get back both persons and organisations mixed in the result? You may, but in many cases you wouldn't... or? The separation of indexes can also give you better performance. For example, consider the case that you have 1000 organizations and 200 people. You really don't want to have to search the index past all 200 people just to find 1 organization ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Lucene/Neo Indexing Question
Thinking back to your original domain description, cars with colors, surely you have more properties than just colors to index? If you have two or more properties, then you can use combinations of properties for the first level of the index tree, which provides your logical partitioning of supernodes in a domain specific way. For example, consider having the four properties color, manufacturer, model, year. The first level of index nodes would be the set of unique combinations of all possible properties (all existing combinations, actually). This set is much larger than the set of colors. So red will occur many times. As a result you dramatically reduce node contention, and the number of relationships per node is much less. Then if you want to perform the query for all red cars, your traverser needs to be only slightly more complex, basically 'find all cars with color red and any value of the other properties'. This is the design of the 'amanzi-index' I started on github in December (but did not complete). It was focused on doing queries on multiple properties at the same time, but does effectively cover your case of reducing node contention, if you can add more properties to the index. It also has the concept of a mapper from the domain specific property to the index key, which was designed to reduce the number of index nodes, but in your case you could also use it to increase the number of index nodes, using some of the ideas by Jim and Michael. Jim suggested that instead of 'red' always mapping to the same node, it could map to a set of different nodes (randomly selected, or round robin).
Michael discussed a distributed hash-code, which I do not fully understand, but it does sound relevant :-) So, in short, using the design of the amanzi-index you could mitigate this problem in two ways: - Index together with other properties to get a domain-specific partitioning of the 'supernodes' - Add a mapper between the color and the index key to get partitioning of the supernodes On Mon, May 2, 2011 at 1:09 PM, Rick Bullotta rick.bullo...@thingworx.com wrote: Hi, Michael. The nature of the domain model really doesn't lend itself to any logical partitioning of supernodes, so it would indeed have to be something very arbitrary/random. For now, I think we will have to either deal with the performance issues or switch to using Lucene for the indexing, but we can't do that yet until we have the ability to query the list of terms for a given key (which is a necessary function in our domain model). We could perhaps keep a list of terms as nodes *and* index them, but that seems redundant. Ultimately, I think the solution is to hide the complexity via the indexing framework and to offer a variety of in-graph indexing models that address specific types of domain requirements. Rick From: user-boun...@lists.neo4j.org [user-boun...@lists.neo4j.org] On Behalf Of Michael Hunger [michael.hun...@neotechnology.com] Sent: Monday, May 02, 2011 3:49 AM To: Neo4j user discussions Subject: Re: [Neo4j] Lucene/Neo Indexing Question Perhaps then it is sensible to introduce a second layer of nodes, so that you split down your supernodes and distribute the write contention? Would be interesting to see whether putting a round robin on that second level of color nodes would be enough to spread lock contention. This is what Peter talks about in his activity stream update scenario. And in general perhaps a step to a more performant in-graph index.
When thinking about in-graph indexes I thought it might perhaps be interesting to re-use the HashMap approach: declare x (a power of two, 2^n) bucket nodes, then from the index-root node create relationships to the bucket nodes whose relationship types are derived from the (re-distributed) hashcode masked with (x-1), and below each bucket node relationships carrying the concrete value as a relationship attribute, leading to the concrete nodes. I think this will be addressed even better with Craig's indexes or the Collection abstractions that Andreas Kollegger is working on. Cheers Michael On 02.05.2011 at 12:16, Rick Bullotta wrote: Hi, Niels. That's what we're doing now, but it has performance issues with large #'s of relationships when cars are constantly being added, since the color nodes become synchronization bottlenecks for updates. Rick From: user-boun...@lists.neo4j.org [user-boun...@lists.neo4j.org] On Behalf Of Niels Hoogeveen [pd_aficion...@hotmail.com] Sent: Sunday, May 01, 2011 9:41 AM To: user@lists.neo4j.org Subject: Re: [Neo4j] Lucene/Neo Indexing Question One option would be to create a unique value node for each distinct color and create a relationship from car to that value node. The value nodes can be grouped together with relationships to some reference node. This gives the opportunity of finding all distinct colors, and it allows you to find all cars with that
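Michael's HashMap-style bucket idea can be sketched in plain Python (dicts stand in for bucket nodes and relationships; this is my own toy illustration, not actual Neo4j code): route each value to one of 2^n buckets via its re-distributed hash, so no single index node has to hold a relationship to every entry for a popular value.

```python
def bucket_of(value, buckets=8):
    """Pick one of `buckets` (must be a power of two) for this value,
    using a crudely re-distributed hash, HashMap-style."""
    h = hash(value)
    h ^= (h >> 16)  # spread the high bits into the low bits
    return h & (buckets - 1)


class BucketIndex:
    """Toy two-level index: root -> bucket (by hash) -> value -> nodes."""

    def __init__(self, buckets=8):
        self.buckets = [dict() for _ in range(buckets)]

    def add(self, value, node):
        b = self.buckets[bucket_of(value, len(self.buckets))]
        b.setdefault(value, []).append(node)

    def get(self, value):
        b = self.buckets[bucket_of(value, len(self.buckets))]
        return b.get(value, [])
```

In graph terms, each list entry would be a relationship from the bucket node to a car node, with the color stored as a relationship attribute; lookups touch only one bucket, and writes for different colors mostly contend on different buckets.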
[Neo4j] Geoprocessing with Neo4j Spatial and OSM
Hi all, I have applied to FOSS4G to talk about Geoprocessing with Neo4j Spatial and OSM. This talk will include the new work we've done on the open street map model. In addition, we got two GSoC students this year, on related projects OSM Editor and Geoprocessing with OSM, and so they are likely to contribute some interesting new content as well. If you are interested in graph databases in GIS, OSM or geoprocessing, consider voting for my talk at http://community-review.foss4g.org/. I have included the abstract of the talk below. Regards, Craig - What better way to perform geoprocessing than on a graph! And what better dataset to play with than Open Street Map! Since we presented Neo4j Spatial at FOSS4G last year, our support for geoprocessing functions and for modeling, editing and visualization of OSM data has improved considerably. We will discuss the advantages of using a graph database for geographic data and geoprocessing, and we will demonstrate this using the amazing Open Street Map data model. ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] New blog post on non-graph stores for graph-y things
On 'foreign key', I think it was a subconscious choice to avoid it, since it has very strong semantics in other data models. I wanted to try to convey the concept of pointers without muddying that with the stricter semantics of foreign keys and referential integrity. Perhaps I'm over-optimistic, but I would like to find some common terminology we could use when describing the differences between different database types. It really helps people understand a new database if you can compare and contrast the finer details and subtle differences. I have found using the term 'foreign key' effective precisely because it brings to mind the rdbms approach, and helps the user see a mapping between modeling in rdbms and modeling in graphs. But I agree that 'foreign key' brings other aspects that may not be appropriate, and so a more general term would be better. You say 'pointer', but that to my mind is an aspect of both a foreign key and a relationship/edge. Perhaps there is no single magic word, and we have to pick and choose to suit the circumstances ;-) ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] New blog post on non-graph stores for graph-y things
Hi Jim, As always, I enjoyed reading your blog. It was well written and made the point (including even a plug in the last line ;-) While Aseem's observations are valid, I think you handled it correctly, with the product recall example being only a relatively small win for graph databases, and moving on to the bigger wins with deeper traversals. One thing I realized while reading it was that, although you did not emphasize it, the example of the Document store was actually an example of the use of foreign keys, and is applicable to all non-graph databases, including relational databases. I wonder, is the use of the term 'foreign key' applicable to all these cases? I think so, but have not found the term used much (and not in the blog). I think of a foreign key as being the reference from one object (or table, or kv entry, or document) to another. So your documents containing 'friend:other' were, to my mind, using foreign keys. I feel the distinctive difference between a foreign key and a true graph is the need for an index to provide performance in the join. Graphs have two advantages over foreign keys, one is that the relationships can be traversed in both directions, removing the need for the complementary foreign key Aseem describes, and another is that the performance of a local graph traversal is so high (implicit local index), that no index is required to traverse. I think both these points are described in your blog entry, although in different terms. Regards, Craig On Thu, Apr 21, 2011 at 7:19 PM, Jim Webber j...@neotechnology.com wrote: Hi guys, A while ago we were discussing using a non-graph native backend for graph operations. I've finally gotten around to writing up my thoughts on the thread here: http://jim.webber.name/2011/04/21/e2f48ace-7dba-4709-8600-f29da3491cb4.aspx As always, I'd value your thoughts and feedback. 
Jim ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
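The foreign-key-versus-relationship contrast drawn above can be sketched in a few lines of plain Java (all names here are invented for illustration; neither structure is a real database API): a foreign key is a value that must be resolved through an index lookup, while a graph relationship is a direct reference, followable in both directions with no index at all.

```java
import java.util.*;

// Foreign-key style: friendship is stored as an id, and the 'join'
// is an index lookup to resolve that id to the actual record.
class FkVsPointer {
    static String resolveFriend(Map<Long, String> indexById, long friendId) {
        return indexById.get(friendId); // every hop pays for an index lookup
    }

    // Graph style: the node holds direct references (an implicit local
    // index), and the relationship is traversable in both directions.
    static class Node {
        final String name;
        final List<Node> friends = new ArrayList<>();
        Node(String name) { this.name = name; }
    }
}
```

Note also that the FK-style map only supports one direction; the complementary direction would need a second index, which is exactly the complementary foreign key Aseem describes.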
Re: [Neo4j] REST results pagination
Good catch, forgot to add the in-graph representation of the results to my mail, thanks for adding that part. Temporary (transient) nodes and relationships would really rock here, with the advantage that with HA you have them distributed to all cluster nodes. Certainly Craig has to add some interesting things to this, as those resemble probably his in graph indexes / R-Trees. I certainly make use of this model, much more so for my statistical analysis than for graph indexes (but I'm planning to merge indexes and statistics). However, in my case the structures are currently very domain specific. But I think the idea is sound and should be generalizable. What I do is have a concept of a 'dataset' on which queries can be performed. The dataset is usually the root of a large sub-graph. The query parser (domain specific) creates a hashcode of the query, checks if the dataset node already has a resultset (as a connected sub-graph with its own root node containing the previous query hashcode), and if so return that (traverse it), otherwise perform the complete dataset traversal, creating the resultset as a new subgraph and then return it. This works well specifically for statistical queries, where the resultset is much smaller than the dataset, so adding new subgraphs has small impact on the database size, and the resultset is much faster to return, so this is a performance enhancement for multiple requests from the client. Also, I keep the resultset permanently, not temporarily. Very few operations modify the dataset, and if they do, we delete all resultsets, and they get re-created the next time. My work on merging the indexes with the statistics is also planned to only recreate 'dirty' subsets of the result-set, so modifying the dataset has minimal impact on the query performance. After reading Rick's previous email I started thinking of approaches to generalizing this, but I think your 'transient' nodes perhaps encompass everything I thought about. 
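That query-hash scheme can be sketched in plain Java (class and method names here are invented; in the real system the resultset is stored as a connected subgraph under the dataset node, not in a map):

```java
import java.util.*;
import java.util.function.Function;

// Plain-Java sketch of the resultset caching described above: key the
// cached resultset by a hash of the query, run the expensive dataset
// traversal only on a cache miss, and drop all cached resultsets when
// the (rarely modified) dataset changes.
class ResultsetCache {
    private final Map<Integer, List<String>> resultsets = new HashMap<>();

    // Return the cached resultset for this query, or perform the full
    // dataset traversal once and cache what it produced.
    List<String> query(String queryText, Function<String, List<String>> datasetTraversal) {
        int queryHash = queryText.hashCode(); // stand-in for the domain-specific query hash
        return resultsets.computeIfAbsent(queryHash, k -> datasetTraversal.apply(queryText));
    }

    // A dataset modification invalidates all resultsets; they are
    // re-created lazily the next time each query is run.
    void datasetModified() {
        resultsets.clear();
    }
}
```

The scheme pays off precisely because, as noted above, the resultset is much smaller than the dataset and modifications are rare.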
Here is an idea:
- Have new nodes/relations/properties tables on disk, like a second graph database, but different in the sense that it has one-way relations into the main database, which cannot be seen by the main graph and so are by definition not part of the graph. These can have transience and expiry characteristics. Then we can build the resultset graphs as transient graphs in the transient database, with 'drill-down' capabilities to the original graph (something I find I always need for statistical queries, and something a graph is simply much better at than a relational database).
- Use some kind of hashcode in the traversal definition or query to identify existing, cached, transient graphs in the second database, so you can rely on those for repeated queries, or pagination or streaming, etc.
As traversers are lazy a count operation is not so easily possible, you could run the traversal and discard the results. But then the client could also just pull those results until it reaches its internal thresholds and then decide to use more filtering or stop the pulling and ask the user for more filtering (you can always retrieve n+1 and show the user that there are more than n results available). Yes. Count needs to perform the traversal. So the only way to not have to traverse twice is to keep a cache. If we make the cache a transient sub-graph (possibly in the second database I described above), then we have the interesting behaviour that count() takes a while, but subsequent queries, pagination or streaming, are fast. Please don't forget that a count() query in a RDBMS can be as ridiculously expensive as the original query (especially if just the column selection was replaced with count, and sorting, grouping etc was still left in place together with lots of joins). 
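The 'retrieve n+1' trick mentioned there is small enough to sketch (plain Java, invented names; the Iterator stands in for a lazy traverser):

```java
import java.util.*;

// Pull one item more than the page size from the lazy result stream, so
// the client can tell the user that further results exist without ever
// counting the whole result.
class PagePeek {
    static <T> List<T> takePage(Iterator<T> results, int pageSize, boolean[] hasMore) {
        List<T> page = new ArrayList<>();
        while (results.hasNext() && page.size() < pageSize + 1) {
            page.add(results.next());
        }
        hasMore[0] = page.size() > pageSize;
        if (hasMore[0]) {
            page.remove(pageSize); // drop the extra peeked item before display
        }
        return page;
    }
}
```

The lazy traverser is only advanced n+1 steps, so the cost is one extra element rather than a full count.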
Good to hear they have the same problem as us :-) (or even more problems) Sorting on your own instead of letting the db do that mostly harms the performance as it requires you to build up all the data in memory, sort it and then use it. Instead of having the db do that more efficiently, stream the data and you can use it directly from the stream. Client side sorting makes sense if you know the domain well enough to know, for example, you will receive a small enough result set to 'fit' in the client, and want to give the user multiple interactive sort options without hitting the database again. But I agree that in general it makes sense to get the database to do the sort. Cheers, Craig ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] REST results pagination
I can only think of a few use cases where losing some of the expected result is ok, for instance if you want to peek at the result. IMHO, paging is, by definition, a peek. Since the client controls when the next page will be requested, it is not possible, or reasonable, to enforce that the complete set of pages (if ever requested) will represent a consistent result set. This is not supported by relational databases either. The result set, and meaning of a page, can change between requests. So it can, and does, happen that some of the expected result is lost. This is completely different to the streaming result, which I see Jim commented on, and so I might just reply to his mail too :-) I'm waiting for one of those SlapOnTheFingersExceptions that Tobias has been handing out :) My fingers are, as yet, unscathed. The slap can come at any moment! :-) This sounds really cool, would be a great thing to look into! Should you want examples, I have a wiki page on this topic at http://redmine.amanzi.org/wiki/geoptima/Geoptima_Event_Log ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] REST results pagination
I think Jim makes a great point about the differences between paging and streaming, being client or server controlled. I think there is a related point to be made, and that is that paging does not, and cannot, guarantee a consistent total result set. Since the database can change between page requests, they can be inconsistent. It is possible for the same record to appear in two pages, or for a record to be missed. This is certainly how relational databases work in this regard. But in the streaming case, we expect a complete and consistent result set. Unless, of course, the client cuts off the stream. The use case is very different: while paging is about getting a peek at the data, and rarely about paging all the way to the end, streaming is about getting the entire result, but streamed for efficiency. On Thu, Apr 21, 2011 at 5:00 PM, Jim Webber j...@neotechnology.com wrote: This is indeed a good dialogue. The pagination versus streaming was something I'd previously had in my mind as orthogonal issues, but I like the direction this is going. Let's break it down to fundamentals: As a remote client, I want to be just as rich and performant as a local client. Unfortunately, Deutsch, Amdahl and Einstein are against me on that, and I don't think I am tough enough to defeat those guys. So what are my choices? I know I have to be more granular to try to alleviate some of the network penalty so doing operations bulkily sounds great. Now what I need to decide is whether I control the rate at which those bulk operations occur or whether the server does. If I want to control those operations, then paging seems sensible. Otherwise a streamed (chunked) encoding scheme would make sense if I'm happy for the server to throw results back at me at its own pace. Or indeed you can mix both so that pages are streamed. In either case if I get bored of those results, I'll stop paging or I'll terminate the connection. So what does this mean for implementation on the server? 
I guess this is important since it affects the likelihood of the Neo Tech team implementing it. If the server supports pagination, it means we need a paging controller in memory per paginated result set being created. If we assume that we'll only go forward in pages, that's effectively just a wrapper around the traversal that's been uploaded. The overhead should be modest, and apart from the paging controller and the traverser, it doesn't need much state. We would need to add some logic to the representation code to support next links, but that seems a modest task. If the server streams, we will need to decouple the representation generation from the existing representation logic since that builds an in-memory representation which is then flushed. Instead we'll need a streaming representation implementation which seems to be a reasonable amount of engineering. We'll also need a new streaming binding to the REST server in JAX-RS land. I'm still a bit concerned about how rude it is for a client to just drop a streaming connection. I've asked Mark Nottingham for his authoritative opinion on that. But still, this does seem popular and feasible. Jim ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
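For what it's worth, the forward-only 'paging controller as a wrapper around the traversal' could look roughly like this (plain-Java sketch with invented names; an Iterator stands in for the uploaded traversal):

```java
import java.util.*;

// A forward-only paging controller: it holds the lazy traverser, pulls
// one page per client request, and knows whether the representation
// should emit a 'next' link. No state is kept beyond the traverser's
// current position.
class PagingController<T> {
    private final Iterator<T> traverser; // stands in for the uploaded traversal
    private final int pageSize;

    PagingController(Iterator<T> traverser, int pageSize) {
        this.traverser = traverser;
        this.pageSize = pageSize;
    }

    List<T> nextPage() {
        List<T> page = new ArrayList<>(pageSize);
        while (traverser.hasNext() && page.size() < pageSize) {
            page.add(traverser.next());
        }
        return page;
    }

    boolean hasNextPage() { // whether to include a 'next' link
        return traverser.hasNext();
    }
}
```

Because the traverser is lazy, each page request only advances it by pageSize elements, which is why the memory overhead stays modest.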
Re: [Neo4j] REST results pagination
To respond to your arguments it would be worth noting a comment by Michael DeHaan later on in this thread. He asked for 'something more or less resembling a database cursor (see MongoDB's API).' The trick is to achieve this without having to store a lot of state on the server, so it is robust against server restarts or crashes. If we compare to the SQL situation, there are two numbers passed by the client, the page size and the offset. The state can be re-created by the database server entirely from this information. How this is implemented in a relational database I do not know, but whether the database is relational or a graph, certain behaviors would be expected, like robustness against database content changes between the requests, and coping with very long gaps between requests. In my opinion the database cursor could be achieved by both of the following approaches:
- Starting the traversal from the beginning, and only returning results after passing the cursor offset position
- Keeping a live traverser in the server, and continuing it from the previous position
Personally I think the second approach is simply a performance optimization of the first. So robustness is achieved by having both, with the second one working when possible (no server restarts, timeout not expiring, etc.), and falling back to the first in other cases. This achieves performance and robustness. What we do not need to do with either case is keep an entire result set in memory between client requests. Now when you add sorting into the picture, then you need to generate the complete result-set in memory, sort, paginate and return only the requested page. If the entire process has to be repeated for every page requested, this could perform very badly for large result sets. I must believe that relational databases do not do this (but I do not know how they paginate sorted results, unless the sort order is maintained in an index). 
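The first (stateless) approach is simple enough to sketch in plain Java (invented names; the Supplier stands in for re-running the traversal from the beginning):

```java
import java.util.*;
import java.util.function.Supplier;

// Stateless cursor fallback: re-create the lazy traversal from scratch
// on every request, skip past the cursor offset, and return one page.
// No server state survives between requests, so it is robust against
// restarts; the live-traverser approach is just an optimization of this.
class OffsetPager {
    static <T> List<T> page(Supplier<Iterator<T>> freshTraversal, int offset, int pageSize) {
        Iterator<T> results = freshTraversal.get();
        for (int i = 0; i < offset && results.hasNext(); i++) {
            results.next(); // discard everything before the cursor position
        }
        List<T> page = new ArrayList<>(pageSize);
        while (results.hasNext() && page.size() < pageSize) {
            page.add(results.next());
        }
        return page;
    }
}
```

Since the traversal is lazy, nothing beyond offset + pageSize elements is ever materialized, even though the traversal is restarted each time.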
To avoid keeping everything in memory, or repeatedly reloading everything to memory on every page request, we need sorted results to be produced on the stream. This can be done by keeping the sort order in an index. This is very hard to do in a generic way, which is why I thought it best done in a domain specific way. Finally, I think we are really looking at two different but valid use cases. The need for generic sorting combined with pagination, and the need for pagination on very large result sets. The former use case can work with re-traversing and sorting on each client request, is fully generic, but will perform badly on large result sets. The latter can perform adequately on large result sets, as long as you do not need to sort (and use the database cursor approach to avoid loading the result set into memory). On Wed, Apr 20, 2011 at 2:01 PM, Jacob Hansson ja...@voltvoodoo.com wrote: On Wed, Apr 20, 2011 at 11:25 AM, Craig Taverner cr...@amanzi.com wrote: I think sorting would need to be optional, since it is likely to be a performance and memory hog on large traversals. I think one of the key benefits of the traversal framework in the Embedded API is being able to traverse and 'stream' a very large graph without occupying much memory. If this can be achieved in the REST API (through pagination), that is a very good thing. I assume the main challenge is being able to freeze a traverser and keep it on hold between client requests for the next page. Perhaps you have already solved that bit? While I agree with you that the ability to effectively stream the results of a traversal is a very useful thing, I don't like the persisted traverser approach, for several reasons. I'm sorry if my tone below is a bit harsh, I don't mean it that way, I simply want to make a strong case for why I think the hard way is the right way in this case. 
First, the only good restful approach I can think of for doing persisted traversals would be to create a traversal resource (since it is an object that keeps persistent state), and get back an id to refer to it. Subsequent calls to paged results would then be to that traversal resource, updating its state and getting results back. Assuming this is the correct way to implement this, it comes with a lot of questions. Should there be a timeout for these resources, or is the user responsible for removing them from memory? What happens when the server crashes and the client can't find the traversal resources it has ids for? If we somehow solve that or find some better approach, we end up with an API where a client can get paged results, but two clients performing the same traversal on the same data may get back the same result in different order (see my comments on sorting based on expected traversal behaviour below). This means that the API is really only useful if you actually want to get the entire result back. If that was the problem we wanted to solve, a streaming solution is a much easier and faster
Re: [Neo4j] How to combine both traversing and index queries?
Another approach to this problem is to consider that an index is actually structured as a graph (a tree), and so if you write the tree into the graph together with your data model, you can combine the index and the traversal into a pure graph traversal. Of course, it is insufficient to simply build both the index tree and the domain model as two graphs that only connect at the result nodes. You need to build a combined graph that achieves the purpose of both indexing and domain structure. This is a very domain specific thing and so there are no general purpose solutions. You have to build the graph to suit your domain. One approach is to build the domain graph first, then decide why you want indexing, and without adding lucene (or any external index) to the mix, think about how to modify the graph to also achieve the same effect. On Mon, Apr 18, 2011 at 8:54 PM, Willem-Paul Stuurman w.p.stuur...@knollenstein.com wrote: Hi Ville, We ran into a similar problem basically wanting to search only part of the graph using Lucene. We used traversing to determine the nodes to search from and from there on use Lucene to do a search on nodes connected to the nodes from the traverse result. We solved it as follows:
- defined a TransactionEventHandler to auto-update the indexes with node properties, but also add relationships to the same index. We use the relationship.name() as the property name for Lucene, with the 'other node' id as the value.
- traverse to get a set of nodes from where to search. We apply the ACL here to only return nodes the user is allowed to see.
- create a BooleanQuery for Lucene with the relationship.name() field names and id's. 
So if the relationship would be 'IS_FRIEND_OF' and we want to do a full text search for 'trinity' on friends of people with ids 1,2 and 3, we create a query that contains: +(name:trinity) +(isfriendof:1 isfriendof:2 isfriendof:3) To make sure we only get back 'person' nodes we also indexed the node type (in our case 'emtype'), so the complete query is: +emtype:person +name:trinity +(isfriendof:1 isfriendof:2 isfriendof:3) This way you can easily traverse to define the 'edges' of where to search and let Lucene handle the search within that region. Optionally we add the ACL to the Lucene query as well using the same technique, basically adding all group ids the current user is member of and has a 'CAN_ACCESS' relationship with the node: +emtype:person +name:trinity +(isfriendof:1 isfriendof:2 isfriendof:3) +(canaccess:233 canaccess:254 canaccess:324) It works for us because in our case we know the traversal will return a reasonable set of nodes (not thousands+). Lucene can return thousands of nodes, but that's not a problem of course. And we can still use the fun stuff like sorting, paging and score results. Hope this helps. Cheers Paul PS: we always use lower case field names without underscores because somehow it makes Lucene happier On 18 apr 2011, at 11:19, Mattias Persson wrote: 2011/4/18 Michael Hunger michael.hun...@neotechnology.com: Would it be also possible to go the other way round? E.g. have the index-results (name:Vil*) as starting point and traverse backwards the two steps to your start node? (Either using a traversal or the shortest path graph algo with a maximum way-length)? That's what I suggested, but it doesn't exist yet :) To do it that way today (do a traversal from each and every index result) would probably be slower than doing one traversal with filtering. 
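Assembling that query string from the ids returned by the traversal is mechanical; a hedged sketch (the helper below is my own invention, not from the code Paul describes):

```java
import java.util.*;

// Compose the Lucene query string that restricts the full-text search
// to the region of the graph bounded by the traversal: required type
// term, required name term, and a required OR-group of the node ids
// gathered while traversing.
class RegionQuery {
    static String build(String type, String nameTerm, List<Long> friendIds) {
        StringBuilder query = new StringBuilder("+emtype:" + type + " +name:" + nameTerm + " +(");
        for (int i = 0; i < friendIds.size(); i++) {
            if (i > 0) query.append(' ');
            query.append("isfriendof:").append(friendIds.get(i));
        }
        return query.append(')').toString();
    }
}
```

The same pattern extends to the ACL group: append a second required group of canaccess terms, exactly as in Paul's example.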
Cheers Michael Am 18.04.2011 um 11:03 schrieb Mattias Persson: Hi Ville, 2011/4/14 Ville Mattila vi...@mattila.fi: Hi there, I am somehow stuck with a problem of combining traversing and queries to indices efficiently - something like finding all people with a name starting with Vil* two steps away from a reference node. Traversing all friends within two steps from the reference node is trivial, but I find it a bit inefficient to apply a return evaluator in each of the nodes visited during traversal. Or is it so? How about more complex criteria which may involve more than one property or even more complex (Lucene) queries? The best solution IMHO (one that isn't available yet) would be to let a traversal have multiple starting points, that is have the index result as starting point. I think that doing a traversal and filtering with an evaluator is the way to go. Have you tried doing this and saw a bad performance for it? I was thinking to spice up my Neo4j setup with Elasticsearch (www.elasticsearch.org) to dedicate Neo4j to keep track of the relationships and ES to index all the data in them, however it makes me feel very uncomfortable to keep up the consistency when data gets updated. However, now I need to keep also Neo4j indices updated. And not to be said, combining traversal and an external index is yet more complicated. However I like
Re: [Neo4j] Neo4J Spatial - issue with bounding box indices in OSM Layer.
Hi Robert, I took a look at this and the issue is that you are using the OSMGeometryEncoder to decode the RTree nodes. And the GeometryEncoder is designed to be specific to your data model, while the RTree internal data is hard-coded into the RTree design. So there is no guarantee that any particular data model's geometry encoder will store a bounding box in the same way as the RTree does. In this particular case the RTree uses the conventions of the JTS Envelope, while the OSMGeometryEncoder uses the conventions of the GeoTools envelope. I looked around and the WKBGeometryEncoder we use for ESRI Shapefile support uses JTS, so your code would have worked there. So you should definitely not use the OSM-specific geometry encoder for looking at anything other than the OSM-specific geometries. The correct API to get the bounding box from the index is, in fact, the getLayerBoundingBox() method you already used. So the only mistake was to grab the root node of the RTree and pass it to the OSMGeometryEncoder as if it were an OSM Geometry Node, which it is not. Does that clarify things? Regards, Craig On Tue, Mar 29, 2011 at 9:44 AM, Robert Boothby rob...@presynt.com wrote: Sorry about dropping out at the end of last week - had some personal issues to deal with. 
I have the following unit test code that illustrates the breakdown in the envelope definition:

@Test
public void useLayer() {
    final OSMLayer osmLayer = (OSMLayer) spatialDB.getLayer("OSM-BUCKS");
    final GeometryFactory factory = osmLayer.getGeometryFactory();
    System.out.println("Unit of measure: " + CRS.getEllipsoid(osmLayer.getCoordinateReferenceSystem()).getAxisUnit().toString());
    final Point point = factory.createPoint(new Coordinate(-0.812988, 51.796726)); // 51.808721,-0.689735
    final Envelope layerBoundingBox = osmLayer.getIndex().getLayerBoundingBox();
    final Envelope usedEnvelope = osmLayer.getGeometryEncoder().decodeEnvelope(((RTreeIndex) osmLayer.getIndex()).getIndexRoot());
    System.out.println("Layer Bounding Box: " + layerBoundingBox.toString());
    System.out.println("Envelope used in search: " + usedEnvelope.toString());
    assertEquals(layerBoundingBox, usedEnvelope);
    SearchContain searchContain = new SearchContain(point);
    osmLayer.getIndex().executeSearch(searchContain);
    for (SpatialDatabaseRecord record : searchContain.getResults()) {
        System.out.println("Container: " + record);
        for (String propertyName : record.getPropertyNames()) {
            final Object propertyValue = record.getProperty(propertyName);
            if (propertyValue != null) {
                System.out.println("\t" + propertyName + ": " + propertyValue);
            }
        }
    }
}

It throws an assertion failure at the 'assertEquals(layerBoundingBox, usedEnvelope)' line - effectively I pull the layer index envelope out using the same code as 'AbstractSearch.getEnvelope(Node geomNode)' does (used in AbstractSearchIntersection.needsToVisit()) and compare it with what the layer thinks the envelope should be. The same numeric values appear in the two envelopes but in different fields. Hopefully this gives you all you need to diagnose the problem - but if not let me know and we can work out how to drop my rather large test data set and project into a common place. Robert. 
___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] question
I think for that the TimelineIndex interface would have to be extended to be able to hold additional data so that you can do compound queries (http://docs.neo4j.org/chunked/milestone/indexing-lucene-extras.html#indexing-lucene-compound) to it and get exactly the functionality you're asking for with only one index. Another way is to just copy the LuceneTimeline code and roll this yourself, it's really small, mostly one-liners for each implemented method. Alternatively just roll your own graph-tree structure that provides the same capabilities. Then you can index any combination of properties together, to suit your planned queries. This is obviously much more work than Mattias' suggestion, and does require that you know more about your domain (ie. less general). But it does allow you to inspect the index itself with graph traversals, gremlin or neoclipse, which is not possible with lucene. ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] question
Ok, in fact it shouldn't be a performance downgrade even with large blobs, right? It just depends on whether the queried part refers to an id or similar and that node then is simply connected to the blob. I.e. I extract the blob only when I am sure it is the wanted one. Is that what you meant? If the blob is a property of a node, it is not loaded when you access that node, but when you access its properties. I do not know enough about the implications on performance with large blobs, only that it has been mentioned many times before that for really large blobs, rather store them somewhere else (eg. filesystem) and reference them from the graph (eg. path to file, url, etc.). But I still believe that blobs are not big enough to really be a concern, but perhaps someone more knowledgeable can correct me here? Probably because of the database type the old way of having a look at the data is not possible any longer. But then which is the right way? Having a console and let Gremlin shine? Filtering the neoclipse view with relationship types and directions helps. Limiting the number of nodes returned helps a lot. I use 100 max. But use neoclipse as a visualization tool mostly for visualizing the structure, not for analytics. Ok, I change my question. What do you do when you have two big types of data, one that does perfectly fit in the graph concept, and one that really doesn't have anything to do with it? I guess you put everything into the neo4j db and then query one with the graph traverser and the other one with the lucene indexer? My questions might seem a bit dummy, I apologize for that, I am trying to understand why and how I should make use of a graph database. When I'm deciding between using a graph or using lucene, the size of the data is not really a factor, but its 'graphiness' :-) For example, if I have a property of very high diversity, like people's names, then lucene is a natural choice. 
If you have a property with structure, like categories or tags, or inheritance, or other relationship concepts, then the graph is best. There are cases in the middle, for example I generally model numerical properties in the graph, but I think most others would use lucene. I use the graph because it naturally leads to statistics data. For example, if we use the time property, and collect all events in the same second and connect them to the same 1s time node, we now know the number of events in that second from the structure of the graph. Connect each 1s node to a 1min node, and we know how many seconds in that minute contained data, etc. Obviously this is a very simple special case, and usually I keep more statistical metadata in the graph tree than mere counters, but the result is that your index now contains lots of statistics you can query without even touching the original data nodes (ie. very fast statistics queries). ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
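The counting idea can be illustrated with a plain-Java stand-in for the time-tree (invented names; in the real graph these would be 1s and 1min nodes connected by relationships, not map entries):

```java
import java.util.*;

// Bucket events by second, then roll seconds up into minutes, so that
// counts are answered from the index structure itself without touching
// the raw event data - the 'very fast statistics queries' above.
class TimeTree {
    // epoch second -> number of events connected to that 1s bucket
    private final TreeMap<Long, Integer> secondCounts = new TreeMap<>();

    void addEvent(long epochSecond) {
        secondCounts.merge(epochSecond, 1, Integer::sum);
    }

    int eventsInSecond(long epochSecond) {
        return secondCounts.getOrDefault(epochSecond, 0);
    }

    // How many seconds within the given minute contained any data at all
    // (the structural count described in the text).
    int activeSecondsInMinute(long epochMinute) {
        long from = epochMinute * 60;
        return secondCounts.subMap(from, from + 60).size();
    }
}
```

In the graph version the 1s node would also carry richer statistical metadata (max/min/total and so on), updated incrementally as each event is connected.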
Re: [Neo4j] question
That sounds nice. My scenario is something like: I have a centralized database. On the Desktop side I have a workstation on which I do GIS analysis. People want to get a chunk of data of interest, so they can pollute them with their analyses until they are happy. So it is a bit the concept of a distributed versioning system. I have my local workspace, on which I play, then I could push back that result I liked. Anything like that around? :) Not that I know of, but the issue I believe has been tackled by many users of neo4j. It is quite domain specific, and so not easy to generalize, but probably not that hard to implement for a limited, specific domain. For example, I have a product that has three components, an Android client collecting data and posting JSON packets at a central neo4j server, which adds them to a graph. Then my desktop app, just like yours, queries the server for a subset of the data, duplicates that in its internal, local neo4j database, and performs statistical calculations on that. I do not (yet) publish these results back to the central server, so I have not dealt with any versioning or conflict resolution, but have thought about it (at least within the scope of my domain). ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] question
Agreed, Rick. My opinion is the main reason to roll your own index is to make use of domain specific optimizations not available with generic indices. In my case the main win is the combination of statistics result and index that is possible. But I have to confess, the real reason I started using graphs as indexes was just that I thought the graph concept so cool, I did not want to pollute it with something non-graphy. Foolish ideology, I know, and I grew out of that more than a year ago, but it did influence many of my early neo4j decisions :-) On Wed, Mar 30, 2011 at 1:49 PM, Rick Bullotta rick.bullo...@thingworx.com wrote: My experience with using large graph trees for indexes has been mixed, with performance issues under heavy read/write load, perhaps due to the many potential locks required during insertions. We switched to the timeline index, fwiw. - Reply message - From: Craig Taverner cr...@amanzi.com Date: Wed, Mar 30, 2011 7:43 am Subject: [Neo4j] question To: Neo4j user discussions user@lists.neo4j.org I think for that the TimelineIndex interface would have to be extended to be able to hold additional data so that you can do compound queries http://docs.neo4j.org/chunked/milestone/indexing-lucene-extras.html#indexing-lucene-compound to it and get exactly the functionality you're asking for with only one index. Another way is to just copy the LuceneTimeline code and roll this yourself, it's really small, mostly one-liners for each implemented method. Alternatively just roll your own graph-tree structure that provides the same capabilities. Then you can index any combination of properties together, to suit your planned queries. This is obviously much more work than Mattias' suggestion, and does require that you know more about your domain (ie. less general). But it does allow you to inspect the index itself with graph traversals, gremlin or neoclipse, which is not possible with lucene. 
___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
[Neo4j] Google Summer of Code 2011
Hi all, Last year Neo4j was represented in the Google Summer of Code with two successful projects, in collaboration with Gephi and OSGeo. This year we are again interested in supporting GSoC projects within other open source organizations interested in integrating with Neo4j. The OSGeo (http://osgeo.org) has already welcomed our offer to mentor Neo4j Spatial (http://components.neo4j.org/neo4j-spatial/snapshot/neo4j-spatial/) projects within the OSGeo umbrella. We will be updating the Neo4j, uDig and possibly GeoTools and GeoServer wikis where necessary. If you have ideas that are not related to Neo4j Spatial, contact us on the mailing list and suggest which accepted organization we could partner with. If the idea is interesting enough, we should be able to find a mentor for it. For further information, here are some links to follow: - GSoC ideas (http://udig.refractions.net/confluence/display/HACK/Summer+of+Code) on the uDig wiki (focusing on Neo4j Spatial) - Neo4j GSoC Ideas (http://wiki.neo4j.org/content/Google_Summer_of_Code_Ideas) on the Neo4j wiki, with a wide range of interesting ideas - List of accepted GSoC organizations (http://socghop.appspot.com/gsoc/program/accepted_orgs/google/gsoc2010) we could consider partnering with for GSoC projects Regards, Craig
Re: [Neo4j] question
Hi Andrea, I am cc'ing the list with these answers, since I think there are questions here others know much more about. I will answer all with what I know, or think I know :-) 1) what can I put into the node? I see that the superclass proposes Property types. So I was wondering if blogs are completely out of the question. You can put any java primitive, strings and arrays of primitives and strings. Since anything can be serialized to byte[], you can store anything you want really. But practically it is considered non-ideal to store very large blobs or strings due to reduced performance. But my understanding is that blogs are typically small enough to not be a problem for this. So just store the blog contents as a string. 2) could neoclipse act like a kind of query engine for udig? Or perhaps be easily adapted? I am wondering how an advanced user could browse the database. There are a lot of ways of browsing the database, especially for web user interfaces. See http://wiki.neo4j.org/content/Visualization_options_for_graphs I have a modified and embedded version of neoclipse inside my AWE application (see the online videos on youtube and vimeo). I use it to allow *advanced users to browse the database* :-) 3) Is jo4neo the right way to go to use annotations? Is it something you use? I have not used jo4neo. But it seems like a convenient approach. One concern I have with object databases is that you risk losing performance on traversals if the objects need to be de-serialized on access. I know early versions of the Ruby library neo4j.rb had that issue, but it was resolved later. Perhaps jo4neo never had the problem. 4) Indexes in rdbm are done on a table basis and every new record gets inserted. Now you have to add the values to the index? It looks to me as if you index only what you want to, right? But while not indexing in an rdbm leads to slower results, in this case the result is missing? This is only due to there being two search API's.
If you search using the index, you get answers from the index. If you search using the graph you get answers from the graph. Many questions are actually faster using the graph, so you should not be too quick to use an index at all. In fact, dare I say it, if you need an index perhaps you have not modeled the graph correctly :-) (having said that, I do use the index myself, but less often than the graph). In an RDBMS the normal, non-indexed search is extremely slow (brute force search), and the index is a drop-in replacement for that. But in the graph, the only brute force search would be a full database scan, which no-one sane would do... Instead the graph search requires knowledge of the graph structure, and therefore the search query can be complex, and by definition completely different to an index search. So the graph search and index search are far too different to be placed behind the same API. However, I could imagine that some object db wrappers like neo4j.rb and jo4neo might be able to do this, since they have influence over, and knowledge of, the graph structure they create. 5) does neo4j have a replication tool? I.e. is it possible to sync a remote and an embedded database instance? Hibernate used to help here. Are there tools to help? Yes. There are two. There is a hot backup option for making backups of live running databases. And there is the relatively new HA (high availability) infrastructure for keeping clusters of databases synchronized. The various databases can be running as embedded or as server, it does not matter, they can still be synchronized. There are rules, however, as to which is master, and which is best to write to. See http://wiki.neo4j.org/content/High_Availability_Cluster 6) Timeseries. The only way to handle them seems to be the timeline, right? So I happily create a timeline like: timeLine = new Timeline("precipitations", firstNode, graphDb); timeLine.addNode(firstNode, time); and then add the whole timeseries nodes.
By that they are indexed already. (btw the insertion of about 9000 values took quite some time more than H2 or postgres, is that normal? Not that this bothers me that much, but I would like to know if I am doing something wrong) I have no experience with the timeline class. I have always rolled my own time index, and it was fast. However, the most likely issue you are facing is with too many transactions. Group your commits. The easiest way to do this is every 1000, or 10000 (pick a number between these two :-), just do tx.success();tx.finish();tx=db.beginTx(); The above number of 9000 could be added in one single commit. So move the try{}finally{} around the entire loop. Then add the intermediate commits as described above if you think you will get more data than that. I personally commit every 1000. I found going to a bigger number helped, but not that much. Most of the performance gains are achieved in the first 1000. And how do I query it? I couldn't find any docs and testcase
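The grouped-commit advice above can be sketched as follows. This is plain Java with the Neo4j 1.x transaction calls left as comments so the sketch stays self-contained; the class name, `BATCH` constant and method are invented for illustration:

```java
// Sketch of the grouped-commit pattern: one try/finally around the whole
// import loop, with an intermediate commit every BATCH inserts instead of
// a transaction per insert. The commented lines show where the Neo4j 1.x
// calls (beginTx/success/finish) would go.
public class BatchedInsert {
    static final int BATCH = 1000;

    // Returns how many intermediate commits a run of 'total' inserts makes.
    static int insertAll(int total) {
        int commits = 0;
        // Transaction tx = db.beginTx();          // hypothetical db handle
        try {
            for (int i = 1; i <= total; i++) {
                // ... create one event node here ...
                if (i % BATCH == 0) {
                    // tx.success(); tx.finish(); tx = db.beginTx();
                    commits++;
                }
            }
            // tx.success();                       // commit the remainder
        } finally {
            // tx.finish();
        }
        return commits;
    }

    public static void main(String[] args) {
        // 9000 inserts -> 9 intermediate commits plus the final one
        System.out.println(insertAll(9000));
    }
}
```

The structure, not the numbers, is the point: Craig's 9000-row import fits in a single commit, and the batch size only matters once the data volume grows beyond that.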
Re: [Neo4j] Neo4J Spatial - issue with bounding box indices in OSM Layer.
Hi Robert, Interesting work you're doing. I just read your blogs and I think it would be great to discuss your tests in more detail. Michael Hunger has done some interesting tests on the scalability of the OSM import, and could probably give suggestions on configuring the import. Looking at your code below, I think you have swapped the x and y around in the Coordinate constructor. It should be Coordinate(x,y), but the values you have passed look like lat,long (which means y,x). Also, the SearchContain should return geometries that contain the point you passed, so if your point is within a lake, or building, or some other polygon geometry, you should get results, but I do not think it will return anything if your point is not actually contained within closed polygons. To give a more complete answer, I think I would need to run and test your code. Hopefully the above comments help resolve the issue. Regards, Craig On Thu, Mar 24, 2011 at 12:42 PM, Robert Boothby rob...@presynt.com wrote: Hi, I've been playing with Neo4j Spatial and the OSM data imports to see how it all fits together. I've been blogging on my experiences (http://bbboblog.blogspot.com). It's still early days but I think that I have run into an issue.
Having imported the OSM data successfully I've tried to execute this code to determine whether the centre of the town of Aylesbury (UK) is within the county of Buckinghamshire (which it is) and to pull back all nodes which contain the centre of the town: final OSMLayer osmLayer = (OSMLayer)spatialDB.getLayer("OSM-BUCKS"); final GeometryFactory factory = osmLayer.getGeometryFactory(); final Point point = factory.createPoint(new Coordinate(51.796726,-0.812988)); SearchContain searchContain = new SearchContain(point); osmLayer.getIndex().executeSearch(searchContain); for(SpatialDatabaseRecord record: searchContain.getResults()){ System.out.println("Container: " + record); } The layer does contain the appropriate data imported from a .osm file extract for Buckinghamshire (the smallest file for an English county). When I've tried to run it I've got no results and when I've debugged it appears that the bbox property attributes (minx, maxx, miny, maxy) for the layer's root node are incorrect (mixed up) - minx=-1.1907455, maxx = 51.0852483, miny=0.3909055, maxy=52.2274931 causing the search to return immediately. Am I using this API correctly or have I stumbled into a genuine bug? Thank you, Robert Boothby.
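The coordinate-order point in Craig's reply can be illustrated without any GIS libraries. JTS's constructor is Coordinate(x, y), i.e. (longitude, latitude) for geographic data, so passing (lat, lon) puts the point far outside the layer's bounding box. The envelope check below is a minimal stand-in for the index's bbox test; the class name and rough county bounds are invented for the sketch:

```java
// Plain-Java sketch (no JTS on the classpath) of why coordinate order
// matters: a point given as (lat, lon) falls outside an envelope that was
// built from (lon, lat) data, so the search returns nothing.
public class CoordinateOrder {
    // Minimal envelope containment check over [minX, maxX] x [minY, maxY]
    static boolean bboxContains(double minX, double maxX,
                                double minY, double maxY,
                                double x, double y) {
        return x >= minX && x <= maxX && y >= minY && y <= maxY;
    }

    public static void main(String[] args) {
        // Rough Buckinghamshire extent: x is longitude, y is latitude
        double minX = -1.19, maxX = 0.39, minY = 51.08, maxY = 52.23;
        double lon = -0.812988, lat = 51.796726; // Aylesbury town centre

        // Correct order (lon, lat): inside the envelope
        System.out.println(bboxContains(minX, maxX, minY, maxY, lon, lat)); // true
        // Swapped order (lat, lon): outside, so the search bails out early
        System.out.println(bboxContains(minX, maxX, minY, maxY, lat, lon)); // false
    }
}
```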
Re: [Neo4j] Neo4J Spatial - issue with bounding box indices in OSM Layer.
I will need to double check this. I know there was a dispute early on with Neo4j Spatial because the JTS library orders bbox params in one way, and GeoTools does it another way, so you might be seeing the results of that. I believed we sorted that all out, but perhaps not. I have just checked the RTree code, and it does the right thing, however it is possible that code outside is passing in bbox parameters in a different order than expected. Perhaps you have some sample test code I can use to check this with? On Thu, Mar 24, 2011 at 7:01 PM, Robert Boothby rob...@presynt.com wrote: Thanks for the response Craig. You're right I had mixed up the latitude and longitude coordinates. I've now got the expected answers... However the transposition of the elements of the spatial index's root node geometry envelope definitely occurs. I just wouldn't have spotted it if I hadn't mixed up the coordinates and debugged it. The envelopes of the result nodes do not have transposed elements. The index root node element bounding box envelope looks like: Env[-1.1907455 : 51.0852483, 0.3909055 : 52.2274931], my point envelope looks like Env[-0.812988 : -0.812988, 51.796726 : 51.796726] and the other node envelopes look like Env[-0.8577898 : -0.7723875, 51.7930991 : 51.8378447], Env[-0.8502576 : -0.8028032, 51.786247 : 51.8211053], Env[-0.8516907 : -0.7686481, 51.7932282 : 51.8335107], Env[-0.8574819 : -0.7733667, 51.7921887 : 51.8394132], Env[-0.9067499 : -0.6564949, 51.6661276 : 51.9251302], Env[-0.8159025 : -0.8114534, 51.7961756 : 51.7997272], . It appears that the maxx and miny values have been swapped in the spatial index root node. There may be certain scenarios when valid coordinates are excluded because the spatial index root node has the incorrect envelope. Robert.
Re: [Neo4j] [Neo4j Spatial] Need advice on modeling my spatial domain properly
Hi Chris, A lot depends on your final intentions of how to use the model. There are many, many ways to model this, and each has its pros and cons. Let me try to briefly describe two options I can think of that are related to the two factors you suggest below. *Option one - model time in the graph* In this case I'm assuming you want to store information about car movements. For example, you are a logistics company tracking your fleet, and each car/truck has a GPS and uploads data continuously. This data would be stored as an event stream in the database, indexed spatially in the RTree, and indexed with other indexes too (time of event (timeline index), which car (category index), which driver (category index), other properties of interest (lucene), etc.). You can relate the car to the OSM model through routing information (eg. the car is following a planned route on the OSM graph). Perhaps you model the route as a chain of nodes also, resulting in a three layer graph: the static OSM, the planned route and the actual route coming in live. This approach results in a very complete data model that can be historically mined for statistics and behaviours (eg. which cars match planned routes best, general speed patterns, driving behaviours, etc.) For this model there is value in adding your own geometry encoder if you wish to expose your own data (routes, and car traces) to a map or GIS. Since it is all point data, you could just use the SimplePointEncoder, but then you would not see lines, only points. If you want lines, rather make your own geometry encoder that understands how the nodes are connected in chains. Review the code of the sample encoders, it is not complex. *Option two - model time in analysis* Assuming the previous case is overkill, and you have no interest in fleet tracking and historical modeling, and all you want is a map that shows a single point for a car as it moves, it might be possible to not include the car in the database at all.
Where do you get the car data from? If it is a stream of information from some data source, that stream could be consumed by the map view itself, just updating the points on the map. If you wish the map to not have to know about your own stream, then you can use the database. Perhaps you do something very simple, just store each car location in a SimplePointLayer (like the blog), and whenever a car change event arrives (from your source of car data, whatever that is), you could remove the car node from the RTree index and re-add it (basically re-index the point at a new location). The map needs to redraw that layer too, so you need to trigger that. If there are lots of cars moving all the time, rather just redraw the map layer on a timer. The reason I called this 'model time in analysis' is that since there is no time component in the graph, each car has only one current position, so any analysis of car behaviour would have to be done external to the graph, perhaps on the incoming gps stream. So this is much more limited in possibility than the previous case. As you can see I had to make a ton of assumptions about your data and your intentions to describe the above models. I assume the odds are low that I matched your exact case very well, but hopefully I gave you some ideas to think about. Regards, Craig On Fri, Mar 18, 2011 at 11:57 AM, Christoph K. klaassen.christ...@googlemail.com wrote: Hi people, i'm working on a project, where i want to map live data of cars on streets. I take my map data from OSM-maps for test purposes - so there's no problem at all. But i have no idea on how to integrate my car data. Should i implement my own geometry encoder, so that my car nodes can contain a position property. Or does it make sense to relate my car nodes to point nodes, which are representing the current position of my car? Some advice would be great!
greetings from bavaria Christoph
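The remove-and-re-add pattern Craig describes for moving cars can be sketched abstractly. A HashMap keyed by grid cell stands in for Neo4j Spatial's RTree here; the class name, cell size and car id are all invented for illustration:

```java
import java.util.HashMap;
import java.util.Map;

// Toy illustration of re-indexing a moving point: when a car reports a new
// position, drop its old index entry and insert one for the new location.
// In the real system the remove/add would be RTree operations on the
// SimplePointLayer; here a grid-cell map plays that role.
public class CarIndex {
    static final double CELL = 0.01; // degrees per grid cell (arbitrary)
    final Map<String, String> cellByCar = new HashMap<>();

    static String cellOf(double lon, double lat) {
        return (long) Math.floor(lon / CELL) + ":" + (long) Math.floor(lat / CELL);
    }

    // Re-index: remove the old entry, add the point at its new position
    void update(String carId, double lon, double lat) {
        cellByCar.remove(carId);                 // analogous to RTree removal
        cellByCar.put(carId, cellOf(lon, lat));  // analogous to re-adding the point
    }

    public static void main(String[] args) {
        CarIndex index = new CarIndex();
        index.update("car-1", 11.5, 48.1);       // somewhere in Bavaria
        String before = index.cellByCar.get("car-1");
        index.update("car-1", 11.6, 48.2);       // the car moved
        String after = index.cellByCar.get("car-1");
        System.out.println(!before.equals(after)); // indexed cell changed
    }
}
```

After each update the map layer would still need a redraw trigger, as the thread notes, since the index itself knows nothing about the view.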
Re: [Neo4j] [Neo4j Spatial] Need advice on modeling my spatial domain properly
Hi Chris, I'm glad my comments were of help. I think you are exactly right in starting simple and enhancing as you need. This is certainly one of the points of a 'schema-less database' after all :-) Neo4j can handle very large amounts of data, and every car on the planet would fit, but perhaps not every car logging GPS points every few seconds for a long time! You can deal with the performance issues as they arise (although it helps to keep performance in mind as you go, of course). If you are curious about a related system I was involved in, take a look at the videos on my vimeo account at http://vimeo.com/craigtaverner. They show an android application being driven around (in cars), uploading GPS and other data to a central database (using neo4j), then being downloaded by a desktop application containing another neo4j database which duplicates a small subset of the data, performs statistical analysis of the results and displays them on a map. Not the same use case as yours, but certainly related. Regards, Craig On Fri, Mar 18, 2011 at 1:11 PM, Christoph K. klaassen.christ...@googlemail.com wrote: Hi Craig, wow, this is a great reply :) Thank you very much for your advice. To be a bit more precise about the project: it's a mix of both of your options. Option 2 is a great start from which i could go on and test, how neo4j behaves in my special case. I get the data from the car itself (via umts or sth like that) and want to provide some environment informations back to the car. But it is intended that i have to deal with a huge amount of cars... thousands up to... yeah maybe all cars on this planet ;-) (think big). Option 1 with the possibility to inspect historical data would be interesting, but i'm not sure if neo4j is powerful enough to store that amount of data, which is intended to be collected. So this would not be a feature at this time but interesting for later enhancements.
I think i will try a simple encoder implementation to do a hybrid of your options :) this leaves the option to extend the model more easily if it's desired. greetz Chris
Re: [Neo4j] Where is the beer?
When I added my face, I tested to make sure it scaled the same as the others. The results: no images scale at all, no matter what zoom level. They are all fixed size images. Google said we should load images up to 64x64, so I originally loaded a 64x64 image, but since it was noticeably larger than the others, I scaled it down to 48x48. It is still a bit bigger, which is probably what is still bothering you. Seems we have a +1 for faces (from Peter) and a -1 for faces (from Anders). What do others think? (I must admit if I am going to be the only one with a face, then perhaps the vote is clear ...) On Thu, Mar 17, 2011 at 3:54 PM, Anders Nawroth and...@neotechnology.comwrote: Please don't do the pretty face thing, such icons aren't scaled in any sensible way when zooming out! Or find out how to make them scale down ... /anders On 03/17/2011 03:30 PM, Peter Neubauer wrote: Jordi, do you need an invite to add yourself? Btw, the map looks really pretty now! Need to get some pretty face like Craig on my icon :) Cheers, /peter neubauer GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - Your high performance graph database. http://startupbootcamp.org/- Öresund - Innovation happens HERE. http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party. On Mon, Mar 14, 2011 at 9:02 PM, Jordi Valverdede...@eclipsi.net wrote: Invite me: jvalve...@gmail.com :-) El 14/03/11 14:21, Andreas Kollegger andreas.kolleg...@neotechnology.com escribió: I've shared a map with you called Neo4j Graphistas: You can view and edit this map at http://maps.google.com/maps/ms?ie=UTF8hl=enoe=UTF8msa=0msid=2157872407 36307886514.00049e70e573cbd8a91e5 Where are people graphing? Add yourself to the map (or at least your city ;) Note: To edit this map, you'll need to sign into Google with this email address. 
To use a different email address, just reply to this message and ask me to invite your other one. If you don't have a Google account, you can create one at http://www.google.com/accounts/NewAccount?reqemail=user@lists.neo4j.org. Cheers, Andreas On Mar 14, 2011, at 2:04 PM, Alfredas Chmieliauskas wrote: Great! I think thats a great idea! A On Mon, Mar 14, 2011 at 2:02 PM, Michael Hunger michael.hun...@neotechnology.com wrote: I would, I already have extensive plans for that. I will share them with you :) Cheers Michael Am 14.03.2011 um 13:50 schrieb Alfredas Chmieliauskas: Who would like to start a social networking site for developers (on top of neo4j technology and community)? I'm in. A On Mon, Mar 14, 2011 at 1:45 PM, bhargav gunda bhargav@gmail.com wrote: Stockholm, Sweden On Mon, Mar 14, 2011 at 1:41 PM, Alfredas Chmieliauskas al.fre...@gmail.com wrote: Amsterdam On Mon, Mar 14, 2011 at 1:15 PM, Axel Morgnera...@morgner.de wrote: Hi everybody, as said, here's a new thread for the idea of having beer and talk meetings. Possible locations so far: Malmö London Berlin Frankfurt Looking forward to seeing more Neo4j people in personal! Greetings Axel On 14.03.2011 13:02, Peter Neubauer wrote: Berlin sounds great. Last year a couple of guys met up at StudiVZ, and suddenly we were 30 people. Go for it, there is a LOT of good vibe in Beerlin! Cheers, /peter neubauer GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - Your high performance graph database. http://startupbootcamp.org/- Öresund - Innovation happens HERE. http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party. On Mon, Mar 14, 2011 at 12:37 PM, Michael Hunger michael.hun...@neotechnology.com wrote: They guys could create at least one in Malmö? Isn't Andreas there as well, and certainly some more fine folks? 
We can do one locally here in Germany, perhaps Berlin (perhaps we can combine that with our monthly flight to CPH). Cheers Michael Am 14.03.2011 um 11:50 schrieb Jim Webber: Hey Rick, It was a pleasure to meet you too. And this got me thinking - it would be great to meet more folks from this list, or to form user groups, or generally just get a beer and talk Neo4j graphs. Is there, for example, a strong London contingent on this list? I only know me and Nat Pryce so far. Anyone else care to get together in London? Jim
Re: [Neo4j] Graph design
One key point of David's suggestion is that it takes into account that each action of the user could take place from a different IP. Massimo's original model implied that the user would always be at the same IP for all actions, or if he could change IPs you would not know which of them related to which action. So even though David's model is more complex, it seems more correct. Another solution is to create a uid-ip node, representing all cases where a particular user is at a particular IP. Then that would have direct relations to all domains (as Massimo originally had), and it would have a single relationship to its user and its ip nodes. The graph looks similar to David's, but we would have far fewer nodes (all actions from the same uid-ip are merged). On Wed, Mar 16, 2011 at 7:08 PM, Massimo Lusetti mluse...@gmail.com wrote: On Wed, Mar 16, 2011 at 7:03 PM, David Montag david.mon...@neotechnology.com wrote: Massimo, If you'd like, I could skype with you later this afternoon (in 4-5 hours) and discuss it? David Wow that would be cool... But hopefully I'm going to be sleeping, I need it... Anyway I'll do my homework and come back to you! Thanks for the offer... really appreciated. -- Massimo http://meridio.blogspot.com
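The node savings from the uid-ip idea can be shown with a toy count. This is plain Java, not Neo4j code; the sample actions and class name are invented for the sketch:

```java
import java.util.HashSet;
import java.util.Set;

// Toy comparison of node counts: one node per action (the per-action
// model) versus one node per distinct (user, ip) combination (the merged
// uid-ip model described above).
public class UidIpMerge {
    // Count distinct (userId, ip) pairs among the actions
    static int distinctPairs(String[][] actions) {
        Set<String> uidIp = new HashSet<>();
        for (String[] a : actions) uidIp.add(a[0] + "@" + a[1]);
        return uidIp.size();
    }

    public static void main(String[] args) {
        // Six actions, but only three distinct (user, ip) combinations
        String[][] actions = {
            {"u1", "10.0.0.1"}, {"u1", "10.0.0.1"}, {"u1", "10.0.0.2"},
            {"u2", "10.0.0.1"}, {"u2", "10.0.0.1"}, {"u2", "10.0.0.1"},
        };
        System.out.println(actions.length + " action nodes vs "
                + distinctPairs(actions) + " uid-ip nodes");
    }
}
```

Each uid-ip node would then carry one relationship to its user node, one to its ip node, and direct relations to the domains, which is where the compactness over the per-action model comes from.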
Re: [Neo4j] Neo4j spatial
Hi Saikat, There are a few places you can look for code samples. One of the best places is the set of test cases included in neo4j spatial. You can find them at https://github.com/neo4j/neo4j-spatial/tree/master/src/test/java/org/neo4j/gis/spatial. In particular, since you are interested mostly in point data, take a look at TestSimplePointLayer.java (https://github.com/neo4j/neo4j-spatial/blob/master/src/test/java/org/neo4j/gis/spatial/TestSimplePointLayer.java) and LayersTest.java (https://github.com/neo4j/neo4j-spatial/blob/master/src/test/java/org/neo4j/gis/spatial/LayersTest.java). What you will find in those classes is Java code for adding points to the database, similar, but more extensive than the code in the blog. Regarding your specific case, if you are working with a normal google map or bing map, and want to port the points into a local database, you would need to export them, and write a simple importer. If you have written a mashup between google or bing maps and your own neo4j-based web application, you should be able to use some client side coding to automate this, accessing the map, and posting the points directly into your own server (where of course you would have some code adding the points to the database). Does this answer your question? Regards, Craig On Thu, Mar 17, 2011 at 12:32 AM, Saikat Kanjilal sxk1...@hotmail.com wrote: Hi Folks, I was reading through the docs on neo4j spatial and was wondering about a few things: 1) If I have a google or bing map and I manually plot some points can I use neo4j spatial to automate the loading of those points into my neo4j db? 2) Are there code samples for neo4j-spatial or implementations I can look at for a deeper look at the API's etc? Best Regards Sent from my iPhone
Re: [Neo4j] Traversal framework
I like the pipes idea. What I would like to see is nested traversers. The pipe example below seems to imply single hops at each step, but it would be nicer to allow each step to traverse until it reached a certain criterion, at which point a different traversal would take over. In the old and current APIs, it seems that to do this you need to create a traversal, iterate over it, and create a new traversal inside the loop. We created a Ruby DSL for nested traversals a year or so ago that looks a bit like: chart 'Distribution analysis' do self.domain_axis='categories' self.range_axis='values' select 'First dataset', :categories=>'name', :values=>'value' do from { from { traverse(:outgoing,:CHILD,1) where {type=='gis' and name=='network.csv'} } traverse(:outgoing,:AGGREGATION,1) where {name=='azimuth' and get_property(:select)=='max' and distribute=='auto'} } traverse(:outgoing,:CHILD,:all) end end This is quite a complex example, but the key points are the from method which defines where to start a traversal, and the traverse method which defines the traversal itself, with the where method which is like the old ReturnableEvaluator. Will the new pipes provide something like this? On Tue, Mar 15, 2011 at 9:19 AM, Massimo Lusetti mluse...@gmail.com wrote: On Tue, Mar 15, 2011 at 9:11 AM, Mattias Persson matt...@neotechnology.com wrote: I'm positive that some nice API will enter the kernel at some point, f.ex. I'm experimenting with an API like this: for(Node node : PipeBuilder.fromNode(startNode).into(otherNode(A)).into(otherNode(B)).nodes()) { // node will be (3) from the example above } I hope I didn't confuse you with all this :) Nope, the opposite. Thanks for the clarification and that kind of API would be a killer feature IMHO. It will be even more pleasant to work with neo4j...
Cheers -- Massimo http://meridio.blogspot.com
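The nesting Craig asks about, one traversal feeding start points to the next, can be sketched abstractly. A Map-based adjacency list stands in for the graph here; the node names and the prefix-match stop criterion are invented for the sketch:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Abstract sketch of nested traversals: an inner traversal collects nodes
// matching a criterion, and each result becomes the start point of the
// next traversal stage, mirroring the loop-inside-a-loop pattern the old
// and current APIs force you into.
public class NestedTraversal {
    // Follow all edges from 'start', collecting nodes whose name matches
    static List<String> traverse(Map<String, List<String>> edges,
                                 String start, String prefix) {
        List<String> hits = new ArrayList<>();
        for (String child : edges.getOrDefault(start, List.of())) {
            if (child.startsWith(prefix)) hits.add(child);
            hits.addAll(traverse(edges, child, prefix));
        }
        return hits;
    }

    public static void main(String[] args) {
        Map<String, List<String>> edges = Map.of(
            "root", List.of("gis1", "other"),
            "gis1", List.of("agg-azimuth"),
            "agg-azimuth", List.of("child-a", "child-b"));
        List<String> result = new ArrayList<>();
        // Stage 1: find 'gis' nodes; Stage 2: from each, find 'agg' nodes;
        // Stage 3: from each of those, collect all 'child' nodes
        for (String gis : traverse(edges, "root", "gis"))
            for (String agg : traverse(edges, gis, "agg"))
                result.addAll(traverse(edges, agg, "child"));
        System.out.println(result); // [child-a, child-b]
    }
}
```

A pipes-style API would collapse the three explicit loops into a single chained expression, which is the appeal of the PipeBuilder idea quoted above.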
[Neo4j] Neo4j Spatial and GeoTools
Hi, There were a few comments on twitter about the use of GeoTools in Neo4j Spatial, so I wanted to elaborate on the discussion with a short description of where and why we include some GeoTools libraries in Neo4j Spatial. The discussion started with two tweets by martengustafson (http://twitter.com/#!/martengustafson): Current state of Java GIS libraries and their interdependencies: #fail (http://twitter.com/#!/search?q=%23fail) followed shortly by: I'm sorry Neo4J spatial. I'm sure you're nice and all but the endless cruft that is OpenGIS and GeoTools brought you down with them. There followed a short chat between Martin and me to find out what his concerns were. You can follow the chat on twitter, but my summary of it is that Martin noticed the use of the Coordinate and Geometry classes in my blog post at http://blog.neo4j.org/2011/03/neo4j-spatial-part1-finding-things.html. In addition he noticed that to run the test cases, or the code on the blog, you needed to include geotools libraries in the classpath, and he had negative experiences with geotools in the past. So he felt this dependency reflected negatively on Neo4j Spatial. My answers to Martin were to explain briefly why we use GeoTools and where. I would like to elaborate in more detail here. Firstly, the main core of Neo4j Spatial does not use GeoTools, but rather JTS, a lower level topology library with a lot less dependency-complexity than GeoTools, and so hopefully much less of a concern to Martin. The Coordinate and Geometry classes used in the blog are from JTS. Martin admitted that he thought JTS used GeoTools, not the other way round. The TestSimplePointLayer test case the blog was based on has no GeoTools dependencies itself. However, the fact remains that several GeoTools libraries are still dependencies of Neo4j Spatial.
While the core design does not require GeoTools, there are three places they are used: - The API's to expose the Neo4j Spatial data to well known GIS's that use GeoTools, like GeoServer (http://wiki.neo4j.org/content/Neo4j_Spatial_in_GeoServer) and uDig (http://wiki.neo4j.org/content/Neo4j_Spatial_in_uDig). Obviously we need GeoTools libraries to enable GeoTools compatibility. - Some current and future import/export utilities. The ShapefileImporter, for example, uses GeoTools support for reading shapefiles. We have investigated, but not included, GeoJSON support based on GeoTools also. - The DefaultLayer implementation uses GeoTools for the WKB and WKT Geometry Encoders, since WKB and WKT are well supported in GeoTools. While this is currently part of the core code, the design of Neo4j Spatial is pluggable, so you do not need to use this code. However, you do need to include the relevant GeoTools libraries. Early on in the project, it was considered to split Neo4j Spatial into two parts, a core that only required JTS and extensions that required GeoTools. However, this route was not taken for a few reasons. Chief among them was the fact that most people trying out Neo4j Spatial were happy with maven, and so dependencies were not a serious issue. So simplicity of development won out, and we kept GeoTools dependencies in the core. Recently I felt the negative side of this when I developed the neo4j-spatial.rb (https://rubygems.org/gems/neo4j-spatial) Ruby gem, and needed to include dependencies in the gem. The maven dependencies were a bit overzealous and so too many libraries were included. Michael Hunger came to the rescue and cleaned up the dependencies somewhat, so the current gem is quite a lot thinner than the earlier ones. Anyway, aside from my larger-than-necessary early gems, Martin's concerns on twitter are the first community criticism of the use of GeoTools in Neo4j Spatial.
I would like to know if others feel this is something we should be concerned about, and whether we should try to split the core out so that it does not require GeoTools. My personal impression is that the benefits far outweigh the disadvantages, but I would like to know what others think. Regards, Craig ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Nice change from all this Java
Cool syntax. I love that he has not skimped on docs. But isn't the EPL going to be a problem here? (à la the EPL/GPL clash) On Wed, Mar 9, 2011 at 10:38 PM, Andres Taylor andres.tay...@neotechnology.com wrote: Hey all, Wanted to share something I just found on reddit. https://github.com/wagjo/borneo After being force-fed all this Java, a bit of Clojure was very welcome. Looks really nice too. Andrés ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Neoclipse Wish List
But as far as I know there is no filter on properties of nodes or relationships. Should be easy to add though. How would you like that to look? Perhaps this could use the REST API traversal syntax also? - List of existing Lucene indices available to query The new search dialog now lists the node indices from the integrated index. The relationship indices will soon come, but there's more refactoring needed to add relationship search. The next step is then to add Lucene query support, not only lookup for exact matches. Excellent. I had not checked that. - visualize sub-graph not by a fixed depth, but by a traversal query (possibly using the syntax of the REST API, since that is dynamically interpreted) Nice idea to use the REST API syntax! And it is hopefully easy to implement :-) (so we can see it soon?) Regards, Craig ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
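For reference, the REST API traversal syntax mentioned here is a JSON traversal description POSTed to the server. A sketch along the lines of the 1.x server documentation (the node id, relationship type and depth are made up):

```json
POST /db/data/node/42/traverse/node
{
  "order": "breadth_first",
  "uniqueness": "node_global",
  "max_depth": 2,
  "relationships": [ { "type": "KNOWS", "direction": "out" } ]
}
```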
Re: [Neo4j] Batch Insertion, how to index relationships?
I think you'll have to add a dummy key/value to each relationship, like exists/true or whatever. The overhead for that is insignificant, and once relationships are indexed with some key/value they can be queried with the additional start/end node parameters. And I believe you can index the relationship with these dummy values even if the properties do not exist on the relationship itself, right? That saves a lot of space. ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
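The dummy key/value trick is easy to see with a small stand-in. This is not the Neo4j API: a real batch insertion would go through a BatchInserterIndexProvider and call index.add(relId, map) with the dummy pair, but the lookup behaviour is the same as in this pure-Java sketch:

```java
import java.util.*;

// Pure-Java stand-in for a relationship index. The point being illustrated:
// a relationship must be indexed under at least one key/value to be findable,
// so we add a dummy pair like exists=true to every relationship. The pair only
// lives in the index; it never needs to be stored as a property on the
// relationship itself, so no property space is used.
public class DummyKeyIndex {
    private final Map<String, List<Long>> index = new HashMap<>();

    void add(long relId, String key, String value) {
        index.computeIfAbsent(key + "=" + value, k -> new ArrayList<>()).add(relId);
    }

    List<Long> get(String key, String value) {
        return index.getOrDefault(key + "=" + value, Collections.emptyList());
    }

    public static void main(String[] args) {
        DummyKeyIndex idx = new DummyKeyIndex();
        for (long relId = 0; relId < 3; relId++) {
            idx.add(relId, "exists", "true"); // dummy pair, nothing stored on the relationship
        }
        System.out.println(idx.get("exists", "true")); // all three relationships are retrievable
    }
}
```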
Re: [Neo4j] Neoclipse Wish List
You can filter the number of nodes in the preferences settings (click the gear icon on the top left, and then click neo4j, and change maximum number of nodes). You can filter the relationships by types in the relationships view, deselect the types (also by incoming/outgoing). But as far as I know there is no filter on properties of nodes or relationships. Should be easy to add though. My wish list includes:
- Menu of possible alternative roots (places in the database to jump directly to, possibly based on a Lucene query).
- List of existing Lucene indices available to query
- visualize sub-graph not by a fixed depth, but by a traversal query (possibly using the syntax of the REST API, since that is dynamically interpreted)
On Sat, Mar 5, 2011 at 10:38 PM, Rick Bullotta rick.bullo...@burningskysoftware.com wrote: Neoclipse is an awesome tool, but here are a few items that would greatly increase the utility and usability: 1) Provide a limit on the # of nodes/relationships that are displayed (and a warning that additional nodes and relationships were not shown) 2) Provide display filters based on node and/or relationship property values and relationship types I think these would greatly improve the situation when very large or complex graphs are involved, since now the visualizations become far too busy to do anything. Thoughts? Rick ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Release of 1.0.0
Thanks for a great gem, Andreas. One thing I noticed is that rubygems.org still lists version 0.4.6 as the official release. On Thu, Mar 3, 2011 at 10:42 AM, Peter Neubauer peter.neuba...@neotechnology.com wrote: Amazing work Andreas, the community is truly thankful to both you and all the contributors and Rubyists out there! Cheers, /peter neubauer GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - Your high performance graph database. http://startupbootcamp.org/ - Öresund - Innovation happens HERE. http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party. On Wed, Mar 2, 2011 at 7:58 PM, Andreas Ronge andreas.ro...@gmail.com wrote: Hi, Just promoted 1.0.0.beta.32 to version 1.0.0: "neo4j.rb version 1.0.0 #neo4j #jruby #rails". I have also written a blog post on how to get started with Neo4j/Rails 3: http://blog.jayway.com/2011/03/02/neo4j-rb-1-0-0-and-rails-3/ Thanks a lot for all your feedback and contributions! /Andreas -- You received this message because you are subscribed to the Google Groups neo4jrb group. To post to this group, send email to neo4...@googlegroups.com. To unsubscribe from this group, send email to neo4jrb+unsubscr...@googlegroups.com. For more options, visit this group at http://groups.google.com/group/neo4jrb?hl=en. ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Load last 10 created nodes from neo4j
What about taking the id of the last node created, and just decrementing backwards by 1, ten times, and getting those ten nodes? This will not take into account id-reuse, though, so if you have node deletion, this will not necessarily give the last ten added, only the ten with the highest ids. I expect that quite often it will be the same thing, though. Depending on what you need this for, perhaps this is good enough? On Fri, Mar 4, 2011 at 2:59 PM, Rick Bullotta rick.bullo...@burningskysoftware.com wrote: It would be pretty easy to use an index to do this (keep only the 10 most recent in the index), but you'd need to implement code everywhere you add/delete nodes and relationships, and if you're using an abstraction layer, it wouldn't be possible. In short, it's absolutely possible, but you'll need to implement it in your code. -Original Message- From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of mike_t Sent: Friday, March 04, 2011 8:54 AM To: user@lists.neo4j.org Subject: [Neo4j] Load last 10 created nodes from neo4j Hi, is it possible to load for example the last 10 created nodes or relationships from the neo4j db? I know it is not the normal use case for a graph db, but I need a solution. Thanks for your answers, Mike -- View this message in context: http://neo4j-user-list.438527.n3.nabble.com/Load-last-10-created-nodes-from-neo4j-tp2633139p2633139.html Sent from the Neo4J User List mailing list archive at Nabble.com. ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
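A minimal sketch of the id-decrement idea with the deletion caveat made explicit. This is pure Java: in real code the existence check would be a graphDb.getNodeById(id) call with NotFoundException caught, and highestId would be remembered from the last created node; the names here are made up for illustration.

```java
import java.util.*;

public class LastCreatedNodes {
    // Walk node ids downwards from the highest known id, skipping ids that no
    // longer exist (deleted nodes), until n ids have been collected.
    static List<Long> lastIds(long highestId, int n, Set<Long> existing) {
        List<Long> result = new ArrayList<>();
        for (long id = highestId; id >= 0 && result.size() < n; id--) {
            if (existing.contains(id)) {
                result.add(id);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        // Ids 0..9 created, then node 7 deleted: the "last 3" become 9, 8, 6.
        Set<Long> existing = new HashSet<>(Arrays.asList(0L, 1L, 2L, 3L, 4L, 5L, 6L, 8L, 9L));
        System.out.println(lastIds(9, 3, existing)); // [9, 8, 6]
    }
}
```

As noted above, with id reuse enabled this really returns "the n highest ids still present", which may differ from the n most recently created nodes.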
Re: [Neo4j] How to copy a complete database?
What would be the consequence of running a background thread that iterated through all nodes and relationships, and if any had a short string property, it would re-save the property? I assume the properties store would get a lot of empty space in the beginning, or would old-id reuse kick in and prevent this? It seems like something that can be done at the application level easily enough. And if only some fraction of your properties are really short strings, this should be more efficient than copying the entire database. On Thu, Mar 3, 2011 at 11:52 AM, Tobias Ivarsson tobias.ivars...@neotechnology.com wrote: No there is no simpler way, yet. We've been thinking about creating a short string compression tool for accomplishing this, but haven't done so yet. Cheers, Tobias On Thu, Mar 3, 2011 at 11:35 AM, Balazs E. Pataki pat...@dsd.sztaki.hu wrote: Hi, I have a big database based on Neo4J 1.2. Now, if I would like to use the short strings feature of Neo4j 1.3 M03 I should regenerate my full database, that is all strings should be reset so that they may or may not be stored according to the new short strings policy. It seems to me that the easiest way to do this would be to somehow be able to copy the full 1.2 database to a newly created 1.3 M03 database by traversing the 1.2 database. But there may be a simpler (Neo4j built-in) way to do this. Any hints about this? Thanks, --- balazs ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user -- Tobias Ivarsson tobias.ivars...@neotechnology.com Hacker, Neo Technology www.neotechnology.com Cellphone: +46 706 534857 ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
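Such a background re-save pass is mostly a batching problem: re-saving millions of properties in one transaction would exhaust memory, so the loop needs periodic commits. A pure-Java sketch of just that batching (processInBatches is a made-up helper; the comments name the Neo4j calls the real loop would use):

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

public class BatchedResave {
    // Apply an action to every item, counting off batches of batchSize.
    // In the real re-save pass, `items` would be graphDb.getAllNodes(), the
    // action would re-set each short String property
    // (node.setProperty(key, node.getProperty(key))) so the 1.3 short-string
    // encoding is applied, and each batch boundary would be a transaction
    // commit (tx.success(); tx.finish(); then begin a new transaction).
    static <T> int processInBatches(Iterable<T> items, int batchSize, Consumer<T> action) {
        int batches = 0, inBatch = 0;
        for (T item : items) {
            action.accept(item);          // e.g. re-save the property
            if (++inBatch == batchSize) { // e.g. commit and start a new transaction
                batches++;
                inBatch = 0;
            }
        }
        if (inBatch > 0) batches++;       // commit the final partial batch
        return batches;
    }

    public static void main(String[] args) {
        List<Integer> nodes = new ArrayList<>();
        for (int i = 0; i < 2500; i++) nodes.add(i);
        System.out.println(processInBatches(nodes, 1000, n -> {})); // 3 commits
    }
}
```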
Re: [Neo4j] Can anyone compile the latest Neo4J Spatial?
That error is from the PNG exporter, and has never had any side effects. Also that code is not called at all by the test case I am running, which is TestOSMImport. You should see that error only from TestDynamicLayers. Having said that, I have added a fix for it, and will commit that later today or tomorrow. On Sun, Feb 27, 2011 at 12:42 PM, Andreas Kollegger andreas.kolleg...@neotechnology.com wrote: On a fresh and clean Ubuntu VM (after installing java, maven, git, etc), I just cloned neo4j-spatial and tried `mvn clean install`. During the build, I noticed a few of these scattered about: Feb 27, 2011 11:58:26 AM org.geotools.map.MapContent finalize SEVERE: Call MapContent dispose() to prevent memory leaks But ended up with a successful build. No failures, no errors. Cheers, Andreas On Feb 27, 2011, at 7:22 AM, Peter Neubauer wrote: Mmmh, the index provider kernel extension subsystem has been changed between 1.3.M01 and M02. I suspect an incompatible kernel version being resolved by Maven. Let me try to run this tomorrow from home with moving away my current Maven repo and getting everything fresh. (Sitting on a 3G connection right now). Hopefully I can tell you tonight, otherwise tomorrow, how that works, ok? Also, you could try to move away your ~/.m2/repository for one build and try getting all artifacts fresh from the net? Cheers, /peter neubauer GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - Your high performance graph database. http://startupbootcamp.org/ - Öresund - Innovation happens HERE. http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party. On Sun, Feb 27, 2011 at 2:03 AM, Nolan Darilek no...@thewordnerd.info wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 On 02/26/2011 05:56 PM, Craig Taverner wrote: It is working for me too.
One thing that is interesting about the error message is that it says it looks like another instance is running in the *same JVM*. Is that the usual error message? (The complete text was "this is usually caused by another Neo4j kernel already running in this JVM for this particular store".) The error is occurring at the very start of the very first test case in the TestSpatial class, so it cannot be due to another test in that class. Still, I would take Peter's advice: check that no other java test processes are running, manually delete the database to be sure, and then try again. I don't mean to be difficult, but I *literally* did: git clone ... neo4j-spatial cd neo4j-spatial mvn install If I can get more pristine than that then do let me know, but I can't see how. The one process you'll see open in this transcript is a web app. It has nothing to do with Neo4j other than hosting its jars in its dependencies. The database is not even used at this time and, indeed, the exact same behavior happens if it isn't running. My next question: does someone have a development dependency hanging around in their local m2 repository that I don't? When you verified that you could build a clean tree, had you first backed up ~/.m2 and removed it? In any case:
nolan@nolan-desktop:~/src/neo4j$ ps auxw | grep java
nolan    12111  1.4  2.8 1423840 109504 pts/1 Sl+ 16:07 2:22 java -XX:+HeapDumpOnOutOfMemoryError -XX:+CMSClassUnloadingEnabled -Dsbt.log.noformat=true -jar /home/nolan/bin/sbt-launcher.jar jetty
nolan    13153  0.0  0.0   7624    896 pts/4 S+  18:55 0:00 grep --color=auto java
nolan@nolan-desktop:~/src/neo4j$ git clone git://github.com/neo4j/neo4j-spatial
Initialized empty Git repository in /home/nolan/src/neo4j/neo4j-spatial/.git/
remote: Counting objects: 3065, done.
...
Resolving deltas: 100% (1247/1247), done.
nolan@nolan-desktop:~/src/neo4j$ cd neo4j-spatial
nolan@nolan-desktop:~/src/neo4j/neo4j-spatial$ mvn install
[INFO] Scanning for projects...
[INFO] -
[INFO] Building Neo4j Spatial Components
[INFO]    task-segment: [install]
[INFO] -
[WARNING] POM for 'org.rrd4j:rrd4j:pom:2.0.6:provided' is invalid. Its dependencies (if any) will NOT be available to the current build.
[INFO] [enforcer:enforce {execution: enforce-maven}]
[INFO] [license:check {execution: check-licenses}]
[INFO] Checking licenses...
[INFO] [dependency:unpack-dependencies {execution: get-test-data}]
[INFO] Unpacking /home/nolan/.m2/repository/org/neo4j/spatial/osm
Re: [Neo4j] MMap Error on importing large data
Funny you should suggest this. 10 minutes ago I started a new test run that does exactly that. If any test case takes more than 5min to import, I run 3 gc() with 1s sleeps in between. OK, so not your 5s sleep, but let's see if it helps. I also reduced Xmx to 1600M, and reduced the mmap settings by a similar amount, hopefully generally reducing the app's memory consumption somewhat. Let's see what happens this time. On Sun, Feb 27, 2011 at 6:45 PM, Michael Hunger michael.hun...@neotechnology.com wrote: you can try to null the batch inserter and all its external deps that you control, add several System.gc() with some Thread.sleep(5000) in between, that should free the heap. you can also output the runtime's free memory, or even better have a jconsole run concurrently to see heap allocation history (it might even show mmap) Michael Sent from my iBrick4 On 27.02.2011 at 18:33, Craig Taverner cr...@amanzi.com wrote: What about the IOException Operation not permitted? Can you check the access rights on your store? They look fine (644 and 755). Also, it would seem strange for the access rights to change in the middle of a run. The database is being written to continuously for about 5 hours successfully before this error. I also note that I have 20GB free space, so running out of disk space seems unlikely. Having said that, I will do another run with a parallel check for disk space also. While googling I saw that you had a similar problem in November, that Johan answered. From the answer it seems that the kernel adapts its memory usage and segmentation from the store size. So as the store size before the import was zero, probably some of the adjustments that normally take place for such a large store won't be done.
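For reference, the GC nudge described here is just a loop of System.gc() calls and sleeps. A sketch matching the run described above (3 cycles, 1s apart); note that System.gc() is only a hint to the JVM, so none of this is guaranteed to free memory:

```java
public class GcNudge {
    // Run several explicit GC cycles with pauses in between, as a best-effort
    // way to release heap between the batch-inserter phase and the normal API
    // phase. The sleep gives finalizers and the collector time to run.
    static void forceGc(int cycles, long sleepMillis) throws InterruptedException {
        for (int i = 0; i < cycles; i++) {
            System.gc();
            Thread.sleep(sleepMillis);
        }
    }

    public static void main(String[] args) throws InterruptedException {
        forceGc(3, 1000); // the run described above: 3 cycles, 1s apart
    }
}
```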
I create both the batch inserter and the graph database service with a set configuration, as in the top of the file at https://github.com/neo4j/neo4j-spatial/blob/master/src/test/java/org/neo4j/gis/spatial/Neo4jTestCase.java So your suggestion to run the batch insert in a first VM run and the API work in a second one makes a lot of sense to me, because the kernel is then able to optimize memory usage at startup (if you didn't supply a config file). I will try that tomorrow perhaps. I would need to extract the test code to a place I can use from a console app first. But I noticed also that Mattias thought that two JVM's would not help. Regarding the test-issue. I would really love to have this code elsewhere and just used in the tests, then it could be used by other people too and that would it perhaps also easier to reproduce your problem just with the data file. I can do that. I'm short of time right now, but will see if I can get to that soon. Should be relatively simple to extract to the OSMDataset, so other users can call it. Basically the code traverses both the GIS (layers) views of the OSM data model, and the OSM view (ways, nodes, changesets, users) and produces some statistics on what is found. Could be generally interesting. The one messy part is the code also makes a number of assertions for expected patterns, and this only makes sense in the JUnit test. I would need to save the stats to a map, return that to the junit code so it can make the assertions later. Can you point me to the data file used and attach the test case that you probably modified locally? Then I'd try this at my machine. I've just pushed the code to github. The test class is the TestOSMImport. Currently it skips a test if the test data is missing, and there is only data for two specific test cases in the code base (Billesholm and Malmö). To get it to run the big tests, simply download denmark.osm and/or croatia.osm from downloads.cloudmade.com. 
At the moment croatia.osm imports fine, at reasonable performance, but denmark.osm is the one giving the problems. Looks like the memory-mapped buffer configuration needs to be tweaked. From Johan's previous answer, combined with something I read on the wiki, it seems that the batch inserter needs different mmap settings than the normal API. I read that the batch inserter uses the heap for its mmap, while the normal API does not. If I understand correctly, this means that when using the batch inserter, we have to use smaller mmap, otherwise we might fill the heap too soon? In any case, it seems like keeping mmap settings relatively small should avoid this problem, although it might not lead to the best performance? Have I understood correctly? On Windows heap buffers are used by default and auto configuration will look at how much heap is available. Getting out of memory exceptions is an indication that the configuration passed in is using more memory than available heap. I am currently using -Xmx2048 on a 4GB ram machine, 32bit java, and the settings: static { NORMAL_CONFIG.put
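For concreteness, a sketch of such a configuration map. The keys are the standard Neo4j 1.x mmap settings; the sizes are purely illustrative (deliberately small, per the discussion above), not the values from the truncated NORMAL_CONFIG block:

```java
import java.util.HashMap;
import java.util.Map;

public class MmapConfig {
    // Illustrative mmap configuration; in Neo4j 1.x a map like this is passed
    // to the EmbeddedGraphDatabase or BatchInserter constructor. Keeping the
    // values small trades import speed for headroom on a 32-bit heap.
    static Map<String, String> smallMmapConfig() {
        Map<String, String> config = new HashMap<>();
        config.put("neostore.nodestore.db.mapped_memory", "50M");
        config.put("neostore.relationshipstore.db.mapped_memory", "100M");
        config.put("neostore.propertystore.db.mapped_memory", "100M");
        config.put("neostore.propertystore.db.strings.mapped_memory", "200M");
        config.put("neostore.propertystore.db.arrays.mapped_memory", "10M");
        return config;
    }

    public static void main(String[] args) {
        System.out.println(smallMmapConfig().size()); // 5 store files configured
    }
}
```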
[Neo4j] MMap Error on importing large data
Hi, I was importing a reasonably large OSM dataset into Neo4j Spatial, and this involves a batch inserter which imports everything, followed by switching to a normal embedded graph database for adding nodes to the RTree index, which is an in-graph tree structure. The batch inserter phase worked fine, but sometime into the RTree index (normal graph API tree creation), the process terminated and I got the error message in the console: mmap failed for CEN and END part of zip file. Since this was running in JUnit, I also got a stack trace in the JUnit console, which I have included below, but the key elements are that it occurred on a line in my code that extracts a double[] property from a node: double[] bbox = (double[]) geomNode.getProperty("bbox"); This was certainly not the first time that method was called, and in fact this code has been stable for many months, so I think something deeper inside is going wrong. The stack trace goes further to enter logging, so it seems like it was trying to print a warning, but the logging seems to be trying to load a jar file, which gave a ZipException. I do not know which is more relevant, the 'mmap' error in the console, or the logWarn/ZipFile error in the stack trace. Has anyone seen something like this, or have any ideas how I can trace this further? It took nearly 5 hours to run to this point, so it is not easy to duplicate. Could this in any way be due to issues with memory, or the heap-versus-memory-mapping question? Considering that I switch between the batch inserter and the normal API in the same java runtime with the same settings, are there things that I should be taking into account here?
Regards, Craig

java.lang.InternalError
	at sun.misc.URLClassPath$JarLoader.getResource(Unknown Source)
	at sun.misc.URLClassPath.getResource(Unknown Source)
	at java.net.URLClassLoader$1.run(Unknown Source)
	at java.security.AccessController.doPrivileged(Native Method)
	at java.net.URLClassLoader.findClass(Unknown Source)
	at sun.misc.Launcher$ExtClassLoader.findClass(Unknown Source)
	at java.lang.ClassLoader.loadClass(Unknown Source)
	at java.lang.ClassLoader.loadClass(Unknown Source)
	at sun.misc.Launcher$AppClassLoader.loadClass(Unknown Source)
	at java.lang.ClassLoader.loadClass(Unknown Source)
	at java.util.ResourceBundle$RBClassLoader.loadClass(Unknown Source)
	at java.util.ResourceBundle$Control.newBundle(Unknown Source)
	at java.util.ResourceBundle.loadBundle(Unknown Source)
	at java.util.ResourceBundle.findBundle(Unknown Source)
	at java.util.ResourceBundle.findBundle(Unknown Source)
	at java.util.ResourceBundle.getBundleImpl(Unknown Source)
	at java.util.ResourceBundle.getBundle(Unknown Source)
	at java.util.logging.Level.getLocalizedName(Unknown Source)
	at java.util.logging.SimpleFormatter.format(Unknown Source)
	at java.util.logging.StreamHandler.publish(Unknown Source)
	at java.util.logging.ConsoleHandler.publish(Unknown Source)
	at java.util.logging.Logger.log(Unknown Source)
	at java.util.logging.Logger.doLog(Unknown Source)
	at java.util.logging.Logger.log(Unknown Source)
	at java.util.logging.Logger.warning(Unknown Source)
	at org.neo4j.kernel.impl.nioneo.store.PersistenceWindowPool.logWarn(PersistenceWindowPool.java:605)
	at org.neo4j.kernel.impl.nioneo.store.PersistenceWindowPool.refreshBricks(PersistenceWindowPool.java:505)
	at org.neo4j.kernel.impl.nioneo.store.PersistenceWindowPool.acquire(PersistenceWindowPool.java:123)
	at org.neo4j.kernel.impl.nioneo.store.CommonAbstractStore.acquireWindow(CommonAbstractStore.java:490)
	at org.neo4j.kernel.impl.nioneo.store.AbstractDynamicStore.getLightRecords(AbstractDynamicStore.java:397)
	at org.neo4j.kernel.impl.nioneo.store.PropertyStore.getRecord(PropertyStore.java:343)
	at org.neo4j.kernel.impl.nioneo.xa.WriteTransaction.propertyGetValue(WriteTransaction.java:1147)
	at org.neo4j.kernel.impl.nioneo.xa.NioNeoDbPersistenceSource$NioNeoDbResourceConnection.loadPropertyValue(NioNeoDbPersistenceSource.java:407)
	at org.neo4j.kernel.impl.persistence.PersistenceManager.loadPropertyValue(PersistenceManager.java:79)
	at org.neo4j.kernel.impl.core.NodeManager.loadPropertyValue(NodeManager.java:572)
	at org.neo4j.kernel.impl.core.Primitive.getPropertyValue(Primitive.java:538)
	at org.neo4j.kernel.impl.core.Primitive.getProperty(Primitive.java:158)
	at org.neo4j.kernel.impl.core.NodeProxy.getProperty(NodeProxy.java:134)
	at org.neo4j.gis.spatial.osm.OSMGeometryEncoder.decodeEnvelope(OSMGeometryEncoder.java:115)
	at org.neo4j.gis.spatial.RTreeIndex.getEnvelope(RTreeIndex.java:243)
	at org.neo4j.gis.spatial.RTreeIndex.chooseSubTree(RTreeIndex.java:349)
	at org.neo4j.gis.spatial.RTreeIndex.add(RTreeIndex.java:76)
	at org.neo4j.gis.spatial.osm.OSMLayer.addWay(OSMLayer.java:102)
	at org.neo4j.gis.spatial.osm.OSMImporter.reIndex(OSMImporter.java:212)
	at org.neo4j.gis.spatial.osm.OSMImporter.reIndex(OSMImporter.java:185)
	at org.neo4j.gis.spatial.TestOSMImport.loadTestOsmData(TestOSMImport.java:99)
	at
Re: [Neo4j] MMap Error on importing large data
Hi, I ran the test again with parallel logging of lsof once a minute and ps once every ten minutes. The open file descriptors oscillated around 200, peaking at 218, and at 211 just before the crash, so this seems OK. The memory was around 2.3GB for the entire batch insertion, but went up to nearly 2.8GB shortly before the crash (during the normal graph database service phase). I was only checking memory every 10 minutes on a several-hour run, so I do not have details from immediately before the crash. I wonder if the batch inserter is not freeing memory, and the normal graph API needs more memory than it can get. Perhaps it is not possible (or sensible) to run them both in the same JVM in series like I do? I redirected the console to a file and this time I got two stack traces, one from JUnit and one in the console file:
- The JUnit stack trace is similar to before, at a different point in my code (a getSingleRelationship), but at the same point in the neo4j code, on a logWarn call at line 605 of PersistenceWindowPool.java. It seems from Michael's previous mail, and the memory trace I ran, that this is very much related to running out of memory.
- The console stack trace starts with the error "MappedMemException: Unable to map pos=10086537 recordSize=9 totalSize=52443" from MappedPersistenceWindow.java:60. This traces back to the same getSingleRelationship as the other stack trace. The underlying exception is "IOException: Operation not permitted" from MappedPersistenceWindow.java:52.
I have attached both stack traces to this mail. To answer Michael's comments about JUnit, the reason I have been running this in JUnit is that my JUnit test case has a lot of code that analyses the final graph structure and creates statistics about the graph that are useful for verifying that the OSM import worked and conformed to expected patterns. I use this as part of the standard test suite on small OSM files.
The occasional use of this on large OSM files is convenient for me, since I can just uncomment some test case and get very useful statistics from the post-import test code. I could move it to a separate console app, but then I would want to move all the post-import test and verification code to a common place, while really it is primarily for the test cases. Thanks for all the advice so far. Regards, Craig On Sat, Feb 26, 2011 at 3:08 PM, Mattias Persson matt...@neotechnology.com wrote: It may be that too many files are open... there has been some previous mail about batch insertion (referring to lucene index insertion) keeping files open. Could you do an: lsof -n | grep name-of-your-store-dir | wc and see if that returns a high number, 1000 or something? 2011/2/26 Michael Hunger michael.hun...@neotechnology.com: The error occurs here: catch ( OutOfMemoryError e ) { e.printStackTrace(); ooe++; logWarn( "Unable to allocate direct buffer" ); } And I assume that's why the JDK classloader can't load the stuff needed for localizing Level.WARN. And that results in the error. So what would be much more helpful is the stack trace from e.printStackTrace() from your console. The one from the zipfile is probably rather from the JDK reading from one of its JARs. So if that error occurred while creating your RTree, you should be able to reproduce it, as the store is still intact after the initial import. You can also zip the store and your code and share it via Dropbox. What are your heap and mmap settings, and which versions of the JDK, Neo4j etc. were you running? We can try to run it on one of our test servers; then it should be faster and not bother someone's box for so long. Cheers Michael P.S.: What bothers me is that you run imports via a JUnit test; isn't there OSMImport.main() exactly for that purpose?
On 26.02.2011 at 12:38, Craig Taverner wrote: Hi, I was importing a reasonably large OSM dataset into Neo4j Spatial, and this involves a batch inserter which imports everything, followed by switching to a normal embedded graph database for adding nodes to the RTree index, which is an in-graph tree structure. The batch inserter phase worked fine, but sometime into the RTree index (normal graph API tree creation), the process terminated and I got the error message in the console: mmap failed for CEN and END part of zip file. Since this was running in JUnit, I also got a stack trace in the JUnit console, which I have included below, but the key elements are that it occurred on a line in my code that extracts a double[] property from a node: double[] bbox = (double[]) geomNode.getProperty("bbox"); This was certainly not the first time that method was called, and in fact this code has been stable for many months, so I think something deeper inside is going wrong. The stack trace goes further
Re: [Neo4j] Can anyone compile the latest Neo4J Spatial?
It is working for me too. One thing that is interesting about the error message is that it says it looks like another instance is running in the *same JVM*. Is that the usual error message? (The complete text was "this is usually caused by another Neo4j kernel already running in this JVM for this particular store".) The error is occurring at the very start of the very first test case in the TestSpatial class, so it cannot be due to another test in that class. Still, I would take Peter's advice: check that no other java test processes are running, manually delete the database to be sure, and then try again. On Sat, Feb 26, 2011 at 9:30 PM, Peter Neubauer peter.neuba...@neotechnology.com wrote: Nolan, I am running GIT fresh tests without problems. Are you having some old Java process running? Seems Neo4j refuses to start because it can't lock the files ... Cheers, /peter neubauer GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - Your high performance graph database. http://startupbootcamp.org/ - Öresund - Innovation happens HERE. http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party. On Sat, Feb 26, 2011 at 8:26 PM, Nolan Darilek no...@thewordnerd.info wrote: -BEGIN PGP SIGNED MESSAGE- Hash: SHA1 Wondering if something didn't get committed somewhere? I tried both with an update as well as with a fresh checkout and am getting a ton of test errors, not failures. Here's a sample Surefire output. This was gotten from a fresh checkout running mvn install:
--- Test set: org.neo4j.gis.spatial.TestSpatial ---
Tests run: 10, Failures: 0, Errors: 10, Skipped: 0, Time elapsed: 1.558 sec FAILURE!
Test Import of billesholm.osm(org.neo4j.gis.spatial.TestSpatial$1) Time elapsed: 1.513 sec ERROR!
java.lang.AbstractMethodError
	at org.neo4j.kernel.KernelExtension$KernelData.loadAll(KernelExtension.java:178)
	at org.neo4j.kernel.EmbeddedGraphDbImpl$2.load(EmbeddedGraphDbImpl.java:164)
	at org.neo4j.kernel.EmbeddedGraphDbImpl.<init>(EmbeddedGraphDbImpl.java:169)
	at org.neo4j.kernel.EmbeddedGraphDatabase.<init>(EmbeddedGraphDatabase.java:80)
	at org.neo4j.gis.spatial.Neo4jTestCase.reActivateDatabase(Neo4jTestCase.java:123)
	at org.neo4j.gis.spatial.Neo4jTestCase.setUp(Neo4jTestCase.java:88)
	at org.neo4j.gis.spatial.TestSpatial.setUp(TestSpatial.java:255)
	at junit.framework.TestCase.runBare(TestCase.java:132)
	at junit.framework.TestResult$1.protect(TestResult.java:110)
	at junit.framework.TestResult.runProtected(TestResult.java:128)
	at junit.framework.TestResult.run(TestResult.java:113)
	at junit.framework.TestCase.run(TestCase.java:124)
	at junit.framework.TestSuite.runTest(TestSuite.java:243)
	at junit.framework.TestSuite.run(TestSuite.java:238)
	at org.junit.internal.runners.JUnit38ClassRunner.run(JUnit38ClassRunner.java:83)
	at org.apache.maven.surefire.junit4.JUnit4TestSet.execute(JUnit4TestSet.java:59)
	at org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.executeTestSet(AbstractDirectoryTestSuite.java:120)
	at org.apache.maven.surefire.suite.AbstractDirectoryTestSuite.execute(AbstractDirectoryTestSuite.java:103)
	at org.apache.maven.surefire.Surefire.run(Surefire.java:169)
	at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
	at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:57)
	at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
	at java.lang.reflect.Method.invoke(Method.java:616)
	at org.apache.maven.surefire.booter.SurefireBooter.runSuitesInProcess(SurefireBooter.java:350)
	at org.apache.maven.surefire.booter.SurefireBooter.main(SurefireBooter.java:1021)
Test Spatial Index on billesholm.osm(org.neo4j.gis.spatial.TestSpatial$2) Time elapsed: 0.003 sec ERROR!
org.neo4j.graphdb.TransactionFailureException: Could not create data source [nioneodb], see nested exception for cause of error
	at org.neo4j.kernel.impl.transaction.TxModule.registerDataSource(TxModule.java:153)
	at org.neo4j.kernel.GraphDbInstance.start(GraphDbInstance.java:106)
	at org.neo4j.kernel.EmbeddedGraphDbImpl.<init>(EmbeddedGraphDbImpl.java:167)
	at org.neo4j.kernel.EmbeddedGraphDatabase.<init>(EmbeddedGraphDatabase.java:80)
	at org.neo4j.gis.spatial.Neo4jTestCase.reActivateDatabase(Neo4jTestCase.java:123)
	at