[Neo4j] zero fromDepth and toDepth
Hi everybody,

When setting fromDepth and toDepth both to zero, as in the following code:

    Traversal.description()
        .breadthFirst()
        .evaluator(Evaluators.fromDepth(0))
        .evaluator(Evaluators.toDepth(0))

I'm expecting to get only the start node, but I don't. Am I missing anything? Thanks!

Cheers,
Alex

--
View this message in context: http://neo4j-community-discussions.438527.n3.nabble.com/zero-fromDepth-and-toDepth-tp3474825p3474825.html
Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] zero fromDepth and toDepth
Hi Peter,

It admittedly makes little sense to use fromDepth(0) with toDepth(0), because there's obviously no need to run the query at all. Still, I'd expect behavior consistent with, for example, fromDepth(1) with toDepth(1), which returns only the nodes at depth 1 (if I'm not mistaken). So I'd definitely modify the code for the sake of consistency. For the same reason, atDepth(0) should also return the start node.

Cheers,
Alex
Re: [Neo4j] zero fromDepth and toDepth
That sounds a bit bizarre: in my code, fromDepth(n) with toDepth(n) seems to work like atDepth(n) when n > 0 (that's what should be happening, isn't it?)

Alex
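For reference, the consistency being argued for in this thread can be written out like this. This is a sketch against the 1.4-era traversal API; `startNode` is a placeholder and the snippet is untested:

```java
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.traversal.Evaluators;
import org.neo4j.graphdb.traversal.TraversalDescription;
import org.neo4j.kernel.Traversal;

// fromDepth(n) + toDepth(n) should behave like atDepth(n) for every n,
// including n = 0 (which would then return only the start node):
TraversalDescription ranged = Traversal.description()
        .breadthFirst()
        .evaluator(Evaluators.fromDepth(1))
        .evaluator(Evaluators.toDepth(1));

TraversalDescription exact = Traversal.description()
        .breadthFirst()
        .evaluator(Evaluators.atDepth(1));

// both are expected to yield exactly the nodes at depth 1 from startNode:
for (Node n : ranged.traverse(startNode).nodes()) {
    System.out.println(n);
}
```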
Re: [Neo4j] zero fromDepth and toDepth
Done: http://neo4jdb.lighthouseapp.com/projects/77609-neo4j-community/tickets/17-consisnte-behavior-of-fromdepth-todepth-and-atdepth

There's a typo in the title... time to get some sleep :)

Alex
[Neo4j] Scala, SBT and Neo4J
Dear Neo4Jers,

I am experiencing some issues with the Scala (2.9.0-1) + sbt (0.10.1) dependency management and Neo4j (1.4.1). Namely, I observed the following:

1. When using the sbt dependency "org.neo4j" % "neo4j" % "1.4.1", neo4j-kernel is not fetched and/or is not found, which results in the following error:

       object graphdb is not a member of package org.neo4j
       [error] import org.neo4j.graphdb._

2. Removing "org.neo4j" % "neo4j" % "1.4.1" and including "org.neo4j" % "neo4j-kernel" % "1.4.1" works, in the sense that the package org.neo4j.graphdb is then on the classpath.

3. Including both "org.neo4j" % "neo4j-kernel" % "1.4.1" and "org.neo4j.app" % "neo4j-server" % "1.4.1", or "org.neo4j" % "neo4j-kernel" % "1.4.1" and "org.neo4j" % "neo4j" % "1.4.1", results once again in the same error message as in point 1.

Am I missing anything? I need neo4j-server to use WrappingNeoServerBootstrapper, but it seems that I cannot have all the dependencies at the same time. What's the problem? Thank you.

Cheers,
Alex
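For comparison, a build.sbt along these lines is what I would expect to work for point 3. This is a sketch only; the exact module coordinates and whether an extra resolver is needed should be verified against the published 1.4.1 artifacts:

```scala
// sketch of an sbt dependency block for Neo4j 1.4.1,
// using the coordinates discussed in the message above
libraryDependencies ++= Seq(
  "org.neo4j"     % "neo4j-kernel" % "1.4.1",
  "org.neo4j.app" % "neo4j-server" % "1.4.1"
)
```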
[Neo4j] Problem starting up Neo4J on Red Hat 5
Hello,

I've just downloaded the latest stable Neo4j community edition (v1.3) and I'm having some problems starting it up. If I run bin/neo4j start from the command line I get:

    $ sudo bin/neo4j start
    Starting Neo4j Server...
    Waiting for Neo4j Server...
    Exception in thread "main" java.lang.NoSuchMethodError: method java.lang.Class.getCanonicalName with signature ()Ljava.lang.String; was not found.
        at org.rzo.yajsw.boot.WrapperLoader.getWrapperJar(WrapperLoader.java:44)
        at org.rzo.yajsw.boot.WrapperLoader.getWrapperClasspath(WrapperLoader.java:147)
        at org.rzo.yajsw.boot.WrapperLoader.getWrapperClassLoader(WrapperLoader.java:206)
        at org.rzo.yajsw.boot.WrapperExeBooter.main(WrapperExeBooter.java:28)
    ...
    WARNING: Neo4j Server may have failed to start.

I've also tried with the 1.4 M4 release and had the same problem. Can anyone help me out please? Thank you.

Alex Bilbie
Online Services Team, ICT Services
University of Lincoln
t: 01522 886542
e: abil...@lincoln.ac.uk
[Neo4j] Neo4j Neo Technology logos in vector format?
Hey,

As per the subject: is there anywhere I can get Neo4j or Neo Technology logos in vector image format (ps, eps, pdf, svg)? I'll be using them in a few posters at work, for the SICS open day. They will be advertising past and ongoing Neo Technology related research. A lot of people from industry will participate, so it's an opportunity to advertise.

Cheers,
Alex
Re: [Neo4j] Neo4j Neo Technology logos in vector format?
Thanks Will!

On Thu, Apr 7, 2011 at 6:08 PM, Will Holcomb <w...@dhappy.org> wrote:
> I did this a bit ago and it might help start you off:
> http://will.tip.dhappy.org/image/logo/Neo4j/

Alex
Re: [Neo4j] How to copy a complete database?
You could use GraphMigrator from Blueprints:
https://github.com/tinkerpop/blueprints/blob/master/src/main/java/com/tinkerpop/blueprints/pgm/util/graphml/GraphMigrator.java

Check out testMigratingTinkerGraphExample3() in this test case for usage:
https://github.com/tinkerpop/blueprints/blob/master/src/test/java/com/tinkerpop/blueprints/pgm/util/graphml/GraphMLReaderTestSuite.java

Basically:

    Neo4jGraph oldGraph = new Neo4jGraph(oldDirectory);
    Neo4jGraph newGraph = new Neo4jGraph(newDirectory);
    GraphMigrator.migrateGraph(oldGraph, newGraph);

On Thu, Mar 3, 2011 at 12:25 PM, Axel Morgner <a...@morgner.de> wrote:
> +1
>
> On 03.03.2011 12:25, Peter Neubauer wrote:
>> Mmh, maybe this could be in some Utilities class? Seems a good thing to be able to clone a graph ...
Re: [Neo4j] Forums ?
Regarding Google Groups:

(1) You can use it as a mailing list. With Google Groups you can choose to use the web interface, or just reply to emails... so you would lose nothing from what's currently being used.

(2) Re: problems with depending on a centralized point of control like Google: this may sound naive, but Google's never gonna go down... also, 640 KB is more memory than anyone will ever need ;)

On Mon, Feb 28, 2011 at 10:43 AM, Axel Morgner <a...@morgner.de> wrote:
> Personally, I love mailing lists for their discussion quality (at least most people seem to think twice before posting). But as the Neo4j community is obviously growing fast, it makes sense to offer other channels. Would it be an option to keep the mailing list dedicated to development and technical discussions, and offer a forum to users for general topics?
>
> On Google Groups: while it may meet the demands now, this part of the Neo4j community would become dependent on Google in future. Personally, I prefer more decentralized and self-controlled services ... Just my 2 cents.
>
> Axel
>
> On 27.02.2011 08:30, Emilio Dabdoub wrote:
>> I'm almost sure that somebody asked this question before, but I did not find a way to search the mailing list :) Why does Neo4j not have a discussion forum? I'm sure collaboration would get a boost.
Re: [Neo4j] Forums ?
Google Groups please!

On Sun, Feb 27, 2011 at 1:12 PM, Andreas Kollegger <andreas.kolleg...@neotechnology.com> wrote:
> Hi Emilio,
>
> That has come up in conversations before, but we've stuck with just using the mailing list. While not a forum, Google Groups is a good compromise because it is nicely searchable, web- or email-accessible, and easy to manage. What does the list think? Is it time to move on from beloved Mailman? Perhaps to a Facebook group? (Kidding, I'm kidding.)
>
> -Andreas
Re: [Neo4j] Forums ?
Besides the lack of editing (which isn't so important), what does a forum have that Google Groups doesn't?

PS: Google does no evil

On Sun, Feb 27, 2011 at 4:13 PM, Cedric Hurst <ced...@spantree.net> wrote:
> +1 for Google Groups as well. It's a shame that Google Groups doesn't offer forums yet (or maybe they do and I just don't know about it).
>
> On Sun, Feb 27, 2011 at 7:55 AM, Andreas Kollegger <andreas.kolleg...@neotechnology.com> wrote:
>> That's true. It is possible to search the mailing list. Do you prefer those interfaces to using Google Groups?
>>
>> -Andreas
>>
>> On Feb 27, 2011, at 1:52 PM, Anders Nawroth wrote:
>>> Hi!
>>>
>>> As stated here: http://neo4j.org/community/list/ you can search the mailing list archives here: http://www.mail-archive.com/user@lists.neo4j.org/info.html or here: http://www.listware.net/list-neo4j-user.html
>>>
>>> /anders
Re: [Neo4j] Cache sharding blog post
Cache sharding = super nice iterative/interim improvement.

It makes use of aggregate resources (RAM + CPU) across multiple servers (as would be the case with a truly sharded Neo4j) without bothering with partitioning algorithms (even basic consistent hashing). You get a viable partitioning solution by moving the logic to the client side. Neo4j doesn't need to worry about some of the engineering headaches that true partitioning would bring - like performing transactions between multiple different partitions, or load balancing partition replicas across available servers - but clients still get some of the benefits of true partitioning. Like!

Some other thoughts... As mentioned, you don't get around the difficult partitioning problem: achieving locality in a big graph of unknown topology/access patterns. Also, you don't make the best use of aggregate resources. For example, if you wanted to run one very large (read: forbidden/not-graphy) traversal like PageRank on the entire graph, you (1) would hit disk, and (2) would not make use of aggregate resources (CPU, RAM). I think that's an orthogonal problem though... regardless of how you cut your graph, the way you then traverse it has to be resource-aware: it has to know about, and be capable of using, the compute resources on other machines. E.g. Neo4j graph walking vs Pregel vertex gossiping... but maybe that can be left to future discussions.

On Thu, Feb 24, 2011 at 3:26 PM, Mark Harwood <markharw...@gmail.com> wrote:
> That's a really fantastic and useful design metric. Can I paraphrase it a bit and write it up on the Neo4j blog/my blog? I'd be honoured.
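As a toy illustration of the client-side routing the post describes, here is a minimal sketch. The server names are made up, and this uses simple modulo hashing rather than full consistent hashing; the point is only that requests for the same node id always go to the same server, so each server's object cache stays warm for its share of the graph:

```java
import java.util.Arrays;
import java.util.List;

public class CacheShardRouter {

    // Pick a server for a given node id. Deterministic, so repeated
    // requests for the same node always hit the same warm cache.
    static String route(long nodeId, List<String> servers) {
        int idx = Math.floorMod(Long.hashCode(nodeId), servers.size());
        return servers.get(idx);
    }

    public static void main(String[] args) {
        // hypothetical HA instances behind a sharding-aware client
        List<String> servers = Arrays.asList(
                "neo4j-a:7474", "neo4j-b:7474", "neo4j-c:7474");

        // the same node id always routes to the same server...
        System.out.println(route(42L, servers).equals(route(42L, servers))); // prints "true"
        // ...and every node id routes to some server in the list
        System.out.println(servers.contains(route(123456789L, servers)));    // prints "true"
    }
}
```

A real consistent-hashing ring would additionally keep most assignments stable when a server joins or leaves; plain modulo reshuffles nearly everything.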
[Neo4j] Flushing Cache
Hey,

I want to flush the Neo4j cache (for benchmarking reasons). If I shut down the database and then reopen it, will that do the trick, or is there something else I need to do?

Cheers,
Alex
Re: [Neo4j] Flushing Cache
Hi Tobias,

This is through 101 levels of abstraction (the TinkerPop stack), so the open/close method will do as a first step.

Thanks,
Alex

On Mon, Dec 27, 2010 at 2:56 PM, Tobias Ivarsson <tobias.ivars...@neotechnology.com> wrote:
> Shutdown and reopen will work. If you also want to flush the operating system's file system cache you will have to, e.g., dd some other (large) file.
>
> If all you want to do is clear the internal object cache, you can do this through the Neo4j Cache management bean:
>
>     EmbeddedGraphDatabase graphDb = ...
>     org.neo4j.management.Cache cache =
>         graphDb.getManagementBean(org.neo4j.management.Cache.class);
>     cache.clear();
>
> Cheers,
> Tobias
>
> --
> Tobias Ivarsson <tobias.ivars...@neotechnology.com>
> Hacker, Neo Technology
> www.neotechnology.com
> Cellphone: +46 706 534857
Re: [Neo4j] Reference node pains.
Given that Neo4j has a pretty powerful indexing system with Lucene, why can't users create their own reference node(s) and index them in their application? Like (in pseudocode):

    Graph g = new Graph()
    Node n = g.newNode()
    g.putIndex(n, "reference")
    // later...
    Node refNode = g.getIndex("reference")

I've used Neo4j a lot less than any of you, so maybe I don't appreciate something here, but to me this reference node concept does seem like an artifact that provides little added value, and it was a source of frustration during my thesis. Just my 2 cents.

Cheers,
Alex

On Wed, Dec 15, 2010 at 5:31 PM, Rick Bullotta <rick.bullo...@burningskysoftware.com> wrote:
> Thought about that too, and while it's always node zero today, who knows what happens in some future rev with sharding, etc... I'd prefer the "how" to be opaque.
>
> -----Original Message-----
> From: Marko Rodriguez
> Subject: Re: [Neo4j] Reference node pains.
>
> Hi,
>
>> One reason: how do you obtain that reference node later? Seems to me you'd need to write some code to save the node id, index it, etc...
>
> If the reference node (= the first node created) is always node id 0, as Johan stated in a previous email, then you simply do:
>
>     graph.getNodeById(0);
>
> You can, of course, create your own method:
>
>     public Node getReferenceNode() {
>         return graph.getNodeById(0);
>     }
>
> I don't understand why those that don't want a reference node don't simply avoid calling getReferenceNode() (assuming the lazy creation logic is added). ;-)
>
> ...assuming lazy creation logic (which is smart). Another argument could be the inverse of my previous email:
>
>     // I like the concept of a reference node
>     Graph graph = new Neo4j();
>
> And for those that don't:
>
>     // I don't like the concept of a reference node
>     Graph graph = new Neo4j();
>     graph.removeNode(graph.getReferenceNode() || graph.getNodeById(0))
>
> See ya,
> Marko.
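For what it's worth, with the integrated index API (Neo4j 1.2+), an application-level reference node along the lines of Alex's pseudocode might look like this. This is an untested sketch; the index name "references" and key/value "name"/"users-root" are made-up examples:

```java
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.Node;
import org.neo4j.graphdb.Transaction;
import org.neo4j.graphdb.index.Index;
import org.neo4j.kernel.EmbeddedGraphDatabase;

GraphDatabaseService graphDb = new EmbeddedGraphDatabase("target/graphdb");
Index<Node> refs = graphDb.index().forNodes("references");

// create and register an application-level reference node
Transaction tx = graphDb.beginTx();
try {
    Node usersRoot = graphDb.createNode();
    refs.add(usersRoot, "name", "users-root");
    tx.success();
} finally {
    tx.finish();
}

// later, from anywhere in the application:
Node root = refs.get("name", "users-root").getSingle();
```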
Re: [Neo4j] Reference node pains.
1) I mentioned Lucene in my last comment, but it doesn't need to be Lucene... just an IndexService implementation.

2) Are the performance requirements of looking up a reference node any different from those of looking up some other node? If it's a common operation it will be cached; if it's not a common operation, it's probably not very important.

On Fri, Dec 17, 2010 at 2:15 PM, Craig Taverner <cr...@amanzi.com> wrote:
> I think that the idea proposed by Emil, with multiple named reference nodes, is similar, and implies the use of an index to find the reference by name. However, I can also see that use of the Lucene index for this implies that Lucene, which is a 3rd party component, becomes a compulsory part of the neo4j-kernel, which might not be ideal.
>
> The other alternative is to use name lookup schemes similar to those Neo4j already has hard-coded for the relationship type index and the property name index. As with relationship types, it probably makes sense to assume a limited number of possible reference nodes. I think relationship types are limited to 64k? Seems like a reasonable limit for reference nodes too.
>
> A question I have for the kernel guys: how does the current name lookup perform? Is it as fast as Lucene? Does it scale well (i.e. work as fast with 10 relationship types as with 10k relationship types)? Does it simply load the entire table into a big Java hashmap?
>
> P.S. 64k ought to be enough for anyone. :-) (don't quote me: http://www.computerworld.com/s/article/9101699/The_640K_quote_won_t_go_away_but_did_Gates_really_say_it_ ...)
Re: [Neo4j] Performance benchmarks of REST API vs embedded Neo4J
Hi Rick,

You could get a ballpark comparison by using Blueprints (https://github.com/tinkerpop/blueprints), comparing the performance of Neo4jGraph (https://github.com/tinkerpop/blueprints/tree/master/src/main/java/com/tinkerpop/blueprints/pgm/impls/neo4j/) with the performance of RexsterGraph (https://github.com/tinkerpop/blueprints/blob/master/src/main/java/com/tinkerpop/blueprints/pgm/impls/rexster/RexsterGraph.java). Rexster (https://github.com/tinkerpop/rexster) is a RESTful graph shell that exposes any Blueprints graph through a standalone HTTP server.

Using Neo4jGraph is straightforward. For a discussion on using RexsterGraph, go here: http://groups.google.com/group/gremlin-users/browse_thread/thread/aa9ad740c8fed156/9a7510a3e36f1d72?lnk=gstq=rexster#9a7510a3e36f1d72

If you know the type of graph operations that you would like to test, GraphDB-Bench (https://github.com/tinkerpop/graphdb-bench) can be used to compare the performance of the two Graph implementations (embedded vs REST). What type of graph operations would you like to test? I might be able to help by setting up the specific GraphDB-Bench benchmark for you.

Cheers,
Alex

On Sat, Dec 4, 2010 at 3:22 PM, Rick Bullotta <rick.bullo...@burningskysoftware.com> wrote:
> Has anyone done any rough performance comparisons of the two approaches? I have to think we're looking at 1 or 2 orders of magnitude difference, but would like to know if there is any hard data yet.
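To make the comparison concrete, a harness along these lines is the kind of thing I have in mind. This is an untested sketch against the Blueprints-era API; the database path and Rexster URL are placeholder examples, and a wall-clock loop like this gives only a rough indication:

```java
import com.tinkerpop.blueprints.pgm.Graph;
import com.tinkerpop.blueprints.pgm.Vertex;
import com.tinkerpop.blueprints.pgm.impls.neo4j.Neo4jGraph;
import com.tinkerpop.blueprints.pgm.impls.rexster.RexsterGraph;

// same workload over two transports: embedded JVM calls vs REST
Graph embedded = new Neo4jGraph("/tmp/neo4j-bench");
Graph rest = new RexsterGraph("http://localhost:8182/graphs/neo4jsample");

for (Graph g : new Graph[] { embedded, rest }) {
    long start = System.nanoTime();
    long count = 0;
    for (Vertex v : g.getVertices()) {
        count++; // touch every vertex through this transport
    }
    System.out.println(g + ": " + count + " vertices in "
            + (System.nanoTime() - start) / 1000000 + " ms");
}
```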
Re: [Neo4j] Using JUNG framework over Neo4j.
Real nice!

> Could you add a small example of using the visualization support in JUNG, e.g. to produce the pic on the page?

+1

On Sat, Nov 13, 2010 at 9:17 AM, Peter Neubauer <peter.neuba...@neotechnology.com> wrote:
> Very cool Marko,
> Could you add a small example of using the visualization support in JUNG, e.g. to produce the pic on the page?
>
> /Peter
>
> On Friday, November 12, 2010, Marko Rodriguez <okramma...@gmail.com> wrote:
>> Hi,
>>
>> I thought many of you might be interested in using JUNG over Neo4j:
>> http://github.com/tinkerpop/blueprints/wiki/JUNG-Graph-Implementation
>>
>> This support has been in Blueprints for many months now, I just never documented it.
>>
>> Take care,
>> Marko.
>> http://markorodriguez.com
Re: [Neo4j] LuceneIndexService: NoSuchMethodError
As it turns out, this didn't resolve my problem, but actually broke the other part of the application that was using Lucene. The error is as follows:

    Caused by: java.lang.NoSuchMethodError: org.apache.lucene.search.IndexSearcher.search(Lorg/apache/lucene/search/Query;)Lorg/apache/lucene/search/Hits;

In a previous message to this list, it was said that the Hits class, which causes the NoSuchMethodError, was copied into the index-core artifact, so it should be compatible with Lucene 3.0. Is there a way to make sure that the VM resolves to the included version of Hits rather than looking for it in the Lucene library where it no longer exists?

So far the only solution I've seen is to revert to Lucene 2.9.2, which unfortunately I don't have the flexibility to do. Is there another way to deal with this? I'd rather not have to edit the indexing component and build from source, but it seems like it might come to this.

Thanks,
Alex

On Fri, Aug 6, 2010 at 12:52 PM, Alex D'Amour <adam...@iq.harvard.edu> wrote:
> For future reference, if anybody else is in the situation I was in, one solution is to package up your library with the dependency classes rolled up into the jar file. The following, added to the pom.xml under <plugins>, accomplishes this:
>
>     <plugin>
>       <artifactId>maven-assembly-plugin</artifactId>
>       <executions>
>         <execution>
>           <phase>package</phase>
>           <goals>
>             <goal>attached</goal>
>           </goals>
>         </execution>
>       </executions>
>       <configuration>
>         <descriptorRefs>
>           <descriptorRef>jar-with-dependencies</descriptorRef>
>         </descriptorRefs>
>       </configuration>
>     </plugin>
>
> Alex
>
> On Thu, Aug 5, 2010 at 8:25 PM, Alex D'Amour <adam...@iq.harvard.edu> wrote:
>> Hi all,
>>
>> I'm getting this same error in an environment where a class I implemented using Neo4j and the LuceneIndexService is called by an application that's using Lucene 3.0. The application server is unfortunately above my abstraction layer (I'm just implementing the back end in Neo4j), so I can't change the version of Lucene that it's including.
>>
>> Previous messages have suggested that the indexing component should work with Lucene 3.0, but is there an easy way for me to remove this version conflict without manually editing the pom.xml in the indexing component source?
>>
>> Thanks,
>> Alex
>>
>> On Mon, Aug 2, 2010 at 11:07 AM, Peter Neubauer <peter.neuba...@neotechnology.com> wrote:
>>> Max,
>>> since you are using neo4j-index, you should not be importing Lucene again, since it already is a dependency of the index component (and I think the version is higher there). So, upgrading to 1.1 and removing the Lucene dependency should fix it:
>>>
>>>     <dependency>
>>>       <groupId>org.neo4j</groupId>
>>>       <artifactId>neo4j-kernel</artifactId>
>>>       <version>1.1</version>
>>>     </dependency>
>>>     <dependency>
>>>       <groupId>org.neo4j</groupId>
>>>       <artifactId>neo4j-index</artifactId>
>>>       <version>1.1</version>
>>>     </dependency>
>>>
>>> Cheers,
>>> /peter neubauer
>>> COO and Sales, Neo Technology
>>> GTalk: neubauer.peter / Skype: peter.neubauer / Phone: +46 704 106975
>>> LinkedIn: http://www.linkedin.com/in/neubauer / Twitter: http://twitter.com/peterneubauer
>>> http://www.neo4j.org - Your high performance graph database.
>>> http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
>>>
>>> On Mon, Aug 2, 2010 at 3:11 PM, Max Jakob <max.ja...@fu-berlin.de> wrote:
>>>> Hi Peter,
>>>>
>>>>> this sounds like a version clash on Lucene. Can you check what version(s) of Lucene (and Neo4j-Index) you are running in the two scenarios?
>>>>
>>>> That would make sense to me as well. But like I said, on the first run, the method is found. Running the exact same code a second time, without any changes, it complains that the method is not found. (?!)
>>>>
>>>> Here are the versions I use (for both runs) from my pom.xml:
>>>>
>>>>     <dependency>
>>>>       <groupId>org.neo4j</groupId>
>>>>       <artifactId>neo4j-kernel</artifactId>
>>>>       <version>1.1-SNAPSHOT</version>
>>>>     </dependency>
>>>>     <dependency>
>>>>       <groupId>org.neo4j</groupId>
>>>>       <artifactId>neo4j-index</artifactId>
>>>>       <version>1.1-SNAPSHOT</version>
>>>>     </dependency>
>>>>     <dependency>
>>>>       <groupId>org.apache.lucene</groupId>
>>>>       <artifactId>lucene-highlighter</artifactId>
>>>>       <version>2.9.1</version>
>>>>     </dependency>
>>>>
>>>> Cheers,
>>>> Max
>>>>
>>>> On Mon, Aug 2, 2010 at 3:01 PM, Peter Neubauer <peter.neuba...@neotechnology.com> wrote:
>>>>> Max,
>>>>> this sounds like a version clash on Lucene. Can you check what version(s) of Lucene (and Neo4j-Index) you are running in the two scenarios?
>>>>>
>>>>> Cheers,
>>>>> /peter neubauer
>>>>>
>>>>> On Mon, Aug 2, 2010 at 2:38 PM
Re: [Neo4j] LuceneIndexService: NoSuchMethodError
For future reference, if anybody else is in the situation I was in, once solution is to package up your library with dependency classes rolled up into the jar file. The following added to the POM.xml under plugins accomplishes this: plugin artifactIdmaven-assembly-plugin/artifactId executions execution phasepackage/phase goals goalattached/goal /goals /execution /executions configuration descriptorRefs descriptorRefjar-with-dependencies/descriptorRef /descriptorRefs /configuration /plugin Alex On Thu, Aug 5, 2010 at 8:25 PM, Alex D'Amour adam...@iq.harvard.edu wrote: Hi all, I'm getting this same error in an environment where a class I implemented using Neo4j and the LuceneIndexService is called by an application that's using Lucene 3.0. The application server is unfortunately above my abstraction layer (I'm just implementing the back end in neo4j), so I can't change the version of lucene that it's including. Previous messages have suggested that the indexing component should work using Lucene 3.0, but is there an easy way for me to remvoe this version conflict without manually editing the pom.xml in the indexing component source? Thanks, Alex On Mon, Aug 2, 2010 at 11:07 AM, Peter Neubauer peter.neuba...@neotechnology.com wrote: Max, since you are using neo4j-index, you should not be importing Lucene again, since it already is a dependency of the index components (and I think the version is higher there). So, upgrading to 1.1 and removing the Lucene dependency should fix it: dependency groupIdorg.neo4j/groupId artifactIdneo4j-kernel/artifactId version1.1/version /dependency dependency groupIdorg.neo4j/groupId artifactIdneo4j-index/artifactId version1.1/version /dependency Cheers, /peter neubauer COO and Sales, Neo Technology GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - Your high performance graph database. 
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party. On Mon, Aug 2, 2010 at 3:11 PM, Max Jakob max.ja...@fu-berlin.de wrote: Hi Peter, this sounds like a version clash on Lucene. Can you check what version(s) of Lucene (and Neo4j-Index) you are running in the two scenarios? That would make sense to me as well. But like I said, on the first run, the method is found. Running the exact same code a second time, without any changes, it complains that the method is not found. (?!) Here the versions I use (for both runs) from my pom.xml: dependency groupIdorg.neo4j/groupId artifactIdneo4j-kernel/artifactId version1.1-SNAPSHOT/version /dependency dependency groupIdorg.neo4j/groupId artifactIdneo4j-index/artifactId version1.1-SNAPSHOT/version /dependency dependency groupIdorg.apache.lucene/groupId artifactIdlucene-highlighter/artifactId version2.9.1/version /dependency Cheers, Max On Mon, Aug 2, 2010 at 3:01 PM, Peter Neubauer peter.neuba...@neotechnology.com wrote: Max, this sounds like a version clash on Lucene. Can you check what version(s) of Lucene (and Neo4j-Index) you are running in the two scenarios? Cheers, /peter neubauer COO and Sales, Neo Technology GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - Your high performance graph database. http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party. On Mon, Aug 2, 2010 at 2:38 PM, Max Jakob max.ja...@fu-berlin.de wrote: Hi, I have a problem with the LuceneIndexService. 
When I create an indexed graph base and commit it to disk, the next time I want to use it I get a NoSuchMethodError for LuceneIndexService.getSingleNode:

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.lucene.search.IndexSearcher.search(Lorg/apache/lucene/search/Query;)Lorg/apache/lucene/search/Hits;
    at org.neo4j.index.lucene.LuceneIndexService.searchForNodes(LuceneIndexService.java:430)
    at org.neo4j.index.lucene.LuceneIndexService.getNodes(LuceneIndexService.java:310)
    at org.neo4j.index.lucene.LuceneIndexService.getSingleNode(LuceneIndexService.java:469)
    at org.neo4j.index.lucene.LuceneIndexService.getSingleNode(LuceneIndexService.java:461)

To illustrate this in more detail: if I run the code below for the first time, everything goes fine. On a second run I get the exception. Could somebody give me a hint where I'm going wrong? (re-indexing does not work
[Neo4j] Attributes or Relationship Check During Traversal
Hello all, I have a question regarding traversals over a large graph, when that traversal depends on a discretely valued attribute of the nodes being traversed. As a small example, the nodes in my graph can have 2 states -- on and off. I'd like to traverse over paths that only consist of active nodes. Since this state attribute can only take 2 values, I see two possible approaches to implementing this: 1) Use node properties, and have the PruneEvaluator and filter Predicate check to see whether the current endNode has a property called "on". 2) Create a state node which represents the on state. Have all nodes that are in the on state have a relationship of type STATE_ON incoming from the on node. Have the PruneEvaluator and filter Predicate check whether the node has a single relationship of type STATE_ON, INCOMING. Which is closer to what we might consider best practices for Neo4j? The problem I see in implementation 1 is that the traversal has to hit the property store, which could slow things down. The problem with 2 is that there can be up to #nodes relationships coming from the on state node, and making this more efficient by setting up a tree of on state nodes seems to be manually replicating something that the indexing service has already accomplished. Also, how efficiently would each of these two implementations exploit caching (or is this irrelevant)? Finally, would your answer change if we generalized this to a larger number of categories? Thanks, Alex ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
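Approach 1 boils down to a predicate over the node's state property. Below is a minimal, self-contained sketch of that check; a plain property map stands in for a Neo4j node, and all names (StateFilter, onlyActive, filterPath) are illustrative, not part of any Neo4j API:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;

public class StateFilter {
    // Approach 1 sketch: each node's state lives in a discrete property.
    // In Neo4j this test would sit inside the PruneEvaluator / filter
    // Predicate, applied to the current endNode of the traversal.
    public static Predicate<Map<String, Object>> onlyActive() {
        return props -> Boolean.TRUE.equals(props.get("on"));
    }

    // Keep only the nodes on a path whose "on" property is true.
    public static List<Map<String, Object>> filterPath(List<Map<String, Object>> path) {
        List<Map<String, Object>> keep = new ArrayList<>();
        Predicate<Map<String, Object>> active = onlyActive();
        for (Map<String, Object> node : path) {
            if (active.test(node)) {
                keep.add(node);
            }
        }
        return keep;
    }
}
```

The same predicate shape would carry over to approach 2 by testing for an incoming STATE_ON relationship instead of a property, so the two options differ only in what the check reads, not in how the traversal is wired.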
[Neo4j] Stability of Iterators Across Transactions?
Hello all, I have an application where I have a node that has several hundred thousand relationships (this probably needs to be changed). In the application I iterate over these relationships, and delete a large subset of them. Because there are so many writes, I want to commit the transaction every few thousand deletions. The problem is that the getAllRelationships iterator seems to halt after the first transaction commit. Clearly, I should reduce the number of relationships that are connected to this node, but is this the expected behavior? Should iterators be made stable across transactions, or are they only supposed to be guaranteed within a transaction? Thanks, Alex ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
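One workaround (my assumption, not something confirmed in this thread) is to never let a live iterator span a commit boundary: snapshot the relationship IDs up front, then delete in fixed-size batches, committing between batches. A self-contained sketch of the pattern, with a plain map standing in for the store and a counter standing in for transaction commits:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class BatchDelete {
    // Snapshot-then-batch pattern: collect all IDs first, then delete in
    // fixed-size batches. No live iterator ever crosses a commit, so the
    // halting behaviour described in the thread cannot occur.
    public static int deleteInBatches(Map<Long, String> store, int batchSize) {
        List<Long> ids = new ArrayList<>(store.keySet()); // snapshot first
        int commits = 0;
        int inBatch = 0;
        for (Long id : ids) {
            store.remove(id); // stands in for relationship.delete()
            if (++inBatch == batchSize) {
                commits++; // stands in for tx.success(); tx.finish(); begin new tx
                inBatch = 0;
            }
        }
        if (inBatch > 0) {
            commits++; // final partial batch
        }
        return commits;
    }
}
```

The memory cost is one Long per relationship, which for a few hundred thousand relationships is modest compared to re-running a traversal after every commit.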
Re: [Neo4j] Question about labelling all connected components
One other option is to have a set of nodes, each of which represents a component. You can create a relationship of type OWNS (or whatever) to each of the nodes of a given component. This makes component lookup rather simple (just grab the node that represents the component, then traverse all of the OWNS relationships), and it makes merging components rather simple if you end up having two components that get linked to each other while the graph is evolving (transfer all of the relationships from one to the other). On Sat, Jul 24, 2010 at 1:10 PM, Mattias Persson matt...@neotechnology.com wrote: 2010/7/23 Arijit Mukherjee ariji...@gmail.com Thanx to both of you. Yes, I can just check whether the label exists on the node or not. In my case, checking for Integer.MIN_VALUE, which is what is assigned when the subscriber node is created. To assign a temporary value (or a value representing the "not assigned" state) seems unnecessary. A better way would be to not set that property when creating a node and then use: node.getProperty( "whatever key", Integer.MIN_VALUE ); when getting that property. BTW - is it ever possible to label the components while creating the graph? I can't think of any way of doing this - but I might be missing something... Regards Arijit On 22 July 2010 20:54, Vitor De Mario vitordema...@gmail.com wrote: As far as the algorithm goes, I see nothing wrong. Connected components is a well-known problem in graph theory, and you're doing just fine. I second the recommendations of Tobias, especially the second one, as you would get rid of the labelled collection completely, and that improves both time and memory. []'s Vitor On Thu, Jul 22, 2010 at 11:35 AM, Tobias Ivarsson tobias.ivars...@neotechnology.com wrote: The first obvious thing is that labelled.contains(currentNode.getId()) is going to take more time as your dataset grows, since it's a linear search for the element in an ArrayList. 
A HashSet would be a much more appropriate data structure for your application. The other thing that comes to mind is the memory overhead of the labelled-collection. Eventually it is going to contain every node in the graph, and be very large. This steals some of the memory that could have been used for caching the graph, forcing Neo4j to do more I/O than it would have if it could have used that memory for cache. Would it be possible for you to replace the !labelled.contains(currentNode.getId()) check with currentNode.getProperty("componentID", null) == null? Or are there situations where the node could have that property and not be considered labelled? Cheers, Tobias On Thu, Jul 22, 2010 at 3:35 PM, Arijit Mukherjee ariji...@gmail.com wrote: Hi All I'm trying to label all connected components in a graph - i.e. all nodes that are connected will have a common componentID property set. I'm using the Traverser to do this. For each node in the graph (unless it is already labelled, which I track by inserting the node ID in a list), the traverser finds out all the neighbours using BFS, and then the node and all the neighbours are labelled with a certain value. The code is something like this -

Iterable<Node> allNodes = graphDbService.getAllNodes();
ArrayList<Long> labelled = new ArrayList<Long>();
for (Node currentNode : allNodes) {
    if (currentNode.hasProperty("number") && !labelled.contains(currentNode.getId())) {
        Traverser traverser = currentNode.traverse(Order.BREADTH_FIRST,
                StopEvaluator.END_OF_GRAPH, ReturnableEvaluator.ALL_BUT_START_NODE,
                RelTypes.CALLS, Direction.BOTH);
        int currentID = initialID;
        initialID++;
        currentNode.setProperty("componentID", currentID);
        labelled.add(currentNode.getId());
        for (Node friend : traverser) {
            friend.setProperty("componentID", currentID); // mark each node as labelled
            labelled.add(friend.getId());
        }
    }
}

This works well for a small graph (2000 nodes). 
But for a graph of about 1 million nodes, this is taking about 45 minutes on a 64-bit Intel 2.3GHz CPU, 4GB RAM (Java 1.6 update 21 and Neo4J 1.0). Is this normal? Or is the code I'm using faulty? Is there any other way to label the connected components? Regards Arijit -- And when the night is cloudy, There is still a light that shines on me, Shine on until tomorrow, let it be. ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
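Tobias's HashSet suggestion can be sketched independently of the Neo4j API. The following self-contained example labels connected components over a plain adjacency map, using a HashSet so the already-labelled check is O(1) instead of the ArrayList's linear scan (the class and method names are illustrative):

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

public class ComponentLabeller {
    // Label connected components over a plain adjacency map. A HashSet
    // makes the "already labelled" check O(1); with an ArrayList each
    // contains() is a linear scan, which is what made the 1M-node run slow.
    public static Map<Long, Integer> label(Map<Long, List<Long>> adjacency) {
        Map<Long, Integer> componentOf = new HashMap<>();
        Set<Long> labelled = new HashSet<>();
        int nextId = 0;
        for (Long start : adjacency.keySet()) {
            if (labelled.contains(start)) {
                continue; // already part of some component
            }
            int currentId = nextId++;
            // breadth-first expansion from the unlabelled node
            Deque<Long> queue = new ArrayDeque<>();
            queue.add(start);
            labelled.add(start);
            while (!queue.isEmpty()) {
                Long node = queue.poll();
                componentOf.put(node, currentId);
                for (Long neighbour : adjacency.getOrDefault(node, List.of())) {
                    if (labelled.add(neighbour)) { // true only the first time
                        queue.add(neighbour);
                    }
                }
            }
        }
        return componentOf;
    }
}
```

The same swap (ArrayList to HashSet) applies directly to the code in the thread; the BFS here just stands in for the Neo4j Traverser.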
Re: [Neo4j] neo4j question
Hi Paddy, Just for your information, I've been running a lot of shortest path operations on a graph of similar size to yours (approx 1 million Nodes and 5 million Relationships). I'm using an Amazon instance with 32GB of RAM, but am only using a heap size of 8GB (as other people are sharing the same instance). I choose the start and end Nodes randomly, so occasionally the AStar algorithm actually exhausts the entire graph during a search. What I've found is that the first 1-3 search operations may take up to a minute in the worst case (when the entire graph is exhausted), but after this I never see run times of over 10 seconds (entire graph exhausted), and in most cases it completes in between 0-4 seconds. Cheers, Alex On Mon, Jul 5, 2010 at 5:05 PM, Peter Neubauer peter.neuba...@neotechnology.com wrote: Paddy, posting on the user list - yes, I think you should add RAM to hold your graph in memory, and then maybe write some routine to warm up the caches. That is, read all nodes up into RAM before you do the first search, or e.g. perform some interesting searches to load all interesting data. Would that be possible? What hardware config are you running on right now? Cheers, /peter neubauer COO and Sales, Neo Technology GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - Your high performance graph database. http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party. On Sun, Jul 4, 2010 at 4:34 AM, Paddy paddyf...@gmail.com wrote: hi peter, Hope all is well, I have modeled the bus network with timetables and stops connected by walking distance. There are approx 10 million relationships and 1 million nodes. 
The graph is approx 1GB. Running on my local machine, to find the best route it takes 5 seconds to do approx 100 Lucene query searches and add approx 100 relationships to the graph. It can take from 3-10 seconds to find a shortest path with the A* algorithm. What steps could I take to speed up the app? More RAM? Run in the cloud? Thanks Paddy ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] New tentative API in trunk: Expander/Expansion
Hi Tobias, It seems as though the new changes have broken the AStar code I'm using. I use: neo4j-apoc 1.1-SNAPSHOT and neo4j-graph-algo 0.5-SNAPSHOT. AStar uses DefaultExpander and can no longer find it. Here's an example of the code that worked until now:

DefaultExpander relExpander = new DefaultExpander();
relExpander.add(GISRelationshipTypes.BICYCLE_WAY, Direction.BOTH);
AStar sp = new AStar(graphDb, relExpander, costEval, estimateEval);
Path path = sp.findSinglePath(startNode, endNode);

The problem seems to be that AStar wants a RelationshipExpander, but now I can only create an Expansion<Relationship>. Do you have any suggestions as to how to make this work again? Regards, Alex On Wed, Jun 23, 2010 at 11:14 AM, Tobias Ivarsson tobias.ivars...@neotechnology.com wrote: Hi Neo4j enthusiasts! Yesterday I committed an API that Mattias and I have been working on for a few days. It comes in the form of two new interfaces, Expander and Expansion (in the org.neo4j.graphdb package), and four new methods in the Node interface (method names starting with expand). The two main problems this API solves are: 1. Adds a clean way for getting related nodes from a source node. 2. Adds a type safe way for declaratively specifying any combination of RelationshipTypes and Directions to expand. This replaces what was actually an anti-pattern, but something we saw people doing a lot: using a depth-one traversal to get the nodes related to a node, without needing to bother with the Relationship interface. Example:

// The most convenient way to write it in the past:
Node source = ...
Traverser traverser = source.traverse( Traverser.Order.DEPTH_FIRST,
        new StopAtDepth( 1 ), ReturnableEvaluator.ALL_BUT_START_NODE,
        TYPE_ONE, Direction.INCOMING, TYPE_TWO, Direction.OUTGOING );
for (Node related : traverser) {
    doSomethingWith( related );
}

// The previously recommended (and bloated) way of doing it:
Node source = ... 
for (Relationship rel : source.getRelationships( TYPE_ONE, TYPE_TWO )) {
    Node related;
    if (rel.isType(TYPE_ONE)) {
        related = rel.getStartNode();
        if (related.equals(source)) continue; // we only want INCOMING TYPE_ONE
    } else if (rel.isType(TYPE_TWO)) {
        related = rel.getEndNode();
        if (related.equals(source)) continue; // we only want OUTGOING TYPE_TWO
    } else {
        continue; // should never happen, but makes javac know that related is != null
    }
    doSomethingWith( related );
}

// With the new API:
Node source = ...
for (Node related : source.expand( TYPE_ONE, Direction.INCOMING )
        .add( TYPE_TWO, Direction.OUTGOING ).nodes()) {
    doSomethingWith( related );
}

The return type of the Node.expand(...)-methods is the new Expansion type; it defaults to expanding to Relationship, but the Expansion.nodes()-method makes it expand to Node. It also contains the add()-methods seen above for specifying RelationshipTypes to include in the expansion. The spelling of this method isn't perfectly decided yet; we are choosing between "add", "and" and "include". We want something that reads nicely in the code, but doesn't conflict with keywords in other JVM-languages ("and" is a keyword in Python, and I think "include" means something special in Ruby). There is also an Expansion.exclude(RelationshipType)-method for use together with Node.expandAll(). The Expansion is backed by the newly added companion interface Expander. This is an extension of RelationshipExpander that adds builder capabilities. It turns the functionality of the DefaultExpander implementation class (now removed) into an interface in the API. RelationshipExpander is still around as a single-method interface, which is useful for when you want to implement your own expansion logic. 
This API is added to trunk so that we can get feedback from everyone who use the snapshot builds of Neo4j, if the response to this API isn't positive, it will probably be removed before the release of 1.1, so please submit comments in this thread on what you think about this API. Happy Hacking, -- Tobias Ivarsson tobias.ivars...@neotechnology.com Hacker, Neo Technology www.neotechnology.com Cellphone: +46 706 534857 ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] New tentative API in trunk: Expander/Expansion
Hi guys, thanks! I've pulled the latest java-astar-routing and looking at it now. Have changed to SNAPSHOT 0.6 too. Has the AStar algorithm been changed to use the Traversal framework now? If so, is there any way I can use the old AStar version? The last one required neo4j-apoc 1.1-SNAPSHOT neo4j-graph-algo 0.5-SNAPSHOT, but no longer works with these. Is there any combination of neo4j-graph-algo and neo4j-apoc that will work with the old version of AStar? My problem is this... We are in the final stages of evaluation for our thesis and have been asked if we can run a bunch of extra experiments. We use AStar on the GIS dataset experiments. We do NOT want to use the Traversal framework at all as the way we log traffic is by listening to the Add, Remove, Delete, Get, etc methods of GraphDatabaseService. The Traversal framework goes around these as far as I know, so we would lose our ability to monitor. Any suggestions would be great! Thanks, Alex On Wed, Jun 23, 2010 at 3:52 PM, Mattias Persson matt...@neotechnology.comwrote: Also, the latest graph-algo is 0.6-SNAPSHOT... so use that instead 2010/6/23 Anders Nawroth and...@neotechnology.com Hi! See: http://www.mail-archive.com/user@lists.neo4j.org/msg04044.html /anders On 06/23/2010 03:44 PM, Alex Averbuch wrote: Hi Tobias, It seems as though the new changes have broken the AStar code I'm using. I use: neo4j-apoc 1.1-SNAPSHOT neo4j-graph-algo 0.5-SNAPSHOT AStar uses DefaultExpander and can no longer find it. Here's an example of the code that worked until now. DefaultExpander relExpander = new DefaultExpander(); relExpander.add(GISRelationshipTypes.BICYCLE_WAY, Direction.BOTH); AStar sp = new AStar(graphDb, relExpander, costEval, estimateEval); Path path = sp.findSinglePath(startNode, endNode); The problem seems to be that AStar wants a RelationshipExpander but now I can only create an ExpansionRelationship. Do you have any suggestions as to how to make this work again? 
Regards, Alex On Wed, Jun 23, 2010 at 11:14 AM, Tobias Ivarsson tobias.ivars...@neotechnology.com wrote: Hi Neo4j enthusiasts! Yesterday I committed an API that Mattias and I have been working on for a few days. It comes in the form of two new interfaces Expander and Expansion (in the org.neo4j.graphdb package), and four new methods in the Node interface (method names starting with expand). The two main problems this API solves are: 1. Adds a clean way for getting related nodes from a source node. 2. Adds a type safe way for declaratively specifying any combination of RelationshipTypes and Directions to expand. This replaces what was actually an anti-pattern, but something we saw people doing a lot: using a depth-one traversal to get the nodes related to a node, without needing to bother with the Relationship interface. Example: // The most convenient way to write it in the past: Node source = ... Traverser traverser = source.traverse( Traverser.Order.DEPTH_FIRST, new StopAtDepth( 1 ), ReturnableEvaluator.ALL_BUT_START_NODE, TYPE_ONE, Direction.INCOMING, TYPE_TWO, Direction.OUTGOING); for (Node related : traverser) { doSomethingWith( related ); } // The previously recommended (and bloated) way of doing it: Node source = ... for (Relationship rel : source.getRelationships( TYPE_ONE, TYPE_TWO )) { Node related; if (rel.isType(TYPE_ONE)) { related = rel.getStartNode(); if (related.equals(source)) continue; // we only want INCOMING TYPE_ONE } else if (rel.isType(TYPE_TWO)) { related = rel.getEndNode(); if (related.equals(source)) continue; // we only want OUTGOING TYPE_TWO } else { continue; // should never happen, but makes javac know that related is != null } doSomethingWith( related ); } // With the new API: Node source = ... 
for (Node related : source.expand( TYPE_ONE, Direction.INCOMING ) .add( TYPE_TWO, Direction.OUTGOING ).nodes()) { doSomethingWith( related ); } The return type of the Node.expand(...)-methods are the new Expansion type, it defaults to expanding to Relationship, but the Expansion.nodes()-method makes it expand to Node. It also contains the add()-methods seen above for specifying RelationshipTypes to include in the expansion. The spelling of this method isn't perfectly decided yet, we are choosing between add, and and include, we want something that reads nicely in the code, but doesn't conflict with keywords in other JVM-languages (and is a keyword in Python, and I think include means something special in Ruby). There is also an Expansion.exlude(RelationshipType)-method for use together with the Node.expandAll
Re: [Neo4j] New tentative API in trunk: Expander/Expansion
Sweet, Experiments are running fine again. Thanks for the quick help! Alex On Wed, Jun 23, 2010 at 4:17 PM, Mattias Persson matt...@neotechnology.comwrote: It works with graph-algo 0.6-SNAPSHOT and there's an AStar.java (which is the old version) and one ExperimentalAStar.java which uses the new traversal framework. 2010/6/23 Alex Averbuch alex.averb...@gmail.com Hi guys, thanks! I've pulled the latest java-astar-routing and looking at it now. Have changed to SNAPSHOT 0.6 too. Has the AStar algorithm been changed to use the Traversal framework now? If so, is there any way I can use the old AStar version? The last one required neo4j-apoc 1.1-SNAPSHOT neo4j-graph-algo 0.5-SNAPSHOT, but no longer works with these. Is there any combination of neo4j-graph-algo and neo4j-apoc that will work with the old version of AStar? My problem is this... We are in the final stages of evaluation for our thesis and have been asked if we can run a bunch of extra experiments. We use AStar on the GIS dataset experiments. We do NOT want to use the Traversal framework at all as the way we log traffic is by listening to the Add, Remove, Delete, Get, etc methods of GraphDatabaseService. The Traversal framework goes around these as far as I know, so we would lose our ability to monitor. Any suggestions would be great! Thanks, Alex On Wed, Jun 23, 2010 at 3:52 PM, Mattias Persson matt...@neotechnology.comwrote: Also, the latest graph-algo is 0.6-SNAPSHOT... so use that instead 2010/6/23 Anders Nawroth and...@neotechnology.com Hi! See: http://www.mail-archive.com/user@lists.neo4j.org/msg04044.html /anders On 06/23/2010 03:44 PM, Alex Averbuch wrote: Hi Tobias, It seems as though the new changes have broken the AStar code I'm using. I use: neo4j-apoc 1.1-SNAPSHOT neo4j-graph-algo 0.5-SNAPSHOT AStar uses DefaultExpander and can no longer find it. Here's an example of the code that worked until now. 
DefaultExpander relExpander = new DefaultExpander(); relExpander.add(GISRelationshipTypes.BICYCLE_WAY, Direction.BOTH); AStar sp = new AStar(graphDb, relExpander, costEval, estimateEval); Path path = sp.findSinglePath(startNode, endNode); The problem seems to be that AStar wants a RelationshipExpander but now I can only create an ExpansionRelationship. Do you have any suggestions as to how to make this work again? Regards, Alex On Wed, Jun 23, 2010 at 11:14 AM, Tobias Ivarsson tobias.ivars...@neotechnology.com wrote: Hi Neo4j enthusiasts! Yesterday I committed an API that Mattias and I have been working on for a few days. It comes in the form of two new interfaces Expander and Expansion (in the org.neo4j.graphdb package), and four new methods in the Node interface (method names starting with expand). The two main problems this API solves are: 1. Adds a clean way for getting related nodes from a source node. 2. Adds a type safe way for declaratively specifying any combination of RelationshipTypes and Directions to expand. This replaces what was actually an anti-pattern, but something we saw people doing a lot: using a depth-one traversal to get the nodes related to a node, without needing to bother with the Relationship interface. Example: // The most convenient way to write it in the past: Node source = ... Traverser traverser = source.traverse( Traverser.Order.DEPTH_FIRST, new StopAtDepth( 1 ), ReturnableEvaluator.ALL_BUT_START_NODE, TYPE_ONE, Direction.INCOMING, TYPE_TWO, Direction.OUTGOING); for (Node related : traverser) { doSomethingWith( related ); } // The previously recommended (and bloated) way of doing it: Node source = ... 
for (Relationship rel : source.getRelationships( TYPE_ONE, TYPE_TWO )) { Node related; if (rel.isType(TYPE_ONE)) { related = rel.getStartNode(); if (related.equals(source)) continue; // we only want INCOMING TYPE_ONE } else if (rel.isType(TYPE_TWO)) { related = rel.getEndNode(); if (related.equals(source)) continue; // we only want OUTGOING TYPE_TWO } else { continue; // should never happen, but makes javac know that related is != null } doSomethingWith( related ); } // With the new API: Node source = ... for (Node related : source.expand( TYPE_ONE, Direction.INCOMING ) .add( TYPE_TWO, Direction.OUTGOING ).nodes()) { doSomethingWith( related ); } The return type of the Node.expand(...)-methods are the new
Re: [Neo4j] New tentative API in trunk: Expander/Expansion
Hi, Personally I'd prefer and/not. It's intuitive to any programmer and has the bonus of being much shorter, so long complex expansions will appear less messy in practice. On Wed, Jun 23, 2010 at 5:45 PM, Tobias Ivarsson tobias.ivars...@neotechnology.com wrote: Did anyone have an opinion on what to call the methods on the Expansion interface that specify which types to expand?

Alternative 1)

Expansion<T> and( RelationshipType type );
Expansion<T> and( RelationshipType type, Direction direction );
Expansion<T> not( RelationshipType type );

Examples:

for (Node node : startNode.expand( KNOWS )
        .and( LIKES )
        .and( LOVES ) ) {
    doSomethingWith(node);
}

Expansion<Relationship> expansion = startNode.expand( KNOWS );
expansion = expansion.and( LIKES );
expansion = expansion.and( LOVES );

Alternative 2)

Expansion<T> including( RelationshipType type );
Expansion<T> including( RelationshipType type, Direction direction );
Expansion<T> excluding( RelationshipType type );

Examples:

for (Node node : startNode.expand( KNOWS )
        .including( LIKES )
        .including( LOVES ) ) {
    doSomethingWith(node);
}

Expansion<Relationship> expansion = startNode.expand( KNOWS );
expansion = expansion.including( LIKES );
expansion = expansion.including( LOVES );

Cheers, Tobias On Wed, Jun 23, 2010 at 11:14 AM, Tobias Ivarsson tobias.ivars...@neotechnology.com wrote: Hi Neo4j enthusiasts! Yesterday I committed an API that Mattias and I have been working on for a few days. It comes in the form of two new interfaces, Expander and Expansion (in the org.neo4j.graphdb package), and four new methods in the Node interface (method names starting with expand). The two main problems this API solves are: 1. Adds a clean way for getting related nodes from a source node. 2. Adds a type safe way for declaratively specifying any combination of RelationshipTypes and Directions to expand. 
This replaces what was actually an anti-pattern, but something we saw people doing a lot: using a depth-one traversal to get the nodes related to a node, without needing to bother with the Relationship interface. Example: // The most convenient way to write it in the past: Node source = ... Traverser traverser = source.traverse( Traverser.Order.DEPTH_FIRST, new StopAtDepth( 1 ), ReturnableEvaluator.ALL_BUT_START_NODE, TYPE_ONE, Direction.INCOMING, TYPE_TWO, Direction.OUTGOING); for (Node related : traverser) { doSomethingWith( related ); } // The previously recommended (and bloated) way of doing it: Node source = ... for (Relationship rel : source.getRelationships( TYPE_ONE, TYPE_TWO )) { Node related; if (rel.isType(TYPE_ONE)) { related = rel.getStartNode(); if (related.equals(source)) continue; // we only want INCOMING TYPE_ONE } else if (rel.isType(TYPE_TWO)) { related = rel.getEndNode(); if (related.equals(source)) continue; // we only want OUTGOING TYPE_TWO } else { continue; // should never happen, but makes javac know that related is != null } doSomethingWith( related ); } // With the new API: Node source = ... for (Node related : source.expand( TYPE_ONE, Direction.INCOMING ) .add( TYPE_TWO, Direction.OUTGOING ).nodes()) { doSomethingWith( related ); } The return type of the Node.expand(...)-methods are the new Expansion type, it defaults to expanding to Relationship, but the Expansion.nodes()-method makes it expand to Node. It also contains the add()-methods seen above for specifying RelationshipTypes to include in the expansion. The spelling of this method isn't perfectly decided yet, we are choosing between add, and and include, we want something that reads nicely in the code, but doesn't conflict with keywords in other JVM-languages (and is a keyword in Python, and I think include means something special in Ruby). There is also an Expansion.exlude(RelationshipType)-method for use together with the Node.expandAll(). 
The Expansion is backed by the newly added companion interface Expander. This is an extension of RelationshipExpander that adds builder capabilities. It turns the functionality of the DefultExpander implementation class (now removed) into an interface in the API. RelationshipExpander is still around as a single method interface, which is useful for when you want to implement your own expansion logic. This API is added to trunk so that we can get feedback from everyone who use the snapshot builds of Neo4j, if the response to this API isn't positive, it will probably be removed before the release of 1.1, so please submit comments in this thread on what you think about this API. Happy Hacking, -- Tobias Ivarsson
Re: [Neo4j] Compacting files?
Hi Thomas, From what I understand (someone correct me if I'm wrong please): Neo4j does reuse the holes, but it will not reuse the holes you created in the current session; it will reuse all holes created in previous sessions. My problem is a bit of a strange use case, as I'm trying to reduce the size of a dataset to make it manageable for running computationally expensive graph algorithms over the entire graph. So, in my case I'm only deleting, never adding, giving Neo4j no opportunity to fill the holes I created. Alex On Thu, Jun 3, 2010 at 5:16 AM, Thomas Sant'ana maill...@gmail.com wrote: On Wed, Jun 2, 2010 at 9:30 AM, Johan Svensson jo...@neotechnology.com wrote: Alex, You are correct about the holes in the store file and I would suggest you export the data and then re-import it again. Neo4j is not optimized for the use case where more data is removed than added over time. I like Postgres' Auto Vacuum feature. I think if Neo4j reuses the holes, it's already nice. Some kind of compression and truncation of the files would be great, in my opinion. Just my 2 cents, Thomas ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
[Neo4j] Compacting files?
Hey, Is there a way to compact the data stores (relationships, nodes, properties) in Neo4j? I don't mind if it's a manual operation. I have some datasets that have had a lot of relationships removed from them, but the file is still the same size, so I'm guessing there are a lot of holes in this file at the moment. Would this be hurting lookup performance? Cheers, Alex ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Compacting files?
Hi Johan, Do you mean a utility that creates a new Neo4j instance and copies all entities into it from an old Neo4j instance? That's definitely no problem. I've written a bit of import/export code in my graph_gen_utils branch. I have a GraphReader interface which is generic and only contains getNodes() and getRels() method definitions, which return iterators. The iterators are of type NodeData, basically a HashMap of HashMaps for simplicity. 1 NodeData can contain 1 Node with Properties and all its Relationships with Properties. Then I implemented various readers that I needed during the thesis. For example ChacoParser, GMLParser, TwitterParser (proprietary format), etc., which all implement GraphReader. Similarly for GraphWriter... That made it easy for me to add any parser and use my existing methods for buffering multiple entities into Transactions, etc. It's far from perfect, but might give an idea or two. Maybe some of that could be reused, although someone would definitely need to evaluate the quality of my code first. Blueprints has some import functionality too (.graphml format, for example). Cheers, Alex On Wed, Jun 2, 2010 at 2:30 PM, Johan Svensson jo...@neotechnology.com wrote: Alex, You are correct about the holes in the store file and I would suggest you export the data and then re-import it again. Neo4j is not optimized for the use case where more data is removed than added over time. It would be possible to write a compacting utility, but since this is not a very common use case I think it is better to put that time into producing a generic export/import dump utility. The plan is to get an export/import utility in place as soon as possible, so any input on how that should work, what format to use, etc. would be great. -Johan On Wed, Jun 2, 2010 at 9:23 AM, Alex Averbuch alex.averb...@gmail.com wrote: Hey, Is there a way to compact the data stores (relationships, nodes, properties) in Neo4j? I don't mind if it's a manual operation. 
I have some datasets that have had a lot of relationships removed from them but the file is still the same size, so I'm guessing there are a lot of holes in this file at the moment. Would this be hurting lookup performance? Cheers, Alex ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user ___ Neo4j mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Compacting files?
Hi Craig, Just a quick note about needing to keep all IDs in memory during an import/export operation. The way I'm doing it at the moment, it's not necessary to do so. When exporting: write IDs to the exported format (this could be JSON, XML, GML, GraphML, etc.). When importing: first import all Nodes; this is easy to do in most formats (all that I've tried). While importing Nodes, store and index one extra property in every Node; I call this GID, for global ID. Next import all Relationships, using the GID and Lucene to locate the start Node and end Node. The biggest graph I've tried with this approach had 2.5 million Nodes and 250 million Relationships. It took quite a long time, but much of the slowness was because it was performed on an old laptop with 2GB of RAM, I didn't give the BatchInserter a properties file, and I used default JVM parameters. There is at least one obvious downside to this though, and that is that you pollute the dataset with GID properties. Alex

On Wed, Jun 2, 2010 at 5:53 PM, Craig Taverner cr...@amanzi.com wrote: I've thought about this briefly, and somehow it actually seems easier (to me) to consider a compacting (defragmenting) algorithm than a generic import/export. The problem is that in both cases you have to deal with the same issue: the node/relationship IDs are changed. For the import/export this means you need another way to store the connectedness, so you export the entire graph into another format that maintains the connectedness in some way (perhaps a whole new set of IDs), and then re-import it again. Getting a very complex, large and cyclic graph to work like this seems hard to me because you have to maintain a complete table in memory of the identity map during the export (which makes the export unscalable). But de-fragmenting can be done by changing IDs in batches, breaking the problem down into smaller steps, and never needing to deal with the entire graph at the same time at any point.
For example, take the node table, scan from the base collecting free IDs. Once you have a decent block, pull that many nodes down from above in the table. Since you keep the entire set in memory, you maintain the mapping of old-to-new and can use that to 'fix' the relationship table also. Rinse and repeat :-)

One option for the entire graph export that might work for most datasets that have predominantly tree structures is to export to a common tree format, like JSON (or XML). This maintains most of the relationships without requiring any memory of ID mappings. The less common cyclic connections can be maintained with temporary IDs and a table of such IDs maintained in memory (assuming it is much smaller than the total graph). This can allow complete export of very large graphs if the temp ID table does indeed remain small. Probably true for many datasets.

On Wed, Jun 2, 2010 at 2:30 PM, Johan Svensson jo...@neotechnology.com wrote: Alex, You are correct about the holes in the store file and I would suggest you export the data and then re-import it again. Neo4j is not optimized for the use case where more data is removed than added over time. It would be possible to write a compacting utility, but since this is not a very common use case I think it is better to put that time into producing a generic export/import dump utility. The plan is to get an export/import utility in place as soon as possible, so any input on how that should work, what format to use, etc. would be great. -Johan

On Wed, Jun 2, 2010 at 9:23 AM, Alex Averbuch alex.averb...@gmail.com wrote: Hey, Is there a way to compact the data stores (relationships, nodes, properties) in Neo4j? I don't mind if it's a manual operation. I have some datasets that have had a lot of relationships removed from them, but the file is still the same size, so I'm guessing there are a lot of holes in this file at the moment. Would this be hurting lookup performance?
Cheers, Alex
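Craig's batched defragmentation idea above can be sketched in miniature. The following is a toy Python model, not Neo4j code: the node store is a plain list with holes, the relationship table a list of ID pairs, and all names (`defrag_step`, etc.) are invented for illustration.

```python
# Hypothetical in-memory model of the batched defrag idea: scan the node
# "table" from the base collecting free slots, pull the highest nodes down
# into them, and use the small old->new map to fix the relationship table.

def defrag_step(store, rels, batch_size):
    """Compact one batch of node slots and remap relationship endpoints."""
    free = [i for i, slot in enumerate(store) if slot is None][:batch_size]
    remap = {}
    for target in free:
        # find the highest occupied slot to move down into the free slot
        src = max(i for i, slot in enumerate(store) if slot is not None)
        if src <= target:
            break  # nothing above the hole; table is compact up to here
        store[target], store[src] = store[src], None
        remap[src] = target
    # 'fix' the relationship table with the batch-sized old->new map
    fixed = [(remap.get(a, a), remap.get(b, b)) for a, b in rels]
    return store, fixed

store = ["n0", None, "n2", None, "n4", "n5"]   # holes at IDs 1 and 3
rels = [(0, 4), (2, 5)]
store, rels = defrag_step(store, rels, batch_size=2)
```

The point of the batching is that only `batch_size` entries of the old-to-new map ever need to be in memory at once, which is what makes the approach scale where a whole-graph identity map would not.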
Re: [Neo4j] Compacting files?
- On disk, and lucene is a good idea here. Why not index with lucene, but without storing the property to the node? I like it! This sounds like a cleaner approach than my current one, and (I'm not sure about how to do this either) it may be no more complex than the way I'm doing it. Like you say, we can delete the Lucene index afterwards... or just the Lucene folder associated with that one property. I'm writing exams, thesis reports, and thesis opposition reports for the next month, so I won't have time to try it out. If you give it a try, I'd be interested in hearing how the Lucene-only approach works out though.

On Wed, Jun 2, 2010 at 6:42 PM, Craig Taverner cr...@amanzi.com wrote: Yes. I guess you cannot escape an old-new ID map (or in your case ID-GID). I think it is possible to maintain that outside the database: - In memory, as I suggested, but only valid under some circumstances - On disk, and lucene is a good idea here. Why not index with lucene, but without storing the property to the node? Since the index method takes the node, the property and the value, I assume the property and value might be possible to index without actually being real properties and values? I've not tried, but this way the graph is cleaner, and we can delete the lucene index afterwards!

On Wed, Jun 2, 2010 at 6:12 PM, Alex Averbuch alex.averb...@gmail.com wrote: Hi Craig, Just a quick note about needing to keep all IDs in memory during an import/export operation. The way I'm doing it at the moment, it's not necessary to do so. When exporting: write IDs to the exported format (this could be JSON, XML, GML, GraphML, etc.). When importing: first import all Nodes; this is easy to do in most formats (all that I've tried). While importing Nodes, store and index one extra property in every Node; I call this GID, for global ID. Next import all Relationships, using the GID and Lucene to locate the start Node and end Node.
The biggest graph I've tried with this approach had 2.5 million Nodes and 250 million Relationships. It took quite a long time, but much of the slowness was because it was performed on an old laptop with 2GB of RAM, I didn't give the BatchInserter a properties file, and I used default JVM parameters. There is at least one obvious downside to this though, and that is that you pollute the dataset with GID properties. Alex

On Wed, Jun 2, 2010 at 5:53 PM, Craig Taverner cr...@amanzi.com wrote: I've thought about this briefly, and somehow it actually seems easier (to me) to consider a compacting (defragmenting) algorithm than a generic import/export. The problem is that in both cases you have to deal with the same issue: the node/relationship IDs are changed. For the import/export this means you need another way to store the connectedness, so you export the entire graph into another format that maintains the connectedness in some way (perhaps a whole new set of IDs), and then re-import it again. Getting a very complex, large and cyclic graph to work like this seems hard to me because you have to maintain a complete table in memory of the identity map during the export (which makes the export unscalable). But de-fragmenting can be done by changing IDs in batches, breaking the problem down into smaller steps, and never needing to deal with the entire graph at the same time at any point. For example, take the node table, scan from the base collecting free IDs. Once you have a decent block, pull that many nodes down from above in the table. Since you keep the entire set in memory, you maintain the mapping of old-to-new and can use that to 'fix' the relationship table also. Rinse and repeat :-) One option for the entire graph export that might work for most datasets that have predominantly tree structures is to export to a common tree format, like JSON (or XML). This maintains most of the relationships without requiring any memory of ID mappings.
The less common cyclic connections can be maintained with temporary IDs and a table of such IDs maintained in memory (assuming it is much smaller than the total graph). This can allow complete export of very large graphs if the temp ID table does indeed remain small. Probably true for many datasets.

On Wed, Jun 2, 2010 at 2:30 PM, Johan Svensson jo...@neotechnology.com wrote: Alex, You are correct about the holes in the store file and I would suggest you export the data and then re-import it again. Neo4j is not optimized for the use case where more data is removed than added over time. It would be possible to write a compacting utility, but since this is not a very common use case I think it is better to put that time into producing a generic export/import dump utility. The plan is to get an export/import utility in place as soon as possible so any
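The "keep the ID map outside the database" idea from this thread can be modeled compactly. This is a hedged Python sketch in which a plain dict stands in for the Lucene index; none of these names come from the Neo4j API, and the point is only the data flow: the temporary GID never becomes a property of the imported nodes.

```python
# Toy re-import using an external GID index (a dict standing in for Lucene),
# so the new store never carries the temporary GID property.

exported_nodes = [{"gid": "a"}, {"gid": "b"}, {"gid": "c"}]
exported_rels = [("a", "b"), ("b", "c")]   # endpoints named by GID

gid_index = {}   # external map: can be thrown away after the import
new_nodes = []   # stands in for the new node store
for record in exported_nodes:
    new_id = len(new_nodes)
    new_nodes.append({})                 # note: no GID stored on the node
    gid_index[record["gid"]] = new_id

# resolve relationship endpoints through the external index only
new_rels = [(gid_index[s], gid_index[e]) for s, e in exported_rels]
del gid_index   # "delete the lucene index afterwards"
```

Afterwards the imported graph is clean: the relationships reference the new IDs, and nothing from the export survives as a node property.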
Re: [Neo4j] [Neo] Is neo4j a good solution?
I've done a fair bit of loading of Neo4j instances from different graph file formats recently, and I agree. For me, about 10,000 operations per Transaction worked well.

On Tue, Jun 1, 2010 at 6:44 AM, Craig Taverner cr...@amanzi.com wrote: A quick comment about transaction size. I find a good speed/memory balance at a few thousand writes per transaction. More than that improves performance with diminishing returns.

2010/5/17, Mattias Persson matt...@neotechnology.com: 2010/5/17 Kiss Miklós kissmik...@freemail.hu I enabled the lucene index cache for the mostly used properties and it made a huge difference! Great. Regarding the VM heap size, I started with 128MB but already tried to increase it to 512MB, but it didn't help. Now I'm trying to estimate the actual memory requirement. I guess that using the lucene index cache further increases the memory requirement. Yeah it does... do you have much RAM in your machine? Try to increase the heap to 1 or 2 GB and see what happens. Miklós

On 2010-05-17 11:15, Mattias Persson wrote: 2010/5/17 Kiss Miklós kissmik...@freemail.hu Hello, Thanks for the answer. No, I haven't looked at the lucene index cache yet but will soon, thanks for the tip. The strings I store are mostly about 10 characters long but I also have some about 30 chars. I wouldn't consider these 'big strings'. That's a very common (and rather optimal for Neo4j) string size. Is it a good choice to put all write operations into one transaction? This means in my case a few thousand nodes (2-5,000) and about 4-5 times more relations. Or would it give better performance if I sliced operations into smaller transactions? What would be the optimal transaction size? That should yield good write performance, yes. How much heap have you given the JVM? Write operations in a transaction are kept in memory until committed, so if you don't have a big heap it can be a problem and cause out-of-memory problems like you encountered... Thanks, Miklos

On 2010-05-17 9:35, Mattias Persson wrote: Hi, sorry for a late response. Yep, this seems like an excellent fit for Neo4j. Regarding lucene index lookup performance: have you looked at enabling caching http://wiki.neo4j.org/content/Indexing_with_IndexService#Caching ? It can speed up lookups considerably. Do you store very big strings or just a few words? Neo4j currently isn't optimal at storing big strings. For this an integration with another database could be a solution.

2010/5/10 Kiss Miklós kissmik...@freemail.hu Hello, I'd like to ask if using Neo4j would be a good solution for the following scenario. I have an application which performs some natural language text analysis. I want to put the results of this analysis into a database. I have words, stems, collocations, themes and many relations between them. This is why neo4j seems to be a good solution. However, I ran into performance problems: I need to use the lucene index service heavily (I have to look up whether a node I'm about to store already exists), which I think is a bit slow. The other problem is Java heap space: some documents cause my app to halt with an out of memory exception (for which I couldn't yet find the reason). My questions are: 1) Is my data storage scenario a good one (nodes = words, relations = relations) or could there be a better one? 2) How should I perform the lookup of nodes in the database? 3) Or should I use some other database?
Thanks in advance, Miklós -- Mattias Persson, [matt...@neotechnology.com] Hacker, Neo Technology www.neotechnology.com -- Sent from my mobile device
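The transaction-size advice in this thread (a few thousand to ~10,000 writes per transaction) boils down to a simple batching loop. Here is a Python model of the pattern; `apply_in_batches` and the callbacks are invented stand-ins, not Neo4j API calls — in real code the commit would be `tx.success(); tx.finish()` followed by opening a fresh transaction.

```python
# Group write operations into fixed-size batches, committing once per batch
# so the memory held by any single transaction stays bounded.

BATCH_SIZE = 10_000   # the "about 10,000 operations per Transaction" figure

def apply_in_batches(operations, commit):
    in_tx = 0
    for op in operations:
        op()              # perform one write inside the current transaction
        in_tx += 1
        if in_tx == BATCH_SIZE:
            commit()      # close the transaction and start a new one
            in_tx = 0
    if in_tx:
        commit()          # commit the final partial batch

commits = []
apply_in_batches([lambda: None] * 25_000, lambda: commits.append(1))
```

With 25,000 operations this performs three commits: two full batches and one final partial one. The trade-off is exactly the one described above: larger batches amortize commit overhead but hold more uncommitted state in the heap.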
Re: [Neo4j] Efficient way to sparsify a graph?
Hi Peter, Yeah it's under control now. Cheers, Alex

On Thu, May 27, 2010 at 5:36 PM, Peter Neubauer peter.neuba...@neotechnology.com wrote: Alex, not an expert on delete performance, but that looks ok to me, and is workable for you now? Cheers, /peter neubauer COO and Sales, Neo Technology GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - Your high performance graph database. http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.

On Tue, May 25, 2010 at 9:51 AM, Alex Averbuch alex.averb...@gmail.com wrote: OK, it seems to be running much faster now. The reason for such slow performance was mine, not Neo4j's. Before, the input parameters to my function allowed the user to specify exactly how many Relationships should be removed. This meant that, in order to be fair, I had to uniformly randomly generate IDs in the key space (also defined by an input parameter). The problem with this approach is that as time passes it becomes more likely to select an ID that has already been removed. I had checks to avoid errors as a result, but it still meant many wasted disk reads.

//BEFORE
until (graph_is_small_enough)
    random_number = generate_uniform(0, maxId)
    random_relationship = get_relationship_by_id(random_number)
    random_relationship.delete()

Now the input parameters only let the user specify the PERCENTAGE of all Relationships that should be kept. This means I don't need to keep any state that tells me which IDs have already been deleted, and I can iterate through all Relationships and never have a missed read (assuming there are no holes in the key space).

//NOW
for (index = 0 to maxId)
    random_number = generate_uniform(0, maxId)
    if (random_number < percent_to_keep) continue
    random_relationship = get_relationship_by_id(index)
    random_relationship.delete()

Performance is much better now.
The first 1,000,000 deletions took ~4 minutes. Cheers, Alex

On Mon, May 24, 2010 at 11:24 PM, Alex Averbuch alex.averb...@gmail.com wrote: Hey, I have a large (by my standards) graph and I would like to reduce its size so it all fits in memory. This is the same Twitter graph I mentioned earlier: 2.5 million Nodes and 250 million Relationships. The goal is for the graph to still have the same topology and characteristics after it has been made more sparse. My plan to do this was to uniformly randomly select Relationships for deletion, until the graph is small enough. My first approach is basically this:

until (graph_is_small_enough)
    random_relationship = get_relationship_by_id(random_number)
    random_relationship.delete()

I'm using the transactional GraphDatabaseService at the moment, rather than the BatchInserter... mostly because I'm not inserting anything and I assumed the optimizations made to the BatchInserter were only for write operations. The reason I want to delete Relationships instead of Nodes is (1) I don't want to accidentally delete any super nodes, as these are what give Twitter its unique structure, and (2) the number of Nodes is not the main problem that's keeping me from being able to store the graph in RAM. The problem with the current approach is that it feels like I'm working against Neo4j's strengths and it is very, very slow... I waited over an hour and fewer than 1,000,000 Relationships had been deleted. Given that my aim is to halve the number of Relationships, it would take over 100 hours (1 week) to complete this process. In the worst case this is what I'll resort to, but I'd rather not if there's a better way. My questions are: (1) Can you think of an alternative, faster and still meaningful (maintaining graph structure) way to reduce this graph's size? (2) Using the same method I'm using now, are there some magical optimizations that will greatly improve performance?
Thanks, Alex
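The percentage-based deletion loop described above can be expressed as a single sequential pass. A small Python model of the sampling logic (not the Neo4j delete calls) follows; the function and parameter names are illustrative, and `random.random()` plays the role of `generate_uniform`.

```python
# One sequential pass over the relationship ID space, keeping each
# relationship with probability percent_to_keep, instead of repeatedly
# drawing random IDs that may already have been deleted.

import random

def sparsify(rel_ids, percent_to_keep, seed=42):
    rng = random.Random(seed)
    deleted = []
    for rel_id in rel_ids:
        if rng.random() < percent_to_keep:
            continue                 # keep this relationship
        deleted.append(rel_id)       # sequential read, never a missed lookup
    return deleted

deleted = sparsify(range(100_000), percent_to_keep=0.5)
kept_fraction = 1 - len(deleted) / 100_000
```

Because each ID is visited exactly once, there are no wasted reads of already-deleted relationships, which is where the original approach lost most of its time; the kept fraction converges to `percent_to_keep` by the law of large numbers.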
Re: [Neo4j] Efficient way to sparsify a graph?
OK, it seems to be running much faster now. The reason for such slow performance was mine, not Neo4j's. Before, the input parameters to my function allowed the user to specify exactly how many Relationships should be removed. This meant that, in order to be fair, I had to uniformly randomly generate IDs in the key space (also defined by an input parameter). The problem with this approach is that as time passes it becomes more likely to select an ID that has already been removed. I had checks to avoid errors as a result, but it still meant many wasted disk reads.

//BEFORE
until (graph_is_small_enough)
    random_number = generate_uniform(0, maxId)
    random_relationship = get_relationship_by_id(random_number)
    random_relationship.delete()

Now the input parameters only let the user specify the PERCENTAGE of all Relationships that should be kept. This means I don't need to keep any state that tells me which IDs have already been deleted, and I can iterate through all Relationships and never have a missed read (assuming there are no holes in the key space).

//NOW
for (index = 0 to maxId)
    random_number = generate_uniform(0, maxId)
    if (random_number < percent_to_keep) continue
    random_relationship = get_relationship_by_id(index)
    random_relationship.delete()

Performance is much better now. The first 1,000,000 deletions took ~4 minutes. Cheers, Alex

On Mon, May 24, 2010 at 11:24 PM, Alex Averbuch alex.averb...@gmail.com wrote: Hey, I have a large (by my standards) graph and I would like to reduce its size so it all fits in memory. This is the same Twitter graph I mentioned earlier: 2.5 million Nodes and 250 million Relationships. The goal is for the graph to still have the same topology and characteristics after it has been made more sparse. My plan to do this was to uniformly randomly select Relationships for deletion, until the graph is small enough.
My first approach is basically this:

until (graph_is_small_enough)
    random_relationship = get_relationship_by_id(random_number)
    random_relationship.delete()

I'm using the transactional GraphDatabaseService at the moment, rather than the BatchInserter... mostly because I'm not inserting anything and I assumed the optimizations made to the BatchInserter were only for write operations. The reason I want to delete Relationships instead of Nodes is (1) I don't want to accidentally delete any super nodes, as these are what give Twitter its unique structure, and (2) the number of Nodes is not the main problem that's keeping me from being able to store the graph in RAM. The problem with the current approach is that it feels like I'm working against Neo4j's strengths and it is very, very slow... I waited over an hour and fewer than 1,000,000 Relationships had been deleted. Given that my aim is to halve the number of Relationships, it would take over 100 hours (1 week) to complete this process. In the worst case this is what I'll resort to, but I'd rather not if there's a better way. My questions are: (1) Can you think of an alternative, faster and still meaningful (maintaining graph structure) way to reduce this graph's size? (2) Using the same method I'm using now, are there some magical optimizations that will greatly improve performance? Thanks, Alex
Re: [Neo4j] [Neo] Neo4j Screencasts?
Nice! This would've been really useful when I was first getting to grips with what Neo4j was. I think a Neo4j + Blueprints/Pipes/Gremlin screencast would be cool too, for those people that are worried about getting locked in.

On Tue, May 25, 2010 at 2:18 PM, Peter Neubauer peter.neuba...@neotechnology.com wrote: Eelco and all, did a first screencast on Neoclipse - just for you my friend! http://vimeo.com/channels/109293#12014944 WDYT? Any tips or improvement hints? More suggestions? Cheers, /peter neubauer COO and Sales, Neo Technology GTalk: neubauer.peter Skype peter.neubauer Phone +46 704 106975 LinkedIn http://www.linkedin.com/in/neubauer Twitter http://twitter.com/peterneubauer http://www.neo4j.org - Your high performance graph database. http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.

On Tue, May 25, 2010 at 8:55 AM, Eelco Hillenius eelco.hillen...@gmail.com wrote: Anything more? I'd love to see a walkthrough on visualization (how do I make pretty pictures of my graphs). Eelco
Re: [Neo4j] [Neo] Neo4j Screencasts?
Hi Eelco, I agree it would be nice to maybe even have a series based only on visualization tools. Maybe you're familiar with this already, but in case you're not I would recommend checking out iGraph: http://igraph.sourceforge.net/ It has bindings to Python, R, and C, is really easy to use, and supports a lot of graph file formats. It also deals with reasonably large graphs in a decent way. I've visualized graphs of 10,000-100,000 vertices using its layout algorithms, and graphs of 1 million vertices when I was able to specify my own coordinates from domain knowledge. I use it by writing my Neo4j instance to a .gml file and then loading that file into iGraph, but I think a better approach would be writing a .graphml file (there's a parser/writer in Blueprints http://github.com/tinkerpop/blueprints I think). Cheers, Alex

On Tue, May 25, 2010 at 7:46 PM, Eelco Hillenius eelco.hillen...@gmail.com wrote: Eelco and all, did a first screencast on Neoclipse - just for you my friend! http://vimeo.com/channels/109293#12014944 WDYT? Any tips or improvement hints? More suggestions? Hi Peter, thanks! Not to want to sound ungrateful - I think this is a good resource to have anyway - but I was thinking more along the lines of Graphviz or JUNG etc. Cheers, Eelco
[Neo] Efficient way to sparsify a graph?
Hey, I have a large (by my standards) graph and I would like to reduce its size so it all fits in memory. This is the same Twitter graph I mentioned earlier: 2.5 million Nodes and 250 million Relationships. The goal is for the graph to still have the same topology and characteristics after it has been made more sparse. My plan to do this was to uniformly randomly select Relationships for deletion, until the graph is small enough. My first approach is basically this:

until (graph_is_small_enough)
    random_relationship = get_relationship_by_id(random_number)
    random_relationship.delete()

I'm using the transactional GraphDatabaseService at the moment, rather than the BatchInserter... mostly because I'm not inserting anything and I assumed the optimizations made to the BatchInserter were only for write operations. The reason I want to delete Relationships instead of Nodes is (1) I don't want to accidentally delete any super nodes, as these are what give Twitter its unique structure, and (2) the number of Nodes is not the main problem that's keeping me from being able to store the graph in RAM. The problem with the current approach is that it feels like I'm working against Neo4j's strengths and it is very, very slow... I waited over an hour and fewer than 1,000,000 Relationships had been deleted. Given that my aim is to halve the number of Relationships, it would take over 100 hours (1 week) to complete this process. In the worst case this is what I'll resort to, but I'd rather not if there's a better way. My questions are: (1) Can you think of an alternative, faster and still meaningful (maintaining graph structure) way to reduce this graph's size? (2) Using the same method I'm using now, are there some magical optimizations that will greatly improve performance? Thanks, Alex
[Neo] Batch inserter performance
Hey, I'm loading a graph from a proprietary binary file format into Neo4j using the batch inserter. The graph (Twitter crawl results) has 2,500,000 Nodes and 250,000,000 Relationships. Here's what I'm doing:

(1) Insert all Nodes first. While doing so I also add one property (let's call it CUSTOM_ID) and index it with Lucene.
(2) Call optimize() on the index.
(3) Insert all the Relationships. I use CUSTOM_ID to look up the start and end Nodes. Relationships have no properties.

The problem is that the insertion performance seems to decay quite quickly as the size increases. I'm keeping track of how long it takes to insert the records. In the beginning it took about 5 minutes to insert 1,000,000 Relationships. After about 50,000,000 inserted Relationships it was close to 10 minutes per 1,000,000 Relationships. By the time I was up to 70,000,000 it was taking 12 minutes per 1,000,000 Relationships. That's a drop from ~7,000 Relationships/second to ~3,000 Relationships/second, and I'm worried that if this continues it could take over a week to load this dataset. Can you think of anything that I'm doing wrong? I have a neo.prop file but I'm not using it... I create the batch inserter with only one parameter (the database directory). Cheers, Alex
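One likely contributor to the decay described above is running the BatchInserter without any store configuration (as the message notes, the neo.prop file isn't being used). The Neo4j 1.x-era batch inserter could be given a properties file or config map controlling how much of each store file is memory-mapped; a sketch of such a file follows. The keys are the standard mapped-memory settings of that era, but the values here are only illustrative and would need tuning to the machine and dataset — for a relationship-heavy load like this one, the relationship store typically deserves the largest share.

```properties
# Example neo.prop for a relationship-heavy batch insert (values are
# illustrative; tune them to fit the store sizes and available RAM).
neostore.nodestore.db.mapped_memory=100M
neostore.relationshipstore.db.mapped_memory=3G
neostore.propertystore.db.mapped_memory=100M
neostore.propertystore.db.strings.mapped_memory=200M
neostore.propertystore.db.arrays.mapped_memory=0M
```

Passing a configuration like this, instead of the one-argument (database directory only) constructor, keeps the growing relationship store memory-mapped rather than falling back to much slower unmapped access as it outgrows the defaults.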
Re: [Neo] Indexing Relationships?
First of all, I'd like to thank everybody who chimed in on this thread with their own use cases. To answer Tobias' original question, I have two examples of use cases that we want to support.

In the first example, we'd like to represent the coauthorship network of inventors who hold patents registered in the United States, so each node is an author and each relationship is a coauthorship of a patent. We are particularly interested in how changes in authors' location or employment connect otherwise disjoint sets of coauthors (e.g. somebody moves from HP to Canon and provides a potential source of collaboration between engineers at these firms). Because a node's employer or location changes with time, we choose to store this information in the relationships instead of the nodes. Often we are interested in collaborations that took place in a certain region during a certain timespan -- in this case, it makes sense to query the relationships.

In another example, we have data on conflicts between countries that we would like to represent as a network. Each node is a country and each relationship indicates that two countries had a conflict at a particular time. We have time data stored in the relationships, and we would like to query for conflicts that occurred during a particular timeframe.

In general, we want to provide data hosting for social scientists who have dyadic relational data. We'd like for them to be able to upload a graphml file that encodes their property graph, and then allow other researchers to query that graph for particular nodes and edges and download that subgraph. In particular, we would like the kind of per-edge and per-node querying support found in network analysis packages like igraph. I realize that in most cases the relationship-querying problem can be solved with better domain modeling.
In an extreme case we could make the graph bipartite and simply have each relationship be intercepted by a node that holds the properties that we would like to query. Of course, in many cases it's possible to consolidate these relationship nodes or the properties they contain into nodes that represent other entities (a patent, a year, a place) and reduce the number of relationships in the graph. However, this is hard to generalize and in certain cases makes the storage engine much less efficient (2 times the edges, plus a new node for each edge in the worst case). Still, in requesting this feature, I should mention that we have also prototyped ways to make this relationship consolidation more general, but that it seems less straightforward than extending the indexing engine to include relationships. It is possible that this very flat graph representation isn't a priority for the Neo4j team and that our use of the database is an abuse of the system, but this is the type of structure that network researchers have the most interest in, and Neo4j is by far the best database available to store it. And while network analysis techniques will eventually adapt to more advanced representations, they currently rely on the much flatter structure of one type of node and one type of relationship. Allowing this kind of indexing on relationships would greatly enhance Neo4j's usability in the network analysis community, and perhaps begin to push researchers to explore the properties of more complex graphs that Neo4j is capable of representing. Thanks very much, Alex D'Amour

On Sat, May 15, 2010 at 2:49 PM, Niels Hoogeveen pd_aficion...@hotmail.com wrote: I use relationships to encode paths in the graph based on the meta model.
For example: Class(Article) -- Relationship(Author) -- Class(User) -- Property(Username). Right now I encode this using an md5 encoding of the above path, add a property to the first entity in the path, using the md5 encoding as the key (the value is irrelevant), and relationships (with a DynamicRelationshipType with a name equal to the md5 key) are used to link the various items in the path. Finding the path requires a traversal from the first Class node in the path, following the given relationships. This traversal can potentially be expensive when a class has many instances (all have a relationship to the class). If relationships were indexed, the path could be encoded by giving each relationship making up the path a property encoding the path, then using the index to retrieve all relationships making up the path and laying those relationships head to toe to construct the path. No longer would a traversal be necessary, and the cost of the operation would only depend on the number of elements in the path, and not on the number of relationships one of the elements in the path can potentially have. Niels

From: tobias.ivars...@neotechnology.com Date: Sat, 15 May 2010 13:32:36 +0200 To: user@lists.neo4j.org Subject: Re: [Neo] Indexing Relationships? There is no indexing component for Relationships and there has never been one. The interesting question
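The "intercepting node" workaround discussed in this thread — making the graph bipartite so each relationship's properties live on a small node that can be indexed — can be modeled in a few lines, using the conflict-network example from earlier in the thread. This is a Python sketch of the data-model idea only; a plain list plays the role of the index, and all names are invented.

```python
# Each conflict "edge" is replaced by a small edge-node holding the
# properties (here, the year), which can then be indexed and queried
# directly instead of traversing relationships.

edge_nodes = []   # stand-in for an index over the edge-nodes

def add_conflict(country_a, country_b, year):
    edge_nodes.append({"a": country_a, "b": country_b, "year": year})

def conflicts_between(start_year, end_year):
    # query the edge-nodes by property, not by traversal
    return [(e["a"], e["b"]) for e in edge_nodes
            if start_year <= e["year"] <= end_year]

add_conflict("X", "Y", 1914)
add_conflict("Y", "Z", 1939)
add_conflict("X", "Z", 1982)
found = conflicts_between(1930, 1950)
```

This illustrates both sides of the trade-off described above: the timeframe query becomes a direct index lookup, but every edge now costs an extra node and doubles the relationship count in the real graph.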
Re: [Neo] Community Program Review at FOSS4G 2010
Voted too.

On Thu, May 6, 2010 at 8:34 AM, Raul Raja Martinez raulr...@gmail.com wrote: voted for it

2010/5/4 Craig Taverner cr...@amanzi.com Hi guys, I've applied to present Neo4j Spatial (Neo4j as a true GIS database for mapping data) at the FOSS4G conference in September. To increase the chances of the presentation getting accepted, it helps to get community votes. So, if you think Neo4j Spatial is a cool idea, vote for it :-) Please follow this link to express your opinion: http://2010.foss4g.org/review/ Regards, Craig

-- Forwarded message -- From: Lorenzo Becchi lbec...@osgeo.org Date: Wed, May 5, 2010 at 2:02 AM Subject: Community Program Review at FOSS4G 2010 To: Lorenzo Becchi lore...@ominiverdi.com I would like to personally thank you for submitting your abstract for FOSS4G 2010. Below is the message promoting the public review of the 360 abstracts we've received. I imagine you want your abstract to be voted on and your community to support you. Please feel free to forward this message to as many people as possible to make this public review something really useful. Best regards, Lorenzo Becchi

-- At FOSS4G 2010 the community and conference registrants will have an opportunity to read through and score potential presentations prior to the selection of the final conference program. There is enough room in the conference schedule for 120 presentations. The conference committee will use the aggregate scores from the community review process to help choose which presentations to accept, and to assign presentations to appropriately sized rooms. The top-voted presentations will receive special attention from the organization.
Please follow this link to express your opinion: http://2010.foss4g.org/review/ ___ Neo mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user -- Raul Raja ___ Neo mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user ___ Neo mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo] Hiding properties in NeoClipse
Thanks! My requirement was basically to make the visualization less cluttered, and that trick actually worked really well.

On Thu, Apr 22, 2010 at 2:20 PM, Anders Nawroth and...@neotechnology.com wrote:

Hi!

There isn't. But by not using the expanded mode (see the view menu in the upper right corner of the graph view) and instead selecting which properties should be included in the settings (Neo4j - Graph Decorations - Node/Relationship label properties), you get something useful in most cases.

/anders

Mattias Persson wrote: Yeah, that could be useful! I don't know if there's such a feature yet? Anders?

2010/4/22 Alex Averbuch alex.averb...@gmail.com

Hey guys, Is there a way to stop specific properties on nodes/relationships from being displayed in NeoClipse? Hiding specific relationship types is really easy, but I haven't found a similar setting for properties yet.

Cheers, Alex
[Neo] Iterator over Relationships?
Hello all, Is there an easy way to create an iterator over all relationships stored in a database? There's database.getAllNodes(). Why isn't there a getAllRelationships() method? There appears to have been one in the past, but it looks like it's protected now. Is there a specific reason why this might be a bad idea? If so, would simply iterating over the nodes and getting edges that way be substantially faster, despite touching each edge twice?

Thanks, Alex
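One way around the double visit the question mentions: since every relationship has exactly one start node, expanding only outgoing relationships while iterating all nodes yields each relationship exactly once. Here is a toy sketch of that idea in Python over a plain adjacency dict; the dict is a hypothetical stand-in for the database, not the Neo4j API.

```python
# Minimal adjacency model standing in for getAllNodes(): each directed
# relationship is stored once under its start node. Iterating every
# node's OUTGOING relationships therefore visits each relationship
# exactly once, with no duplicate-filtering needed.
graph = {
    "a": [("KNOWS", "b"), ("KNOWS", "c")],
    "b": [("KNOWS", "c")],
    "c": [],
}

def all_relationships(graph):
    """Yield every (start, type, end) triple exactly once."""
    for start, rels in graph.items():
        for rel_type, end in rels:
            yield (start, rel_type, end)

rels = list(all_relationships(graph))
```

The same pattern in the Java API would iterate getAllNodes() and call getRelationships with the outgoing direction only.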
Re: [Neo] Indexing on Doubles in Neo4j
Rick,

Thanks for the suggestion, but we need a wider range of measures than those supplied here, so I'm following the tinkerpop team's lead and implementing the JUNG Graph interface, but using traversers under the hood. JUNG has a much fuller set of algorithms and seems to be more actively supported. However, we'll be dealing with network data that have real-valued covariates, and we'd prefer not to throw away information.

Thanks, Alex

On Mon, Mar 22, 2010 at 5:17 PM, Rick Bullotta rick.bullo...@burningskysoftware.com wrote:

Maybe this? http://components.neo4j.org/neo4j-graph-algo/apidocs/org/neo4j/graphalgo/centrality/package-summary.html

-Original Message-
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Alex D'Amour
Sent: Monday, March 22, 2010 5:06 PM
To: Neo user discussions
Subject: [Neo] Indexing on Doubles in Neo4j

Hello all, I'm working on an application where it would be nice to perform lookups on a graph database based on real-valued properties. For example, if I have a social network and have assigned real-valued centrality measures to each node, I'd like to be able to choose all vertices whose centrality measure is greater than some threshold. I see that the Timeline index service offers this for integer-valued properties. Is there something similar (or in the pipeline) for doing the same with real-valued properties? Is there an easy way to adapt one of the current indexing utilities to do this (besides multiplying by 10^n for sufficiently large n and then rounding)?

Thanks, Alex D'Amour
Harvard Institute for Quantitative Social Science
Re: [Neo] Indexing on Doubles in Neo4j
Craig,

Please keep me (or just the list) updated on this.

Thanks, Alex

On Tue, Mar 23, 2010 at 5:43 AM, Craig Taverner cr...@amanzi.com wrote:

Last year we wrote a multi-dimensional index for floats, similar in principle to the timeline index, but working on multiple floats (and doubles). We used it to index locations. Now we are hoping to include the same concepts in the new Neo4j Spatial project (http://wiki.neo4j.org/content/Neo4j_Spatial). Even though this is targeting map data, it seems viable for any float/double property index. We hope to have some usable code for this within the next few weeks.

On Mon, Mar 22, 2010 at 10:13 PM, Rick Bullotta rick.bullo...@burningskysoftware.com wrote:

Alex, due to floating point precision issues, you might be best off determining some type of integral rounded or scaled key, as you suggest. If you end up using the Lucene indexing engine, you'd probably want to do something like this anyway, since indexing is string-based under the hood. That said, I wonder if any of the graph algos available for Neo could be used to determine centrality during traversal rather than storing it statically?

-Original Message-
From: user-boun...@lists.neo4j.org [mailto:user-boun...@lists.neo4j.org] On Behalf Of Alex D'Amour
Sent: Monday, March 22, 2010 5:06 PM
To: Neo user discussions
Subject: [Neo] Indexing on Doubles in Neo4j

Hello all, I'm working on an application where it would be nice to perform lookups on a graph database based on real-valued properties. For example, if I have a social network and have assigned real-valued centrality measures to each node, I'd like to be able to choose all vertices whose centrality measure is greater than some threshold. I see that the Timeline index service offers this for integer-valued properties. Is there something similar (or in the pipeline) for doing the same with real-valued properties? Is there an easy way to adapt one of the current indexing utilities to do this (besides multiplying by 10^n for sufficiently large n and then rounding)?

Thanks, Alex D'Amour
Harvard Institute for Quantitative Social Science
[Neo] Indexing on Doubles in Neo4j
Hello all, I'm working on an application where it would be nice to perform lookups on a graph database based on real-valued properties. For example, if I have a social network and have assigned real-valued centrality measures to each node, I'd like to be able to choose all vertices whose centrality measure is greater than some threshold. I see that the Timeline index service offers this for integer-valued properties. Is there something similar (or in the pipeline) for doing the same with real-valued properties? Is there an easy way to adapt one of the current indexing utilities to do this (besides multiplying by 10^n for sufficiently large n and then rounding)?

Thanks, Alex D'Amour
Harvard Institute for Quantitative Social Science
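The "multiply by 10^n and round" workaround mentioned in the question (and seconded by Rick's reply about Lucene being string-based under the hood) can be made to work with a string index if the scaled integers are zero-padded so that lexicographic order matches numeric order. Here is a minimal Python sketch of that encoding; `double_key` and its defaults are illustrative choices, not part of any Neo4j index API, and the scheme as written only handles non-negative values.

```python
def double_key(value, scale=10**6, width=20):
    """Encode a non-negative double as a zero-padded decimal string so
    that lexicographic (string) order matches numeric order, making it
    usable for range queries in a string-based index such as Lucene.
    `scale` fixes the retained precision (here, 6 decimal places);
    negative values would need an offset or sign-flip scheme on top."""
    return str(int(round(value * scale))).zfill(width)

# Sorting the encoded keys as strings reproduces numeric order.
vals = [0.5, 0.12, 3.0, 0.125]
keys = sorted(double_key(v) for v in vals)
```

The zero-padding is the essential step: without it, "12" would sort after "1000000" as a string even though 0.000012 is smaller than 1.0.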
[Neo] Syntax error when using neo4j.py
Hi, I am attempting to write a Python script to do some basic transactions with a Neo4j database. I'm using Jython 2.5rc1 and the latest neo4j.py module on Ubuntu 9.04. When I try to follow the code example on the neo4j.py page, I use the with statement as it shows:

    from neo4j import NeoService
    neo = NeoService("/neo/db/path")
    with neo.transaction:
        ref_node = neo.reference_node
        new_node = neo.node()
        # put operations that manipulate the node space here ...
    neo.shutdown()

However, when the with statement is parsed, the interpreter stops and returns the following syntax error:

    Syntax Error: 'with' will become a reserved word in Python 2.6

I've tried a bunch of different code and searched around for other people having this problem, but to no avail. All of my stuff is up to date, so why would the basic recommended code break on my system?

--Alex
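That error is the usual symptom of running `with` on a 2.5-series interpreter: in Python and Jython 2.5 the `with` statement is not enabled by default and must be switched on with a future import at the top of the file (it became a regular keyword only in 2.6). A minimal sketch, using a throwaway context manager in place of `neo.transaction` (which is not available outside a Neo4j install):

```python
# On Python/Jython 2.5 this future import enables the `with` statement;
# on later versions it is accepted and has no effect.
from __future__ import with_statement

from contextlib import contextmanager

events = []

@contextmanager
def transaction():
    """Stand-in for neo.transaction, just to show the statement parses
    and that enter/exit code runs around the block."""
    events.append("begin")
    try:
        yield
    finally:
        events.append("finish")  # commit/rollback would happen here

with transaction():
    events.append("work")  # node/relationship operations would go here
```

In the original script, adding `from __future__ import with_statement` as the first import (before `from neo4j import NeoService`) should make the example from the neo4j.py page parse as written.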