Re: [Neo] memory mapping
If a run is that long when performing traversals, the -server flag should be faster. Could you explain a bit more what type of traversal you are performing and what the graph looks like? Judging by the size of the store files, you should be able to traverse the full graph many times in a single day on that machine.

-Johan

On Fri, May 21, 2010 at 1:15 PM, Lorenzo Livi lorenz.l...@gmail.com wrote:
No, I use only one JVM instance for each run. My runs usually last something like 1 day or 15 days.

On Fri, May 21, 2010 at 1:10 PM, Johan Svensson jo...@neotechnology.com wrote:
Yes, -server is usually slower for the first few runs, but once some information has been gathered and optimizations are in place it will be faster. Are you starting a new JVM for each traversal?

___ Neo mailing list User@lists.neo4j.org https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo] Newbie question
Thomas, sorry for taking so long to answer. Welcome to Neo4j!

On Fri, May 21, 2010 at 10:37 PM, Thomas Sant'ana maill...@gmail.com wrote:
1) In the Matrix example (I saw it in several videos), what would be the proper way to query "all people that Neo knows who love him"? In this case it's a single hop, so I think the way is to iterate through all LOVES relationships and see if they come back to Neo. But if we have more hops, would we need a traversal inside a traversal?

In this case, one approach would be to go along the KNOWS relationships of Neo (probably later over several hops) and check if there is an outgoing LOVES relationship back to Neo, something like http://gist.github.com/411695 (where Trinity doesn't know Neo, so we go 2 KNOWS hops deep).

2) Is there a simple way to know how many edges/relationships of a certain type come out of a node? I figure I can iterate through the relationships, but I was thinking of using this to choose the cheapest starting node for a traversal.

Not really, since there are no global measures stored per se and almost all data is loaded lazily. However, if you want that, you could store the number of relationships in an index (e.g. Lucene) whenever you operate on that node. How useful that is, with regard to update performance and keeping things in sync, depends on your use case.

3) Let's say I have a Car graph, with:
ReferenceNode (RN) --- Cars
RN --- Manufacturers --- Ford / GM / SAAB / Volvo etc.
RN --- Colors --- Grey, Silver, etc.
RN --- MakeYear --- 2000, 2001, 2002
A given car has a relationship to a Manufacturer, a Color, and a Make year:
Cars --- aCar
aCar --- Silver
aCar --- 2000
aCar --- SAAB
How can I get all the 2000-2001, Silver SAABs? I have made the basic structure in http://gist.github.com/411699 . Now, the question is how best to optimize set operations between the different criteria.
You could start at the year_2000 node (finding it with an index lookup for the year 2000, to be added to the code) as I did, iterate through all car nodes, and return everything that has a COLOR connection to the Silver node and a MANUFACTURER connection to SAAB. However, since you have multiple years, you could even add an index on top of, e.g., the year nodes in order to select multiple years efficiently. Do you have some more info on the dataset sizes in your domain, so I can flesh out the gist a bit with your search?

/peter
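Peter's two suggestions (walk Neo's KNOWS relationships and check each person for a LOVES relationship back to Neo; start from the year nodes and filter cars by their COLOR and MANUFACTURER connections) can be sketched in plain Java. This is a minimal sketch, not the code in the gists: it uses a hypothetical in-memory map of typed, directed relationships instead of the Neo4j Node/Relationship API, and the YEAR relationship type from a car to its year node is an assumed name.

```java
import java.util.*;

// Hypothetical in-memory stand-in for a graph (NOT the Neo4j API):
// typed, directed relationships, indexed in both directions for lookups.
final class MatrixAndCarsSketch {
    private final Map<String, Map<String, Set<String>>> out = new HashMap<>();
    private final Map<String, Map<String, Set<String>>> in = new HashMap<>();

    /** Records a directed relationship: from -[type]-> to. */
    void relate(String from, String type, String to) {
        out.computeIfAbsent(from, k -> new HashMap<>())
           .computeIfAbsent(type, k -> new LinkedHashSet<>()).add(to);
        in.computeIfAbsent(to, k -> new HashMap<>())
          .computeIfAbsent(type, k -> new LinkedHashSet<>()).add(from);
    }

    Set<String> outgoing(String node, String type) {
        return out.getOrDefault(node, Map.of()).getOrDefault(type, Set.of());
    }

    Set<String> incoming(String node, String type) {
        return in.getOrDefault(node, Map.of()).getOrDefault(type, Set.of());
    }

    /** Question 1: walk KNOWS up to maxDepth hops, keep people with a LOVES relationship back to start. */
    Set<String> knownPeopleWhoLove(String start, int maxDepth) {
        Set<String> visited = new HashSet<>(List.of(start));
        Set<String> result = new LinkedHashSet<>();
        List<String> frontier = List.of(start);
        for (int depth = 1; depth <= maxDepth && !frontier.isEmpty(); depth++) {
            List<String> next = new ArrayList<>();
            for (String person : frontier) {
                for (String friend : outgoing(person, "KNOWS")) {
                    if (visited.add(friend)) {           // breadth-first: each person checked once
                        next.add(friend);
                        if (outgoing(friend, "LOVES").contains(start)) {
                            result.add(friend);
                        }
                    }
                }
            }
            frontier = next;
        }
        return result;
    }

    /** Question 3: start at each selected year node, keep cars with the wanted color and maker. */
    Set<String> carsMatching(Collection<String> years, String color, String maker) {
        Set<String> result = new LinkedHashSet<>();
        for (String year : years) {
            for (String car : incoming(year, "YEAR")) {   // assumed: car -[YEAR]-> year node
                if (outgoing(car, "COLOR").contains(color)
                        && outgoing(car, "MANUFACTURER").contains(maker)) {
                    result.add(car);
                }
            }
        }
        return result;
    }
}
```

With a real database the year nodes would come from an index lookup, as Peter describes, so the filter only touches cars of the selected years.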
Re: [Neo] neo4j performance issue help required
Hi,

To use the Dijkstra algorithm to find all the paths between two nodes is a bit overkill. You should instead use the AllSimplePaths class, which is optimized to find all the paths between two nodes (regardless of weight). If you feel like updating to the latest Neo4j kernel (1.1-SNAPSHOT) and graph-algo package (0.6-SNAPSHOT), AllSimplePaths has been reimplemented there using the new traversal framework and has a better interface, including a Path abstraction. However, I'm uncertain about the performance difference between the 0.3 version and the 0.6-SNAPSHOT version.

2010/5/21 Maaz Bin Tariq maaz.ta...@yahoo.com:
Hello Everyone,
I have been using Neo4j for a couple of months, but recently it has been giving me performance issues when getting all paths between two nodes. I have tried all the configuration options for improving performance, without success. There are about 35K nodes with 1 property per node, and about 100K relationships. We are using an index on each node. The system load reaches 8 to 9 during peak time. I have run JProfiler, and the code below seems to be the problem, taking a lot of CPU. Please let me know if I am doing something silly. We call this method continuously under heavy load, which can be every second.

Dijkstra<Integer> dijkstra = new Dijkstra<Integer>( 0, start, end,
    new CostEvaluator<Integer>()
    {
        public Integer getCost( Relationship relationship, boolean backwards )
        {
            // unweighted graph
            return 1;
        }
    },
    new IntegerAdder(), new IntegerComparator(), Direction.BOTH,
    SocialNetworkRelationshipType.KNOWS );
dijkstra.limitMaxCostToTraverse( 3 );
dijkstra.getPathsAsNodes();

The current system configuration is 2 GB of RAM, with 1 GB allocated to the JVM heap and 512 MB to memory mapping. The JVM runs with the VM arguments -server and -XX:+UseConcMarkSweepGC.
The following are the Neo4j jars we are using: neo4j-graph-algo-0.3.jar, neo4j-index-1.0.jar, neo4j-kernel-1.0.jar, neo4j-shell-1.0.jar, neo4j-commons-1.0.jar, lucene-core-2.9.1.jar. After moving Neo4j to a new instance with 16 GB of RAM, with 5 GB allocated to the JVM heap and 2.5 GB to memory mapping, it still gives a system load of up to 3. Kindly suggest what can be done to optimize Neo4j.

Thanks
-Maaz Tariq

-- Mattias Persson, [matt...@neotechnology.com] Hacker, Neo Technology www.neotechnology.com
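For reference, the job AllSimplePaths does here can be sketched as a depth-limited DFS that enumerates every simple path (no repeated node) of at most maxDepth relationships, following relationships in both directions like Direction.BOTH above. This is a plain-Java illustration on a hypothetical in-memory adjacency list with integer node ids, not the graph-algo implementation or its API.

```java
import java.util.*;

// Plain-Java sketch, not the graph-algo AllSimplePaths class: enumerates all simple
// paths of at most maxDepth relationships between two nodes in an undirected graph.
final class SimplePathsSketch {
    private final Map<Integer, List<Integer>> adjacency = new HashMap<>();

    /** Adds an undirected edge, mirroring Direction.BOTH. */
    void addEdge(int a, int b) {
        adjacency.computeIfAbsent(a, k -> new ArrayList<>()).add(b);
        adjacency.computeIfAbsent(b, k -> new ArrayList<>()).add(a);
    }

    List<List<Integer>> allSimplePaths(int start, int end, int maxDepth) {
        List<List<Integer>> paths = new ArrayList<>();
        Deque<Integer> path = new ArrayDeque<>();
        Set<Integer> onPath = new HashSet<>();
        path.addLast(start);
        onPath.add(start);
        dfs(start, end, maxDepth, path, onPath, paths);
        return paths;
    }

    private void dfs(int node, int end, int remaining, Deque<Integer> path,
                     Set<Integer> onPath, List<List<Integer>> paths) {
        if (node == end) {
            paths.add(new ArrayList<>(path));   // record a copy of the current path
            return;
        }
        if (remaining == 0) {
            return;                             // depth limit reached
        }
        for (int next : adjacency.getOrDefault(node, List.of())) {
            if (onPath.add(next)) {             // skip nodes already on the path (keeps it simple)
                path.addLast(next);
                dfs(next, end, remaining - 1, path, onPath, paths);
                path.removeLast();              // backtrack
                onPath.remove(next);
            }
        }
    }
}
```

The number of simple paths can grow quickly with node degree and depth, so keeping maxDepth small matters when the method is called every second.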
Re: [Neo] Newbie question
2010/5/24 Peter Neubauer peter.neuba...@neotechnology.com:
On Fri, May 21, 2010 at 10:37 PM, Thomas Sant'ana maill...@gmail.com wrote:
2) Is there a simple way to know how many edges/relationships of a certain type come out of a node? I figure I can iterate through the relationships, but I was thinking of using this to choose the cheapest starting node for a traversal.

Not really, since there are no global measures stored per se and almost all data is loaded lazily. However, if you want that, you could store the number of relationships in an index (e.g. Lucene) whenever you operate on that node. How useful that is, with regard to update performance and keeping things in sync, depends on your use case.

Or you could store that number on the node directly, which to me seems like a better place.
-- Mattias Persson, [matt...@neotechnology.com] Hacker, Neo Technology www.neotechnology.com
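Mattias's suggestion of storing the relationship count on the node itself could look like the sketch below: the count is updated whenever a relationship of that type is created or deleted, and the cheapest starting node is then the candidate with the smallest stored count. The Node class and the "<TYPE>_count" property naming are hypothetical stand-ins; in Neo4j the update would be a property write on the real node, ideally in the same transaction as the relationship change so the two stay in sync.

```java
import java.util.*;

// Hypothetical sketch: a per-type relationship count kept as a node property,
// so the cheapest traversal start can be chosen without iterating relationships.
final class DegreeCountSketch {
    /** Hypothetical stand-in for a node with a property map (not org.neo4j.graphdb.Node). */
    static final class Node {
        final String name;
        final Map<String, Object> properties = new HashMap<>();
        Node(String name) { this.name = name; }
    }

    private static String countKey(String type) {
        return type + "_count";                  // assumed property naming convention
    }

    /** Call whenever a relationship of this type is created on the node. */
    static void incrementCount(Node node, String type) {
        node.properties.merge(countKey(type), 1, (a, b) -> (Integer) a + (Integer) b);
    }

    /** Call whenever a relationship of this type is deleted from the node. */
    static void decrementCount(Node node, String type) {
        node.properties.merge(countKey(type), -1, (a, b) -> (Integer) a + (Integer) b);
    }

    /** Stored count, or 0 if no relationship of this type was ever recorded. */
    static int count(Node node, String type) {
        return (Integer) node.properties.getOrDefault(countKey(type), 0);
    }

    /** Picks the candidate with the fewest relationships of the given type. */
    static Node cheapestStart(Collection<Node> candidates, String type) {
        return Collections.min(candidates, Comparator.comparingInt((Node n) -> count(n, type)));
    }
}
```

Reading one property is a single lookup, whereas counting by iteration has to load every relationship of the node, which is what Thomas wanted to avoid.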
Re: [Neo] memory mapping
Hi,

I'm running many serial custom DFS traversals (not the Neo4j built-in one), limited in depth, in each run. All subsequent computations (eigenvector determination, etc.) are then done in RAM. The big bottleneck is obviously disk access. The nodes of the graph have a high out-degree, so a typical DFS is expensive...

Just a question: I'm using the read-only access mode to the graph DB. When I try to map everything (just as you suggested) in the properties section, I get many warnings (cannot memory map ...). If I use less memory for this section (half), everything works fine (no warnings). Why? It should always give me this warning...

Best regards,
Lorenzo

On Mon, May 24, 2010 at 11:41 AM, Johan Svensson jo...@neotechnology.com wrote:
If a run is that long when performing traversals, the -server flag should be faster. Could you explain a bit more what type of traversal you are performing and what the graph looks like? Judging by the size of the store files, you should be able to traverse the full graph many times in a single day on that machine.

-Johan
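Lorenzo's DFS is custom and its code isn't in the thread, so the following is only a hypothetical sketch of a depth-limited DFS over directed out-relationships, using an explicit stack and a hypothetical in-memory adjacency list. One subtlety it shows: with a depth limit, a node first reached along a long path may be cut off at the limit, so instead of a plain visited set the sketch remembers the smallest depth at which each node was expanded and re-expands it when a shorter path is found.

```java
import java.util.*;

// Hypothetical in-memory directed graph; not the Neo4j API.
final class DepthLimitedDfsSketch {
    private final Map<Integer, List<Integer>> outEdges = new HashMap<>();

    void addEdge(int from, int to) {
        outEdges.computeIfAbsent(from, k -> new ArrayList<>()).add(to);
    }

    /** Returns node -> smallest depth (at most maxDepth) at which the DFS reached it. */
    Map<Integer, Integer> reachableWithin(int start, int maxDepth) {
        Map<Integer, Integer> bestDepth = new HashMap<>();
        Deque<int[]> stack = new ArrayDeque<>();          // entries are {node, depth}
        stack.push(new int[] {start, 0});
        while (!stack.isEmpty()) {
            int[] top = stack.pop();
            int node = top[0];
            int depth = top[1];
            Integer seen = bestDepth.get(node);
            if (seen != null && seen <= depth) {
                continue;                                  // already expanded at this depth or shallower
            }
            bestDepth.put(node, depth);
            if (depth >= maxDepth) {
                continue;                                  // depth limit: do not expand further
            }
            for (int next : outEdges.getOrDefault(node, List.of())) {
                Integer known = bestDepth.get(next);
                if (known == null || known > depth + 1) {  // only push if this path is shorter
                    stack.push(new int[] {next, depth + 1});
                }
            }
        }
        return bestDepth;
    }
}
```

With high out-degree nodes every expansion reads many relationships, which is where the disk access Lorenzo mentions comes in; re-expansions only happen when a strictly shorter path to a node is found.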
Re: [Neo] Fwd: Node not in use exception when using tx event handler
This was indeed a bug, and I just committed a fix for it in trunk.

-Johan

On Tue, May 18, 2010 at 6:28 PM, Garrett Smith g...@rre.tt wrote:
See attached. To reproduce, run against r4415:
$ java -cp PATH NodeNotInUse DBDIR create
$ java -cp PATH NodeNotInUse DBDIR delete NODEID
I modified src/main/java/org/neo4j/kernel/impl/core/TransactionEventsSyncHook.java to get the error info:

--- src/main/java/org/neo4j/kernel/impl/core/TransactionEventsSyncHook.java (revision 4415)
+++ src/main/java/org/neo4j/kernel/impl/core/TransactionEventsSyncHook.java (working copy)
@@ -35,6 +35,7 @@
     public void beforeCompletion()
     {
+        try {
         this.transactionData = nodeManager.getTransactionData();
         states = new ArrayList<HandlerAndState>();
         for ( TransactionEventHandler<T> handler : this.handlers )
@@ -55,6 +56,10 @@
                 throw new RuntimeException( t );
             }
         }
+        } catch (Throwable th) {
+            th.printStackTrace();
+            throw new RuntimeException(th);
+        }
     }

On Tue, May 18, 2010 at 6:38 AM, Johan Svensson jo...@neotechnology.com wrote:
Garrett, this could be a bug. Could you please provide a test case that triggers this behavior?

-Johan

On Sat, May 15, 2010 at 8:46 PM, Tobias Ivarsson tobias.ivars...@neotechnology.com wrote:
Create a ticket for it; I've tagged it for review when I get back to the office. You had the great misfortune to send this right at the beginning of a 4-day Swedish holiday. If you could supply code that reproduces it, that would be even better.

Cheers, Tobias

On Sat, May 15, 2010 at 8:42 PM, Garrett Smith g...@rre.tt wrote:
Is this something I should open a ticket for, or is it something the dev team is aware of? Or is it user error?

Garrett

-- Forwarded message --
From: Garrett Smith g...@rre.tt
Date: Thu, May 13, 2010 at 2:52 PM
Subject: Node not in use exception when using tx event handler
To: Neo4j Users user@lists.neo4j.org

I'm running into the exception below when I try to delete a node right after starting up a graph database. I'm experimenting with a transaction event handler.
The error, however, occurs before my handler gets called.

org.neo4j.kernel.impl.nioneo.store.InvalidRecordException: Node[10] not in use
    at org.neo4j.kernel.impl.nioneo.xa.WriteTransaction.nodeGetProperties(WriteTransaction.java:1009)
    at org.neo4j.kernel.impl.nioneo.xa.NeoStoreXaConnection$NodeEventConsumerImpl.getProperties(NeoStoreXaConnection.java:228)
    at org.neo4j.kernel.impl.nioneo.xa.NioNeoDbPersistenceSource$NioNeoDbResourceConnection.nodeLoadProperties(NioNeoDbPersistenceSource.java:432)
    at org.neo4j.kernel.impl.persistence.PersistenceManager.loadNodeProperties(PersistenceManager.java:100)
    at org.neo4j.kernel.impl.core.NodeManager.loadProperties(NodeManager.java:628)
    at org.neo4j.kernel.impl.core.NodeImpl.loadProperties(NodeImpl.java:84)
    at org.neo4j.kernel.impl.core.Primitive.ensureFullLightProperties(Primitive.java:591)
    at org.neo4j.kernel.impl.core.Primitive.getAllCommittedProperties(Primitive.java:604)
    at org.neo4j.kernel.impl.core.LockReleaser.populateNodeRelEvent(LockReleaser.java:855)
    at org.neo4j.kernel.impl.core.LockReleaser.getTransactionData(LockReleaser.java:740)
    at org.neo4j.kernel.impl.core.NodeManager.getTransactionData(NodeManager.java:914)
    at org.neo4j.kernel.impl.core.TransactionEventsSyncHook.beforeCompletion(TransactionEventsSyncHook.java:39)
    at org.neo4j.kernel.impl.transaction.TransactionImpl.doBeforeCompletion(TransactionImpl.java:341)
    at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:556)
    at org.neo4j.kernel.impl.transaction.TransactionImpl.commit(TransactionImpl.java:103)
    at org.neo4j.kernel.EmbeddedGraphDbImpl$TransactionImpl.finish(EmbeddedGraphDbImpl.java:410)
    at gv.graph.Nodes.deleteNode(Nodes.java:349)
    at gv.graph.NodeDelete.handle(NodeDelete.java:20)
    at gv.graph.MessageHandler.run(MessageHandler.java:59)
    at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
    at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
    at java.lang.Thread.run(Thread.java:619)
May 13, 2010 2:42:56 PM org.neo4j.kernel.impl.transaction.TransactionImpl doBeforeCompletion
WARNING: Caught exception from tx syncronization[org.neo4j.kernel.impl.core.transactioneventssynch...@edf3f6 ] beforeCompletion()
May 13, 2010 2:42:56 PM org.neo4j.kernel.impl.transaction.TransactionImpl doAfterCompletion
WARNING: Caught exception from tx syncronization[org.neo4j.kernel.impl.core.transactioneventssynch...@edf3f6 ] afterCompletion()

Code details:
URL: https://svn.neo4j.org/components/kernel/trunk
Repository Root: https://svn.neo4j.org
Repository
[Neo] Efficient way to sparsify a graph?
Hey,

I have a large (by my standards) graph and I would like to reduce its size so it all fits in memory. This is the same Twitter graph I mentioned earlier: 2.5 million Nodes, 250 million Relationships. The goal is for the graph to still have the same topology and characteristics after it has been made sparser.

My plan was to uniformly randomly select Relationships for deletion until the graph is small enough. My first approach is basically this:

until (graph_is_small_enough)
    random_relationship = get_relationship_by_id(random_number)
    random_relationship.delete()

I'm using the transactional GraphDatabaseService at the moment rather than the BatchInserter, mostly because I'm not inserting anything and I assumed the optimizations made to the BatchInserter were only for write operations.

The reasons I want to delete Relationships instead of Nodes are:
(1) I don't want to accidentally delete any super nodes, as these are what give Twitter its unique structure.
(2) The number of Nodes is not the main problem keeping me from being able to store the graph in RAM.

The problem with the current approach is that it feels like I'm working against Neo4j's strengths, and it is very, very slow. I waited over an hour and fewer than 1,000,000 Relationships had been deleted. Given that my aim is to halve the number of Relationships (about 125 million deletions), it would take me at least 125 hours (over 5 days) to complete this process. In the worst case this is what I'll resort to, but I'd rather not if there's a better way.

My questions are:
(1) Can you think of an alternative, faster, and still meaningful (structure-preserving) way to reduce this graph's size?
(2) Using the same method I'm using now, are there some magical optimizations that will greatly improve performance?

Thanks,
Alex
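One possible direction, sketched under assumptions: instead of probing random relationship ids (each probe is a random lookup, and as deletions accumulate more probes are likely to land on ids that are already gone), make a single pass over all relationships and delete each one independently with probability 1 - keepFraction. Every relationship still has the same chance of being removed, so the sample stays uniform, although the exact number kept is random (close to keepFraction times the total). Deletions are grouped into batches, on the assumption that committing one transaction per few thousand deletes is cheaper than one transaction per delete. The relationship id source and the deleteBatch callback are hypothetical hooks, not Neo4j API calls; with a real database the callback would open a transaction, delete that batch, and commit.

```java
import java.util.*;
import java.util.function.Consumer;

// Hypothetical sketch of one-pass uniform edge sampling with batched deletes.
// relationshipIds and deleteBatch are assumed hooks, not Neo4j APIs.
final class SparsifySketch {
    /** Returns how many ids were handed to deleteBatch. */
    static long sparsify(Iterable<Long> relationshipIds, double keepFraction,
                         int batchSize, long seed, Consumer<List<Long>> deleteBatch) {
        if (keepFraction < 0 || keepFraction > 1) throw new IllegalArgumentException("keepFraction");
        if (batchSize <= 0) throw new IllegalArgumentException("batchSize");
        Random random = new Random(seed);          // fixed seed makes runs reproducible
        List<Long> batch = new ArrayList<>(batchSize);
        long deleted = 0;
        for (long id : relationshipIds) {
            if (random.nextDouble() >= keepFraction) {   // delete with probability 1 - keepFraction
                batch.add(id);
                if (batch.size() == batchSize) {
                    deleteBatch.accept(batch);           // e.g. one transaction per batch
                    deleted += batch.size();
                    batch = new ArrayList<>(batchSize);  // fresh list; the old one is not reused
                }
            }
        }
        if (!batch.isEmpty()) {                          // flush the final partial batch
            deleteBatch.accept(batch);
            deleted += batch.size();
        }
        return deleted;
    }
}
```

Each relationship is visited exactly once, so the total work is one sequential pass rather than an open-ended series of random probes.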