Re: [Neo4j] Reply: Fans of Neo4j From Chinese
Hi,

You can try to create intermediary nodes that aggregate certain kinds of relationships, i.e. create an abstraction on top of them.

This is also used for write-heavy scenarios, e.g. activity streams with supernodes that are connected to millions of other nodes: you introduce a second layer of nodes below the supernode, sharded on the properties of either the relationships or the target nodes. This lowers the write load and, if your sharding key takes domain read considerations into account, the read load as well, because you can go from your initial node to the right subnode and operate only on the relationships from there.

Hope that helps

On 22.03.2011 at 02:54, 孤竹 wrote:

OK, thanks for your help! It helps me a lot!

I have another question. In my application there are lots of nodes and relationships (maybe a million nodes and tens of thousands of relationships). I have a way to reduce the number of relationships, but the number of nodes would grow by the same ratio. Is that faster or better for my searches? I think it would be faster, because the nodes are indexed.

Please give me some advice :)

-- Original message --
From: Tobias Ivarsson <tobias.ivars...@neotechnology.com>
Sent: Saturday, March 19, 2011, 5:59 PM
To: Neo4j user discussions <user@lists.neo4j.org>
Subject: Re: [Neo4j] Fans of Neo4j From Chinese

Neo4j serializes commits, i.e. at most one thread is committing a transaction at any one time. For the actual work of building up the data to be committed, Neo4j supports multiple concurrent threads. This fact alone, that there is a single congestion point, means that if an application is very write-centric, as in your case, it is unlikely to scale beyond two threads, with one building up the next commit while the other is committing its data. It might scale to a few more threads than that if the build-up time is significantly larger than the commit time. It is simple time slicing: only one train can be at the station at once, so you have to do the maths on how many trains can be out on the track during that time.

It is also worth keeping in mind that for CPU-bound operations, an application doesn't scale much further than the number of CPUs in the computer. The threads that are not in commit mode, i.e. the ones that are building up the data for their next commit, are CPU bound and contend for the same CPU resources. This means that your application is not going to scale much further than the number of CPUs in your computer. Few desktop or laptop computers have more than 4 CPUs these days, which makes 5 threads about the most you can squeeze out of one; anything more than that is just going to add contention, and possibly even slow things down.

Finally, the (CPU-bound) threads that create the graph might be contending on the same resources, as Peter said. If multiple threads modify the same node or relationship, e.g. if they create relationships to the same node (the root node, for example), they are all going to block on that resource. Neo4j only allows one transaction to modify each entity at a time. This means that to get maximum concurrency out of your data creation, each thread should be creating its own disconnected subgraph. If the subgraphs have connected parts, the connections to the global data should be made last in the transaction (in a predictable order, to avoid deadlocks [1]), to maximize the time the thread is operational before hitting the congestion point that is the (potentially) contended data.

Cheers,
Tobias

[1] Neo4j will detect if a deadlock has occurred and throw a DeadlockDetectedException in that case.

2011/3/18 孤竹 <ho...@foxmail.com>

Hi,

Sorry to disturb you. I am a Chinese engineer, so please excuse my bad English :). Recently I have been learning Neo4j and trying to use it in my project. But when I put load on Neo4j with 5, 10, 20 and 30 threads, I found that the number of nodes inserted does not change noticeably (sometimes it does not change at all). Does the number of threads not matter? Does the kernel serialize the work? Is there any documentation about the performance of Neo4j?

Thanks for your help.

The program is as follows. I submit this function to an ExecutorService with 5/10/30 threads, then compare the number of nodes inserted in the same amount of time. (The counts do not change noticeably.)

    Transaction tx = null;
    Node before = null;
    try {
        for (int i = 0; i < 100; i++) {
            if (stop == true) {
                return;
            }
            if (graphDb == null) {
                return;
            }
            try {
                if (tx == null) {
                    tx = graphDb.beginTx();
                }
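
Tobias's train analogy can be turned into rough arithmetic. The sketch below is only a toy model, not a measurement of Neo4j: it assumes every transaction has a fixed build time and a fixed commit time, that only one commit runs at a time, and that builds are limited by the number of CPUs. The class name, method names and all timings are made up for illustration.

```java
/** Toy model of serialized commits; all timings are hypothetical units, not measurements. */
final class CommitModel {

    /** Upper bound on transactions per time unit for a given number of threads. */
    static double throughput(int threads, double buildTime, double commitTime, int cpus) {
        double byThreads = threads / (buildTime + commitTime); // each thread builds, then commits
        double byCommit = 1.0 / commitTime;                    // only one commit at a time
        double byCpu = cpus / buildTime;                       // building is CPU bound
        return Math.min(byThreads, Math.min(byCommit, byCpu));
    }

    /** Smallest thread count that already reaches the bound; extra threads add nothing. */
    static int usefulThreads(double buildTime, double commitTime, int cpus) {
        double cap = Math.min(1.0 / commitTime, cpus / buildTime);
        // The small epsilon guards against floating-point noise just above a whole number.
        return (int) Math.ceil(cap * (buildTime + commitTime) - 1e-9);
    }

    public static void main(String[] args) {
        // Build time equal to commit time, 4 CPUs: the thread counts from the original post.
        for (int n : new int[] {1, 2, 5, 10, 30}) {
            System.out.printf("%2d threads -> %.2f tx per time unit%n",
                    n, throughput(n, 1.0, 1.0, 4));
        }
    }
}
```

In this model, equal build and commit times saturate at two threads, which is why 5, 10 and 30 threads would insert about the same number of nodes. A build time nine times the commit time on 4 CPUs saturates at five threads.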
[Neo4j] Reply: Fans of Neo4j From Chinese
Ok, thx. That helps me a lot.

Gtalk: houbo...@gmail.com
skype: bolin.hou

-- Original message --
From: Tobias Ivarsson <tobias.ivars...@neotechnology.com>
Sent: Saturday, March 19, 2011, 5:59 PM
To: Neo4j user discussions <user@lists.neo4j.org>
Subject: Re: [Neo4j] Fans of Neo4j From Chinese

Neo4j serializes commits, i.e. at most one thread is committing a transaction at any one time. For the actual work of building up the data to be committed, Neo4j supports multiple concurrent threads. This fact alone, that there is a single congestion point, means that if an application is very write-centric, as in your case, it is unlikely to scale beyond two threads, with one building up the next commit while the other is committing its data. It might scale to a few more threads than that if the build-up time is significantly larger than the commit time. It is simple time slicing: only one train can be at the station at once, so you have to do the maths on how many trains can be out on the track during that time.

It is also worth keeping in mind that for CPU-bound operations, an application doesn't scale much further than the number of CPUs in the computer. The threads that are not in commit mode, i.e. the ones that are building up the data for their next commit, are CPU bound and contend for the same CPU resources. This means that your application is not going to scale much further than the number of CPUs in your computer. Few desktop or laptop computers have more than 4 CPUs these days, which makes 5 threads about the most you can squeeze out of one; anything more than that is just going to add contention, and possibly even slow things down.

Finally, the (CPU-bound) threads that create the graph might be contending on the same resources, as Peter said. If multiple threads modify the same node or relationship, e.g. if they create relationships to the same node (the root node, for example), they are all going to block on that resource. Neo4j only allows one transaction to modify each entity at a time. This means that to get maximum concurrency out of your data creation, each thread should be creating its own disconnected subgraph. If the subgraphs have connected parts, the connections to the global data should be made last in the transaction (in a predictable order, to avoid deadlocks [1]), to maximize the time the thread is operational before hitting the congestion point that is the (potentially) contended data.

Cheers,
Tobias

[1] Neo4j will detect if a deadlock has occurred and throw a DeadlockDetectedException in that case.

2011/3/18 孤竹 <ho...@foxmail.com>

Hi,

Sorry to disturb you. I am a Chinese engineer, so please excuse my bad English :). Recently I have been learning Neo4j and trying to use it in my project. But when I put load on Neo4j with 5, 10, 20 and 30 threads, I found that the number of nodes inserted does not change noticeably (sometimes it does not change at all). Does the number of threads not matter? Does the kernel serialize the work? Is there any documentation about the performance of Neo4j?

Thanks for your help.

The program is as follows. I submit this function to an ExecutorService with 5/10/30 threads, then compare the number of nodes inserted in the same amount of time. (The counts do not change noticeably.)

    Transaction tx = null;
    Node before = null;
    try {
        for (int i = 0; i < 100; i++) {
            if (stop == true) {
                return;
            }
            if (graphDb == null) {
                return;
            }
            try {
                if (tx == null) {
                    tx = graphDb.beginTx();
                }
                // Increment the reference count by 1
                writeCount.addAndGet(1);
                int startNodeString = name.addAndGet(1);
                Node start = getOrCreateNodeWithOutIndex( + startNodeString);
                if (before == null) {
                    // Root node. Hahaha, I got U
                    Node root = graphDb.getNodeById(0);
                    root.createRelationshipTo(start, LEAD);
                }
                if (before != null) {
                    before.createRelationshipTo(start, LOVES);
                }
                int endNodeName = name.addAndGet(1);
                Node end = getOrCreateNodeWithOutIndex( + endNodeName);
                start.createRelationshipTo(end, KNOWS);
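
Tobias's advice to build a private subgraph first and touch the shared nodes last, in a predictable order, can be sketched without Neo4j at all. The example below is a stand-in, not Neo4j code: a ReentrantLock per shared node plays the role of the per-entity write lock, and SharedNode, DeferredLinking and their methods are hypothetical names. Workers ask for the same two shared nodes in opposite orders, but each sorts them by id before locking, so no worker can hold one lock while waiting for a lock that another worker took first.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Comparator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.locks.ReentrantLock;

/** Stand-in for a contended graph node; the lock models a per-entity write lock. */
final class SharedNode {
    final long id;
    final ReentrantLock lock = new ReentrantLock();
    final List<String> attached = new ArrayList<>();
    SharedNode(long id) { this.id = id; }
}

final class DeferredLinking {

    static void buildAndLink(String worker, int size, List<SharedNode> targets) {
        // 1. Uncontended part: build a private, disconnected subgraph.
        List<String> local = new ArrayList<>();
        for (int i = 0; i < size; i++) {
            local.add(worker + "-" + i);
        }
        // 2. Contended part, done last: lock shared nodes in ascending id order.
        List<SharedNode> ordered = new ArrayList<>(targets);
        ordered.sort(Comparator.comparingLong((SharedNode n) -> n.id));
        for (SharedNode n : ordered) {
            n.lock.lock();
        }
        try {
            for (SharedNode n : ordered) {
                n.attached.add(local.get(0)); // connect the subgraph's first node
            }
        } finally {
            for (int i = ordered.size() - 1; i >= 0; i--) {
                ordered.get(i).lock.unlock();
            }
        }
    }

    /** Runs the workers and returns the total number of connections made. */
    static int run(int workers) throws Exception {
        SharedNode root = new SharedNode(0);
        SharedNode index = new SharedNode(1);
        ExecutorService pool = Executors.newFixedThreadPool(workers);
        try {
            List<Future<?>> futures = new ArrayList<>();
            for (int w = 0; w < workers; w++) {
                // Odd workers list the shared nodes in the opposite order on purpose.
                List<SharedNode> targets = (w % 2 == 0)
                        ? Arrays.asList(root, index)
                        : Arrays.asList(index, root);
                String name = "worker" + w;
                futures.add(pool.submit(() -> buildAndLink(name, 100, targets)));
            }
            for (Future<?> f : futures) {
                f.get(10, TimeUnit.SECONDS); // a deadlock would show up as a timeout
            }
        } finally {
            pool.shutdownNow();
        }
        return root.attached.size() + index.attached.size();
    }
}
```

In the posted program every thread connects to the root node inside the loop, so the threads queue up on that one node early in each transaction; moving that step to the end keeps the shared lock held for as short a time as possible.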
[Neo4j] Reply: Fans of Neo4j From Chinese
OK, thanks for your help! It helps me a lot!

I have another question. In my application there are lots of nodes and relationships (maybe a million nodes and tens of thousands of relationships). I have a way to reduce the number of relationships, but the number of nodes would grow by the same ratio. Is that faster or better for my searches? I think it would be faster, because the nodes are indexed.

Please give me some advice :)
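
The intermediary-node idea suggested in the reply to this question can be sketched in plain Java. This is only an in-memory illustration under stated assumptions, not Neo4j code: the supernode keeps one subnode per value of a sharding key (here a made-up "day" property), and ShardedSuperNode with its methods is a hypothetical name.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** A supernode whose relationships hang off one intermediary node per shard key. */
final class ShardedSuperNode {
    // shard key -> targets connected to that intermediary node
    private final Map<String, List<String>> subNodes = new LinkedHashMap<>();

    /** Adds a relationship via the subnode for shardKey, creating the subnode if needed. */
    void connect(String shardKey, String target) {
        subNodes.computeIfAbsent(shardKey, k -> new ArrayList<>()).add(target);
    }

    /** A read for one key only looks at that subnode's relationships. */
    List<String> targetsFor(String shardKey) {
        return subNodes.getOrDefault(shardKey, new ArrayList<>());
    }

    int subNodeCount() {
        return subNodes.size();
    }

    public static void main(String[] args) {
        ShardedSuperNode stream = new ShardedSuperNode();
        for (int i = 0; i < 1000; i++) {
            stream.connect("day" + (i % 10), "event" + i);
        }
        System.out.println(stream.subNodeCount() + " subnodes, "
                + stream.targetsFor("day3").size() + " events on day3");
    }
}
```

More nodes and fewer relationships per node help only when the sharding key matches how the data is read: a query for one key then starts at one subnode instead of scanning every relationship on the supernode.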