[Neo4j] Re: Fans of Neo4j From Chinese

2011-03-21 Thread 孤竹
OK, thanks. That helps me a lot.




Gtalk:  houbo...@gmail.com
skype: bolin.hou
 
-- Original message --
From: Tobias Ivarsson <tobias.ivars...@neotechnology.com>
Sent: Saturday, March 19, 2011, 5:59 PM
To: Neo4j user discussions <user@lists.neo4j.org>

Subject: Re: [Neo4j] Fans of Neo4j From Chinese

 
Neo4j serializes commits, i.e. at most one thread is committing a
transaction at any one time.
For the actual work of building up the data to be committed, Neo4j supports
multiple concurrent threads.

This single congestion point means that a very write-centric application,
like yours, is unlikely to scale beyond two threads: one builds up the next
commit while the other is committing its data. It might scale to a few more
threads than that if the build-up time is significantly longer than the
commit time. It is simple time slicing: only one train can be at the station
at once, so you have to do the maths on how many trains can be out on the
track during that time.
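
The train analogy can be turned into a back-of-the-envelope estimate. The sketch below only illustrates the reasoning above and is not a Neo4j API: the class and method names are made up, and the build and commit times are hypothetical inputs you would have to measure yourself. It assumes commits are fully serialized and that build-up work runs perfectly in parallel.

```java
/** Rough model of the "one train at the station" argument (hypothetical helper, not part of Neo4j). */
final class CommitPipelineEstimate {

    /**
     * Estimates how many writer threads can be kept busy when commits are serialized.
     * Each thread's cycle is build + commit; only the commit part is exclusive, so about
     * (build + commit) / commit threads fit before the commit slot is saturated.
     */
    static int usefulThreads(double buildMillis, double commitMillis) {
        if (buildMillis < 0 || commitMillis <= 0) {
            throw new IllegalArgumentException("buildMillis must be >= 0 and commitMillis > 0");
        }
        return (int) Math.ceil((buildMillis + commitMillis) / commitMillis);
    }

    public static void main(String[] args) {
        // Build time equal to commit time.
        System.out.println(usefulThreads(10, 10));
        // Build time three times the commit time.
        System.out.println(usefulThreads(30, 10));
    }
}
```

With equal build and commit times the estimate is 2, which matches the "unlikely to scale beyond two threads" case; a build time three times the commit time gives 4.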

It is also worth keeping in mind that a CPU-bound operation does not scale
much further than the number of CPUs in the computer. The threads that are
not in commit mode, i.e. the ones that are building up the data for their
next commit, are CPU bound and contend for the same CPU resources. Few
desktop/laptop computers have more than 4 CPUs these days, which makes 5
threads about the most you can squeeze out of one; anything more than that
is just going to add contention, and possibly even slow things down.
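
The "number of CPUs plus one" rule of thumb from the paragraph above can be applied when creating the thread pool. This is a minimal sketch using only the JDK; `WriterPoolSizing` and its methods are hypothetical names, and the right number for a real workload should still be measured.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

/** Sizes a writer pool by the rule of thumb in the text (hypothetical helper, not part of Neo4j). */
final class WriterPoolSizing {

    /** One CPU-bound builder thread per CPU, plus one thread that is busy committing. */
    static int suggestedWriterThreads(int cpus) {
        if (cpus < 1) {
            throw new IllegalArgumentException("cpus must be >= 1");
        }
        return cpus + 1;
    }

    /** Creates a fixed-size pool for the machine this code runs on. */
    static ExecutorService newWriterPool() {
        int cpus = Runtime.getRuntime().availableProcessors();
        return Executors.newFixedThreadPool(suggestedWriterThreads(cpus));
    }

    public static void main(String[] args) {
        ExecutorService pool = newWriterPool();
        System.out.println("4 CPUs -> " + suggestedWriterThreads(4) + " writer threads");
        pool.shutdown();
    }
}
```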

Finally, the (CPU-bound) threads that create the graph might be contending
for the same resources, as Peter said. If multiple threads modify the same
node or relationship, e.g. if they create relationships to the same node
(the root node, for example), they are all going to block on that resource.
Neo4j only allows one transaction to modify each entity at a time. This
means that to get maximum concurrency out of your data creation, each thread
should be creating its own disconnected subgraph. If the subgraphs have
connected parts, the connections to the global data should be made last in
the transaction (in a predictable order to avoid deadlocks[1]), to maximize
the time the thread is operational before hitting the
congestion point that is the (potentially) contended data.

Cheers,
Tobias

[1] Neo4j will detect if a deadlock has occurred and throw a
DeadlockDetectedException in that case.
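
One way to get the "predictable order" mentioned above is to sort the ids of the shared nodes before connecting to them, so that every transaction touches them in the same sequence. The sketch below is plain Java and does not call Neo4j; `SharedNodeOrdering` and `connectionOrder` are hypothetical names, and the ids stand in for whatever shared nodes a transaction has to link to at the end.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collection;
import java.util.Collections;
import java.util.List;
import java.util.TreeSet;

/** Orders shared node ids consistently across threads (hypothetical helper, not part of Neo4j). */
final class SharedNodeOrdering {

    /**
     * Returns the shared node ids without duplicates, in ascending order.
     * If every writer connects to shared nodes in this order, two writers can never
     * each hold one shared node while waiting for the other's.
     */
    static List<Long> connectionOrder(Collection<Long> sharedNodeIds) {
        return Collections.unmodifiableList(new ArrayList<Long>(new TreeSet<Long>(sharedNodeIds)));
    }

    public static void main(String[] args) {
        // Two writers that need the same shared nodes, discovered in different orders.
        System.out.println(connectionOrder(Arrays.asList(7L, 0L, 3L)));
        System.out.println(connectionOrder(Arrays.asList(3L, 7L, 0L)));
    }
}
```

Both writers end up with the same sequence, so the relationships to the shared nodes are created in the same order at the end of each transaction. If a deadlock is detected anyway, the DeadlockDetectedException from footnote [1] is what the application would see.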

2011/3/18 孤竹 <ho...@foxmail.com>

 Hi,


   Sorry to disturb you. I am a Chinese engineer, so please excuse my poor
 English :).


   Recently I have been learning Neo4j and trying to use it in my project.
 But when I put load on Neo4j with 5, 10, 20 and 30 threads, the number of
 nodes inserted does not change noticeably (sometimes it does not change at
 all). Does the number of threads not matter? Does the kernel serialize the
 work? Is there any documentation about the performance of Neo4j? Thanks
 for your help.



   The program is as follows. I submit this function to an ExecutorService
 with 5/10/30 threads, then measure how many nodes are inserted in the same
 amount of time. (The counts do not change noticeably.)


 Transaction tx = null;
 Node before = null;
 try {
     for (int i = 0; i < 100; i++) {
         if (stop == true) {
             return;
         }
         if (graphDb == null) {
             return;
         }
         try {
             if (tx == null) {
                 tx = graphDb.beginTx();
             }
             // increment the reference count by 1
             writeCount.addAndGet(1);
             int startNodeString = name.addAndGet(1);
             Node start = getOrCreateNodeWithOutIndex("" + startNodeString);
             if (before == null) {
                 // Root node. Ha ha ha, I got U
                 Node root = graphDb.getNodeById(0);
                 root.createRelationshipTo(start, LEAD);
             }
             if (before != null) {
                 before.createRelationshipTo(start, LOVES);
             }
             int endNodeName = name.addAndGet(1);
             Node end = getOrCreateNodeWithOutIndex("" + endNodeName);
             start.createRelationshipTo(end, KNOWS);
 

Re: [Neo4j] Finding a Path Between Nodes (filtered by relationship property)

2011-03-21 Thread Max De Marzi Jr.
Add something like:

"return filter": { "language": "javascript", "body":
"position.lastRelationship().hasProperty(\"userGroupId\") &&
position.lastRelationship().getProperty(\"userGroupId\") == 111;"}})

to your traversal.
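
To make the intended result concrete, here is a plain-Java model of the three relationships from Kevin's example. It does not use the REST API or any Neo4j class; `FilteredPaths`, `Rel` and `findPaths` are hypothetical names, and the search simply refuses to follow relationships whose userGroupId differs from the requested one.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

/** In-memory model of path finding filtered by a relationship property (not Neo4j code). */
final class FilteredPaths {

    /** A directed relationship carrying the userGroupId property from the question. */
    static final class Rel {
        final String from;
        final String to;
        final String type;
        final int userGroupId;

        Rel(String from, String to, String type, int userGroupId) {
            this.from = from;
            this.to = to;
            this.type = type;
            this.userGroupId = userGroupId;
        }
    }

    /** Finds all simple paths from start to end that only use relationships with the given userGroupId. */
    static List<List<String>> findPaths(List<Rel> rels, String start, String end, int userGroupId) {
        List<List<String>> result = new ArrayList<>();
        List<String> current = new ArrayList<>();
        current.add(start);
        walk(rels, start, end, userGroupId, current, result);
        return result;
    }

    private static void walk(List<Rel> rels, String node, String end, int userGroupId,
                             List<String> current, List<List<String>> result) {
        if (node.equals(end)) {
            result.add(new ArrayList<>(current));
            return;
        }
        for (Rel rel : rels) {
            // Follow only outgoing relationships in the right group that do not revisit a node.
            boolean usable = rel.from.equals(node)
                    && rel.userGroupId == userGroupId
                    && !current.contains(rel.to);
            if (usable) {
                current.add(rel.to);
                walk(rels, rel.to, end, userGroupId, current, result);
                current.remove(current.size() - 1);
            }
        }
    }

    /** The three relationships listed in the question. */
    static List<Rel> exampleGraph() {
        return Arrays.asList(
                new Rel("Node1", "Node2", "Friend", 111),
                new Rel("Node2", "Node3", "Family", 111),
                new Rel("Node1", "Node3", "WorkedWith", 222));
    }

    public static void main(String[] args) {
        System.out.println(findPaths(exampleGraph(), "Node1", "Node3", 111));
    }
}
```

For userGroupId 111 the only path found is Node1, Node2, Node3; the direct WorkedWith relationship is skipped because it belongs to group 222.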

On Mon, Mar 21, 2011 at 9:44 AM, Kevin Dieter kevin.die...@megree.com wrote:
 Hi,

 I am using the REST API from a .Net application and have a need to find
 paths between nodes and I would like to include or exclude relationships
 based on a property value.

 For example:

   1. Node1 has an outgoing relationship of type Friend, with relationship
   property userGroupId= 111 to Node2
   2. Node2 has an outgoing relationship of type Family, with relationship
   property userGroupId= 111 to Node3
   3. Node1 has an outgoing relationship of type WorkedWith, with
   relationship property userGroupId= 222 to Node3


 I would like to find paths from Node1 to Node3 using relationships of any
 type, but only using those relationships with userGroupId=111.  This should
 return the path that includes Node2 and the first two relationships, but
 should not include the direct path that uses the third relationship.

 Is this possible using the REST API?

 Thanks,

 Kevin
 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user



Re: [Neo4j] Graph traversal doubt

2011-03-21 Thread Marko Rodriguez
Hi,

In Gremlin do,

g.v(torontoId).inE('traveled_to').outV.outE('traveled_to').inV 

If you want it ranked by frequency, do:

m = [:];

g.v(torontoId).inE('traveled_to').outV.outE('traveled_to').inV.groupCount(m)

Take care,
Marko.

http://markorodriguez.com



On Mar 20, 2011, at 11:52 PM, Peter Neubauer wrote:

 Adriano,
 how about something like this?
 
 
 import org.junit.Test;
 import org.neo4j.graphdb.Node;
 import org.neo4j.graphdb.Path;
 import org.neo4j.graphdb.traversal.Evaluation;
 import org.neo4j.graphdb.traversal.Evaluator;
 import org.neo4j.graphdb.traversal.TraversalDescription;
 import org.neo4j.kernel.Traversal;
 
 import common.Neo4jAlgoTestCase;
 
 
 public class TraversalTest extends Neo4jAlgoTestCase
 {
 
     @Test
     public void test2Steps()
     {
         graph.makeEdge( "John", "Paris" );
         graph.makeEdge( "Peter", "Paris" );
         graph.makeEdge( "John", "Rome" );
         graph.makeEdge( "Peter", "Toronto" );
         graph.makeEdge( "Adriano", "Toronto" );
         graph.makeEdge( "Adriano", "Tokyo" );
         Node node = graph.getNode( "Toronto" );
         TraversalDescription td = Traversal.description().evaluator(
                 new Evaluator()
                 {
                     @Override
                     public Evaluation evaluate( Path path )
                     {
                         // Return only nodes exactly two hops from the start node.
                         if ( path.length() == 2 )
                         {
                             return Evaluation.INCLUDE_AND_PRUNE;
                         }
                         return Evaluation.EXCLUDE_AND_CONTINUE;
                     }
                 } );
         for ( Node res : td.traverse( node ).nodes() )
         {
             System.out.println( res );
         }
     }
 }
 
 
 Cheers,
 
 /peter neubauer
 
 GTalk:  neubauer.peter
 Skype   peter.neubauer
 Phone   +46 704 106975
 LinkedIn   http://www.linkedin.com/in/neubauer
 Twitter  http://twitter.com/peterneubauer
 
 http://www.neo4j.org   - Your high performance graph database.
 http://startupbootcamp.org/- Öresund - Innovation happens HERE.
 http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
 
 
 
 On Sun, Mar 20, 2011 at 10:32 PM, Adriano Henrique de Almeida
 adrianoalmei...@gmail.com wrote:
 Hi,
 
 I have the following attached graph where I have persons who traveled to
 some cities.
 
 What I want to find out is, for a given city, for instance Toronto, which
 other cities the people who traveled there have also traveled to (in the
 attached graph these are Tokyo (by Adriano) and Paris (by Peter)).
 
 To retrieve this information, I did the following code:
 
   Collection<Node> allNodes = new ArrayList<Node>();

   // First I get the Toronto node and its relationships, to know who
   // traveled there.
   Node toronto = db.getNodeById(torontoId);
   Iterable<Relationship> relationships =
       toronto.getRelationships(Relationships.TRAVELED_TO, Direction.INCOMING);

   for (Relationship relationship : relationships) {
       // For each relationship found, I get all the nodes attached to it.
       Node[] nodes = relationship.getNodes();
       for (Node node : nodes) {
           // Finally I traverse the graph from these nodes to find where
           // else these people traveled to.
           Collection<Node> citiesNode = node.traverse(Order.DEPTH_FIRST,
               StopEvaluator.DEPTH_ONE, ReturnableEvaluator.ALL_BUT_START_NODE,
               Relationships.TRAVELED_TO, Direction.OUTGOING).getAllNodes();
           allNodes.addAll(citiesNode);
       }
   }
 
 Well, with this I can get the results I wanted. However, it seemed to me
 that what I did was too complicated :) . So, my question is: is there any
 way to do this traversal in a more straightforward manner?
 
 Thanks in advance.
 
 --
 Adriano Almeida
 


[Neo4j] Possible performance regression issue?

2011-03-21 Thread Rick Bullotta
Here's the quick summary of what we're encountering:

We are inserting large numbers of activity stream entries on a nearly constant
basis.  To optimize transaction handling, we queue these up and have a single
scheduled task that reads the entries from the queue and persists them to Neo.
Within these transactions, it's possible that a very large number of
relationships will be created and deleted (sometimes created and deleted all
within the same transaction, since we are managing something similar to an
index).  I've noticed that the time required to handle the inserts (not just
the total, but the time per insert) degrades DRAMATICALLY if there are more
than a few hundred entries to write.  It is very fast if there are < 100
entries in the batch, but very slow if there are over 1000.  With Neo 1.1, we
did not notice this behavior.  We have tried Neo 1.2 and 1.3 and both seem to
exhibit this behavior.
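
The setup described above (a queue drained by a single scheduled task) could cap how many entries go into each transaction. The following is a minimal JDK-only sketch of that idea; `BoundedBatchDrainer`, `drainInBatches` and the batch size of 100 are assumptions for illustration. Whether smaller batches avoid the slowdown on Neo 1.2/1.3 is not established in this message; the only data point is that batches under 100 entries are reported as fast.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.BlockingQueue;
import java.util.concurrent.LinkedBlockingQueue;

/** Splits a queue of pending entries into bounded batches (hypothetical helper, not part of Neo4j). */
final class BoundedBatchDrainer {

    /**
     * Drains everything currently in the queue into batches of at most maxBatchSize entries.
     * Each batch would then be persisted in its own transaction by the scheduled task.
     */
    static <T> List<List<T>> drainInBatches(BlockingQueue<T> queue, int maxBatchSize) {
        if (maxBatchSize < 1) {
            throw new IllegalArgumentException("maxBatchSize must be >= 1");
        }
        List<List<T>> batches = new ArrayList<>();
        while (true) {
            List<T> batch = new ArrayList<>(maxBatchSize);
            // drainTo never blocks; it moves at most maxBatchSize available elements.
            if (queue.drainTo(batch, maxBatchSize) == 0) {
                return batches;
            }
            batches.add(batch);
        }
    }

    public static void main(String[] args) {
        BlockingQueue<Integer> pending = new LinkedBlockingQueue<>();
        for (int i = 0; i < 250; i++) {
            pending.add(i);
        }
        for (List<Integer> batch : drainInBatches(pending, 100)) {
            System.out.println("batch of " + batch.size() + " entries");
        }
    }
}
```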

Can anyone provide any insight into possible causes/fixes?

Thanks,

Rick



[Neo4j] New Blog post: Strategies for Scaling Neo4j

2011-03-21 Thread Jim Webber
With especial thanks to Mark Harwood and Alex Averbuch, I wrote this on 
approaches for scaling:

http://jim.webber.name/2011/03/22/ef4748c3-6459-40b6-bcfa-818960150e0f.aspx

Your thoughts would be most welcome.

Jim



Re: [Neo4j] Fans of Neo4j From Chinese

2011-03-21 Thread Rick Bullotta
I'd like to explore this question a bit further.  Does this mean that basically 
there's no way to scale beyond a single thread/CPU for disconnected graphs if 
you have complex graph dependencies (e.g. you cannot create disjoint subgraphs)?




Re: [Neo4j] New Blog post: Strategies for Scaling Neo4j

2011-03-21 Thread Emil Eifrem
Great post. Only thing I'd add is that a weakness of 1 & 2 is that
while they scale ~linearly for reads, they don't scale writes. Maybe
that's obvious but it may be worth pointing out anyway.

Cheers,

-EE

On Mon, Mar 21, 2011 at 17:47, Jim Webber j...@neotechnology.com wrote:
 With especial thanks to Mark Harwood and Alex Averbuch, I wrote this on 
 approaches for scaling:

 http://jim.webber.name/2011/03/22/ef4748c3-6459-40b6-bcfa-818960150e0f.aspx

 Your thoughts would be most welcome.

 Jim





-- 
Emil Eifrém, CEO [e...@neotechnology.com]
Neo Technology, www.neotechnology.com
Cell: +46 733 462 271 | US: 206 403 8808
http://blogs.neotechnology.com/emil
http://twitter.com/emileifrem


Re: [Neo4j] Neo4j and Microsoft Azure

2011-03-21 Thread Javier de la Rosa
On Mon, Feb 14, 2011 at 17:15, Peter Neubauer
peter.neuba...@neotechnology.com wrote:
 Hi all Graphytes,
 there has been a lot of interest in Neo4j from the Microsoft side of
 things, so Magnus Mårtensson and I did a write-up on how to get a first
 version of a Neo4j Server hosted on Microsoft Azure. Enjoy, and as
 always feel free to give feedback to the community!

 http://blog.neo4j.org/2011/02/announcing-neo4j-on-windows-azure.html

Great post indeed. But recently I read about Microsoft Trinity [1],
what are your opinions about that? Will it be a competitor for Neo4j?
Do you think it would be a good idea to support hyperedges in a native
way in the future Neo4j releases?

Best regards.



[1] http://research.microsoft.com/en-us/projects/trinity/default.aspx



-- 
Javier de la Rosa
http://versae.es


Re: [Neo4j] New Blog post: Strategies for Scaling Neo4j

2011-03-21 Thread Jim Webber
Duly updated, thanks for the feedback.

Jim



[Neo4j] Re: Fans of Neo4j From Chinese

2011-03-21 Thread 孤竹
OK, thanks for your help! It helps me a lot!


I have another question. In my application there are lots of nodes and
relationships (maybe a million nodes and ten thousand relationships). I have
a way to model the data with fewer relationships, but then there will be
proportionally more nodes. Would that be faster or better for my searches?
I think it would be faster, because the nodes are indexed. Please give me
some advice :)
 

[Neo4j] Re: Re: Fans of Neo4j From Chinese

2011-03-21 Thread 孤竹
Sorry, I did not make that clear. My nodes have many relationships, and there
will be far more relationships than nodes. The ten thousand relationships I
mentioned are the ones connecting a single node to other nodes :)
 
 
-- Original message --
From: Rick Bullotta <rick.bullo...@thingworx.com>
Sent: Tuesday, March 22, 2011, 9:55 AM
To: Neo4j user discussions <user@lists.neo4j.org>

Subject: Re: [Neo4j] Re: Fans of Neo4j From Chinese

 
 I will be surprised if you do not have at least as many relationships as nodes 
(since usually each node is connected to at least one other node).



Re: [Neo4j] Performance expectations for Neo4j.

2011-03-21 Thread Peter Neubauer
Bård,
sorry, this seems to have slipped through the list. I think you should
be able to attach a picture to the list. Let me know if you have
problems with that!

Cheers,

/peter neubauer

GTalk:      neubauer.peter
Skype       peter.neubauer
Phone       +46 704 106975
LinkedIn   http://www.linkedin.com/in/neubauer
Twitter      http://twitter.com/peterneubauer

http://www.neo4j.org               - Your high performance graph database.
http://startupbootcamp.org/    - Öresund - Innovation happens HERE.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.



On Tue, Mar 8, 2011 at 12:47 PM, Bård Lind bard.l...@gmail.com wrote:
 Hi Peter and David.

 Thank you very much for your replies. Following your input I have run
 some more tests, setting the memory with -Xmx and repeating the tests.
 Each test returns 170138 nodes.

 Single run on laptop, with encrypted SSD disk:
 java -jar -Xmx128m  graphdb-1.0.SNAPSHOT.jar
 7481.0 ms
 java -jar -Xmx1024m  graphdb-1.0.SNAPSHOT.jar
 5545.0 ms
 - were not able to assign 2GB memory.

 Single run on ultra-fast Ubuntu:
 java -jar -Xmx1024m  graphdb-1.0.SNAPSHOT.jar
 2795.0 ms.
 java -jar -Xmx2048m  graphdb-1.0.SNAPSHOT.jar
 2734.0 ms

 Repeated runs on laptop:
 java -jar -Xmx1024m  graphdb-1.0.SNAPSHOT.jar
 1st run: 5422.0 ms
 2nd run: 1328.0 ms
 3rd run: 1031.0 ms

 Repeated runs on Ubuntu:
 java -jar -Xmx2048m  graphdb-1.0.SNAPSHOT.jar
 1st run: 2797.0 ms
 2nd run:  573.0 ms
 3rd run:  448.0 ms

 These tests indicate that IO/disk speed, and possibly CPU, is crucial
 for the first fetch, while subsequent retrievals perform acceptably.
 What do you think about these results, David and Peter?
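
In both repeated-run series above, the first run is several times slower than the third (5422 ms vs 1031 ms, and 2797 ms vs 448 ms). Timing several runs of the same task and reporting each one separately keeps the cold first run from being mixed up with the warm ones. This is a minimal JDK-only sketch; `RepeatedRunTimer` is a hypothetical name and the dummy workload in `main` merely stands in for the traversal.

```java
/** Times repeated executions of a task so cold and warm runs can be compared (hypothetical helper). */
final class RepeatedRunTimer {

    /** Runs the task the given number of times and returns the elapsed milliseconds of each run. */
    static long[] timeRuns(Runnable task, int runs) {
        if (runs < 0) {
            throw new IllegalArgumentException("runs must be >= 0");
        }
        long[] elapsedMillis = new long[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime();
            task.run();
            elapsedMillis[i] = (System.nanoTime() - start) / 1_000_000L;
        }
        return elapsedMillis;
    }

    public static void main(String[] args) {
        // Dummy workload standing in for the graph traversal being measured.
        Runnable workload = new Runnable() {
            @Override
            public void run() {
                long sum = 0;
                for (int i = 0; i < 5_000_000; i++) {
                    sum += i % 7;
                }
                if (sum < 0) {
                    System.out.println(sum); // keeps the loop from being optimized away
                }
            }
        };
        long[] timings = timeRuns(workload, 3);
        for (int i = 0; i < timings.length; i++) {
            System.out.println("run " + (i + 1) + ": " + timings[i] + " ms");
        }
    }
}
```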

 The Scenario I'm trying to prove: Show a tree structure of a Customer,
 with sub-companies, accounts and subscriptions.
 1. Using a single user id, find all the resources a user_id has access to.
 2. Show these resources in a tree structure with parameters including
 id, parent_id, type and name.
 3. Use the tree for navigation client-side.
 The resources are Customer (which can have Customer children), Account
 (child of Customer), Subscription (child of Account).
 Normally a user's access starts at a single customer or account level.
 The user then has read, write and inherit access to all children.
 Different users have access to different parts of the full Customer-tree.

 May I send a picture describing the scenario to this list, or must I
 post the picture to a different location and only send the link?

 The code I use is:
 public Collection<Node> findGraphBasedOnPrincipal(long userId) {

        Collection<Node> nodeList = new LinkedHashSet<Node>();
        Monitor monFind = MonitorFactory.start();
        final TraversalDescription PRINCIPAL_TRAVERSAL = Traversal.description()
                .relationships(RelationshipTypeTelenor.IS_MEMBER_OF_GROUP,
                        Direction.OUTGOING)
                .relationships(RelationshipTypeTelenor.SECURITY,
                        Direction.OUTGOING)
                .relationships(RelationshipTypeTelenor.IS_CHILD_RESOURCE_OF,
                        Direction.INCOMING)
                .depthFirst()
                .evaluator(new Evaluator() {
                    @Override
                    public Evaluation evaluate(Path path) {
                        if (path.endNode().getId() == 1) {
                            return Evaluation.EXCLUDE_AND_PRUNE;
                        }
                        return Evaluation.INCLUDE_AND_CONTINUE;
                    }
                })
                .uniqueness(Uniqueness.RELATIONSHIP_GLOBAL);
        Transaction tx = graphDb.beginTx();
        try {
            Node startNode = graphDb.getNodeById(userId);

            Monitor monTrav = MonitorFactory.start();
            Iterable<Node> nodes =
                    PRINCIPAL_TRAVERSAL.traverse(startNode).nodes();
            Iterator<Node> iter = nodes.iterator();
            int count = 0;
            while (iter.hasNext()) {
                Node next = iter.next();
                count++;
            }
            monTrav.stop();
            // This is where I stop the timer regarding Neo4j performance.
            System.out.println("INFO: Fetch of " + count + " nodes in: "
                    + monTrav);
            for (Node node : nodes) {
                nodeList.add(node);
            }
            tx.success();
        } finally {
            tx.finish();
        }
        return nodeList;
    }


 Scenario 2: Does user x have access to resource y?
 Here we are thinking of using something like the ACL example on the Neo4j
 wiki, which does not involve loading the full customer graph.

 Hope this clarifies my challenge a bit.

 Looking forward to your suggestions :-)

 Big smile from Bård






 On Mon, Mar 7, 2011 at 5:31 PM, David Montag
 david.mon...@neotechnology.com wrote:
 Bård,
 Great to hear you're evaluating us for your solution.
 I have a couple of questions. First, how much RAM do you have in the
 machine, and how much heap are you allocating for the Java process? Peter's
 question about running it multiple times is also very relevant.
 Secondly,