Re: [Neo] memory mapping

2010-05-24 Thread Johan Svensson
If a run takes that long when performing traversals, the -server flag
should be faster.

Could you explain a bit more what type of traversal you are performing
and what the graph looks like? Judging by the size of the store files
you should be able to traverse the full graph many times in a single
day on that machine.

-Johan

On Fri, May 21, 2010 at 1:15 PM, Lorenzo Livi lorenz.l...@gmail.com wrote:
 No, I use only one jvm instance for each run.
 My runs usually last something like 1 day or 15 days.

 On Fri, May 21, 2010 at 1:10 PM, Johan Svensson jo...@neotechnology.com 
 wrote:
 Yes, -server is usually slower the first few runs but once some
 information has been gathered and optimizations put in place it will
 be faster.

 Are you starting a new JVM for each traversal?
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo] Newbie question

2010-05-24 Thread Peter Neubauer
Thomas,
sorry for taking so long to answer. Welcome to Neo4j!

On Fri, May 21, 2010 at 10:37 PM, Thomas Sant'ana maill...@gmail.com wrote:
 1)  In the Matrix example (I saw it in several videos), what would be the
 proper way to query: 'All people that Neo knows and who love him'? In this case
 it's a single hop, so I think the way is to iterate through all LOVES
 relations and see if they come back to Neo. But if we have more hops, would we
 need to have a traversal in a traversal?
In this case, one approach would be to go along the KNOWS
relationships of Neo (probably later in several hops) and check if
there is an outgoing LOVES relationship back to Neo, something like
http://gist.github.com/411695 (where Trinity doesn't know Neo, so we go
2 KNOWS hops deep).
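
A rough illustration of the approach described above, written against plain
Java collections rather than the Neo4j API (the linked gist is the real
example). The class KnowsAndLoves, its methods, and the sample data (Neo KNOWS
Morpheus, Morpheus KNOWS Trinity, Trinity LOVES Neo) are all assumptions made
up for this sketch.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// In-memory stand-in for the graph; none of this is Neo4j API.
final class KnowsAndLoves {
    // Outgoing KNOWS and LOVES relationships, keyed by person name.
    private final Map<String, Set<String>> knows = new HashMap<>();
    private final Map<String, Set<String>> loves = new HashMap<>();

    void addKnows(String from, String to) {
        knows.computeIfAbsent(from, k -> new LinkedHashSet<>()).add(to);
    }

    void addLoves(String from, String to) {
        loves.computeIfAbsent(from, k -> new LinkedHashSet<>()).add(to);
    }

    // Walks outgoing KNOWS relationships breadth-first, at most maxHops deep,
    // and collects everyone reached who has an outgoing LOVES back to start.
    Set<String> knownAdmirers(String start, int maxHops) {
        Set<String> admirers = new LinkedHashSet<>();
        Set<String> visited = new HashSet<>();
        visited.add(start);
        List<String> frontier = Collections.singletonList(start);
        for (int hop = 0; hop < maxHops && !frontier.isEmpty(); hop++) {
            List<String> next = new ArrayList<>();
            for (String person : frontier) {
                for (String friend : knows.getOrDefault(person, Collections.emptySet())) {
                    if (!visited.add(friend)) {
                        continue; // already reached via a route that is no longer
                    }
                    if (loves.getOrDefault(friend, Collections.emptySet()).contains(start)) {
                        admirers.add(friend);
                    }
                    next.add(friend);
                }
            }
            frontier = next;
        }
        return admirers;
    }

    // Illustrative data only (hypothetical, not taken from the gist).
    static KnowsAndLoves sample() {
        KnowsAndLoves graph = new KnowsAndLoves();
        graph.addKnows("Neo", "Morpheus");
        graph.addKnows("Morpheus", "Trinity");
        graph.addLoves("Trinity", "Neo");
        return graph;
    }
}
```

With this sample data, Trinity is only found once the walk is allowed to go
two KNOWS hops deep, so no nested traversal is needed; the LOVES check is just
a filter applied to each node the KNOWS traversal reaches.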



 2) Is there a simple way to know how many edges/relations of a certain type
 come out of a node? I figure I can iterate through the relations, but I was
 thinking of using this to choose the cheapest starting node for a traversal.
Not really, since there are no global measures stored per se and
almost all data is loaded lazily. However, if you want that, you could
store the number of relationships when you operate on that node in an
index (e.g. Lucene). How useful that is regarding update performance
and keeping things in sync depends on your use case.


 3) Let's say I have a Car graph, with:

 ReferenceNode (RN) -- Cars
 RN ---  Manufacturers -- Ford / GM / SAAB / Volvo etc.
 RN ---  Colors --- Grey, Silver,
 RN --- MakeYear --- 2000, 2001, 2002

 A given car has a relation to a Manufacturer, a Color, and a MakeYear:

 Cars --- aCar
 aCar -- Silver
 aCar --- 2000
 aCar --- SAAB

 How can I get all the 2000 - 2001, Silver, SAAB cars?

I have made the basic structure in http://gist.github.com/411699 .
Now, the question is about how best to optimize set operations between
the different criteria. You could start at the year_2000 node (finding
it with an index lookup for the year 2000, to be added to the code) as
I did and iterate through all car nodes, and return everything that
has a COLOR connection to the Silver node, and a MANUFACTURER
connection to SAAB. However, with multiple years, you could even add
an index on top of e.g. the year nodes in order to be able to
select multiple years efficiently. Do you have some more info on the
dataset sizes of your domain so I can flesh the gist out a bit with
your search?
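
The filtering described above can be sketched in plain Java as follows. This
is an in-memory stand-in, not the code from the gist and not the Neo4j API: the
class CarQuerySketch, the Car type, and the sample cars are hypothetical, and
the map from year to cars plays the role of "look up the year node, then
follow its relationships to the car nodes".

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class CarQuerySketch {
    // One car with the three attributes that are separate nodes in the graph.
    static final class Car {
        final String id;
        final String manufacturer;
        final String color;
        final int year;

        Car(String id, String manufacturer, String color, int year) {
            this.id = id;
            this.manufacturer = manufacturer;
            this.color = color;
            this.year = year;
        }
    }

    // Stand-in for "year node -> car nodes connected to it".
    private final Map<Integer, List<Car>> carsByYear = new HashMap<>();

    void add(Car car) {
        carsByYear.computeIfAbsent(car.year, y -> new ArrayList<>()).add(car);
    }

    // Starts at each year in the range, iterates that year's cars, and keeps
    // those that also match the color and the manufacturer.
    List<String> find(int fromYear, int toYear, String color, String manufacturer) {
        List<String> ids = new ArrayList<>();
        for (int year = fromYear; year <= toYear; year++) {
            for (Car car : carsByYear.getOrDefault(year, Collections.emptyList())) {
                if (car.color.equals(color) && car.manufacturer.equals(manufacturer)) {
                    ids.add(car.id);
                }
            }
        }
        return ids;
    }

    // Illustrative data only.
    static CarQuerySketch sample() {
        CarQuerySketch cars = new CarQuerySketch();
        cars.add(new Car("car1", "SAAB", "Silver", 2000));
        cars.add(new Car("car2", "Volvo", "Silver", 2000));
        cars.add(new Car("car3", "SAAB", "Silver", 2001));
        cars.add(new Car("car4", "SAAB", "Grey", 2001));
        cars.add(new Car("car5", "SAAB", "Silver", 2002));
        return cars;
    }
}
```

Starting from the year is only one choice. If far fewer cars are Silver, or
far fewer are SAABs, than were made in 2000 and 2001, starting from that
smaller set and filtering on the other two criteria does less work, which is
why the dataset sizes matter.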

/peter


Re: [Neo] neo4j performance issue help required

2010-05-24 Thread Mattias Persson
Hi,

Using the Dijkstra algorithm to find all the paths between two nodes
is a bit overkill. You should instead use the AllSimplePaths class,
which is optimized to find all the paths between two nodes
(regardless of weight).

Alternatively, if you feel like updating to the latest neo4j kernel
(1.1-SNAPSHOT) and graph-algo package (0.6-SNAPSHOT), AllSimplePaths
has been reimplemented there using the new traversal framework and has
a better interface, including a Path abstraction. I'm uncertain,
though, about the performance difference between the 0.3 version and
the 0.6-SNAPSHOT version.
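
For an unweighted graph with a hop limit, enumerating all simple paths comes
down to a depth-limited depth-first search that never revisits a node already
on the current path. The sketch below shows that idea on a plain in-memory
adjacency map. It is not the AllSimplePaths implementation from graph-algo,
and the class SimplePathsSketch and its methods are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Collections;
import java.util.Deque;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

final class SimplePathsSketch {
    // Undirected adjacency lists; TreeSet keeps the output order deterministic.
    private final Map<Integer, Set<Integer>> neighbours = new HashMap<>();

    void connect(int a, int b) {
        neighbours.computeIfAbsent(a, k -> new TreeSet<>()).add(b);
        neighbours.computeIfAbsent(b, k -> new TreeSet<>()).add(a);
    }

    // Returns every simple path from start to end with at most maxHops edges.
    List<List<Integer>> allSimplePaths(int start, int end, int maxHops) {
        List<List<Integer>> paths = new ArrayList<>();
        Deque<Integer> current = new ArrayDeque<>();
        Set<Integer> onPath = new HashSet<>();
        current.addLast(start);
        onPath.add(start);
        extend(end, maxHops, current, onPath, paths);
        return paths;
    }

    private void extend(int end, int hopsLeft, Deque<Integer> current,
                        Set<Integer> onPath, List<List<Integer>> paths) {
        int last = current.peekLast();
        if (last == end) {
            paths.add(new ArrayList<>(current)); // copy: current keeps changing
            return;
        }
        if (hopsLeft == 0) {
            return; // hop limit reached without finding end
        }
        for (int next : neighbours.getOrDefault(last, Collections.emptySet())) {
            if (!onPath.add(next)) {
                continue; // already on this path, so it would not be simple
            }
            current.addLast(next);
            extend(end, hopsLeft - 1, current, onPath, paths);
            current.removeLast();
            onPath.remove(next);
        }
    }
}
```

Because every relationship counts as one hop, the hop limit plays the same
role here that a maximum cost of 3 plays with a constant cost of 1 per
relationship.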

2010/5/21 Maaz Bin Tariq maaz.ta...@yahoo.com:
 Hello Everyone,

 I have been using neo4j for a couple of months, but recently it has been
 giving me performance issues while getting all paths between two nodes. I have
 tried all the configuration options for performance improvement but was
 unsuccessful. The number of nodes is about 35k, with 1 property per node, and
 the number of relationships is about 100k. We are using an index on each node.
 The system load reached 8 to 9 during peak time. I have run JProfiler, and the
 code below seems to be the problem, taking up much of the CPU. Please let me
 know if I am doing something silly. We access this method continuously during
 heavy load, and that can be every second.
 -
 Dijkstra<Integer> dijkstra = new Dijkstra<Integer>(0, start, end,
     new CostEvaluator<Integer>() {
         public Integer getCost(Relationship relationship, boolean backwards) {
             // unweighted graph
             return 1;
         }
     }, new IntegerAdder(), new IntegerComparator(), Direction.BOTH,
     SocialNetworkRelationshipType.KNOWS);

 dijkstra.limitMaxCostToTraverse(3);

 dijkstra.getPathsAsNodes();
 -

 Currently the system configuration is 2 GB RAM, of which 1 GB is allocated to
 the JVM heap and 512 MB to memory mapping. The JVM runs with the VM arguments
 -server and -XX:+UseConcMarkSweepGC.

 Following are the neo4j jars we are using
 neo4j-graph-algo-0.3.jar
 neo4j-index-1.0.jar
 neo4j-kernel-1.0.jar
 neo4j-shell-1.0.jar
 neo4j-commons-1.0.jar
 lucene-core-2.9.1.jar

 After moving neo4j to a new instance with 16 GB RAM, 5 GB allocated to the JVM
 heap and 2.5 GB to memory mapping, it still gives a system load of up to 3.

 Kindly suggest what can be done to optimize neo4j.

 Thanks
 -Maaz Tariq








-- 
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com


Re: [Neo] Newbie question

2010-05-24 Thread Mattias Persson
2010/5/24 Peter Neubauer peter.neuba...@neotechnology.com:
 2) Is there a simple way to know how many edges/relations of a certain type
 come out of a node? I figure I can iterate through the relations, but I was
 thinking of using this to choose the cheapest starting node for a traversal.
 Not really, since there are no global measures stored per se and
 almost all data is loaded lazily. However, if you want that, you could
 store the number of relationships when you operate on that node in an
 index (e.g. Lucene). How useful that is regarding update performance
 and keeping things in sync depends on your use case.
Or you could store that number on the node directly, which to me seems
like a better place.
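
A minimal sketch of keeping the count on the node itself, assuming every
relationship is created and removed through the same two methods so the
counter cannot drift. It uses a plain Java class rather than the Neo4j API;
CountedNode, its methods, and cheapestStart are hypothetical names.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// In-memory stand-in for a node that carries its own per-type counters.
final class CountedNode {
    final String name;
    private final Map<String, List<CountedNode>> outgoing = new HashMap<>();
    // Plays the role of a per-type counter property stored on the node.
    private final Map<String, Integer> outgoingCount = new HashMap<>();

    CountedNode(String name) {
        this.name = name;
    }

    void addRelationship(String type, CountedNode target) {
        outgoing.computeIfAbsent(type, t -> new ArrayList<>()).add(target);
        outgoingCount.merge(type, 1, Integer::sum);
    }

    boolean removeRelationship(String type, CountedNode target) {
        List<CountedNode> targets = outgoing.get(type);
        if (targets == null || !targets.remove(target)) {
            return false; // nothing removed, so the counter stays untouched
        }
        outgoingCount.merge(type, -1, Integer::sum);
        return true;
    }

    // Reads the stored counter instead of iterating the relationships.
    int degree(String type) {
        return outgoingCount.getOrDefault(type, 0);
    }

    // Picks the candidate with the fewest outgoing relationships of the type.
    static CountedNode cheapestStart(String type, List<CountedNode> candidates) {
        CountedNode best = null;
        for (CountedNode candidate : candidates) {
            if (best == null || candidate.degree(type) < best.degree(type)) {
                best = candidate;
            }
        }
        return best;
    }
}
```

The cost is one extra write per relationship change; in a real store that
update would need to happen in the same transaction as the relationship
change to keep the two in sync.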






-- 
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com


Re: [Neo] memory mapping

2010-05-24 Thread Lorenzo Livi
Hi,
I'm running many custom DFS traversals in series (not the neo4j built-in
ones), limited in depth, for each run. All the subsequent computations
are then done in RAM (eigenvector determination etc.). The big bottleneck
is obviously disk access. The nodes of the graph have a high out-degree,
so a typical DFS is demanding ...

Just a question: I'm using the read-only access mode to the graph-db.
When I try to map everything (just like you suggested) in the properties
section I get many warnings (cannot memory map ...). If I use less
memory for this section (half as much) everything works fine (without
warnings). Why? It should always give me this warning ...

Best regards,
Lorenzo


On Mon, May 24, 2010 at 11:41 AM, Johan Svensson
jo...@neotechnology.com wrote:
 If a run takes that long when performing traversals, the -server flag
 should be faster.

 Could you explain a bit more what type of traversal you are performing
 and what the graph looks like? Judging by the size of the store files
 you should be able to traverse the full graph many times in a single
 day on that machine.

 -Johan



Re: [Neo] Fwd: Node not in use exception when using tx event handler

2010-05-24 Thread Johan Svensson
This was indeed a bug and I just committed a fix for it in trunk.

-Johan

On Tue, May 18, 2010 at 6:28 PM, Garrett Smith g...@rre.tt wrote:
 See attached.

 To reproduce, run against r4415:

 $ java -cp PATH NodeNotInUse DBDIR create
 $ java -cp PATH NodeNotInUse DBDIR delete NODEID

 I modified 
 src/main/java/org/neo4j/kernel/impl/core/TransactionEventsSyncHook.java
 to get the error info:

 --- src/main/java/org/neo4j/kernel/impl/core/TransactionEventsSyncHook.java (revision 4415)
 +++ src/main/java/org/neo4j/kernel/impl/core/TransactionEventsSyncHook.java (working copy)
 @@ -35,6 +35,7 @@

     public void beforeCompletion()
     {
 +        try {
         this.transactionData = nodeManager.getTransactionData();
         states = new ArrayList<HandlerAndState>();
         for ( TransactionEventHandler<T> handler : this.handlers )
 @@ -55,6 +56,10 @@
                 throw new RuntimeException( t );
             }
         }
 +        } catch (Throwable th) {
 +            th.printStackTrace();
 +            throw new RuntimeException(th);
 +        }
     }

 On Tue, May 18, 2010 at 6:38 AM, Johan Svensson jo...@neotechnology.com 
 wrote:
 Garrett,

 This could be a bug. Could you please provide a test case that triggers
 this behavior?

 -Johan

 On Sat, May 15, 2010 at 8:46 PM, Tobias Ivarsson
 tobias.ivars...@neotechnology.com wrote:
 Create a ticket for it. I've tagged it for review when I get back to the
 office; you had the great misfortune of sending this right at the beginning of a
 4 day Swedish holiday.

 If you could supply code that can reproduce it that would be even better.

 Cheers,
 Tobias

 On Sat, May 15, 2010 at 8:42 PM, Garrett Smith g...@rre.tt wrote:

 Is this something I should open a ticket for, or is it something the
 dev team is aware of? Or is it user error?

 Garrett


 -- Forwarded message --
 From: Garrett Smith g...@rre.tt
 Date: Thu, May 13, 2010 at 2:52 PM
 Subject: Node not in use exception when using tx event handler
 To: Neo4j Users user@lists.neo4j.org


 I'm running into the exception below when I try to delete a node when
 first starting up a graph database.

 I'm experimenting with a transaction event handler. The error,
 however, occurs before my handler gets called.

 org.neo4j.kernel.impl.nioneo.store.InvalidRecordException: Node[10] not in use
        at org.neo4j.kernel.impl.nioneo.xa.WriteTransaction.nodeGetProperties(WriteTransaction.java:1009)
        at org.neo4j.kernel.impl.nioneo.xa.NeoStoreXaConnection$NodeEventConsumerImpl.getProperties(NeoStoreXaConnection.java:228)
        at org.neo4j.kernel.impl.nioneo.xa.NioNeoDbPersistenceSource$NioNeoDbResourceConnection.nodeLoadProperties(NioNeoDbPersistenceSource.java:432)
        at org.neo4j.kernel.impl.persistence.PersistenceManager.loadNodeProperties(PersistenceManager.java:100)
        at org.neo4j.kernel.impl.core.NodeManager.loadProperties(NodeManager.java:628)
        at org.neo4j.kernel.impl.core.NodeImpl.loadProperties(NodeImpl.java:84)
        at org.neo4j.kernel.impl.core.Primitive.ensureFullLightProperties(Primitive.java:591)
        at org.neo4j.kernel.impl.core.Primitive.getAllCommittedProperties(Primitive.java:604)
        at org.neo4j.kernel.impl.core.LockReleaser.populateNodeRelEvent(LockReleaser.java:855)
        at org.neo4j.kernel.impl.core.LockReleaser.getTransactionData(LockReleaser.java:740)
        at org.neo4j.kernel.impl.core.NodeManager.getTransactionData(NodeManager.java:914)
        at org.neo4j.kernel.impl.core.TransactionEventsSyncHook.beforeCompletion(TransactionEventsSyncHook.java:39)
        at org.neo4j.kernel.impl.transaction.TransactionImpl.doBeforeCompletion(TransactionImpl.java:341)
        at org.neo4j.kernel.impl.transaction.TxManager.commit(TxManager.java:556)
        at org.neo4j.kernel.impl.transaction.TransactionImpl.commit(TransactionImpl.java:103)
        at org.neo4j.kernel.EmbeddedGraphDbImpl$TransactionImpl.finish(EmbeddedGraphDbImpl.java:410)
        at gv.graph.Nodes.deleteNode(Nodes.java:349)
        at gv.graph.NodeDelete.handle(NodeDelete.java:20)
        at gv.graph.MessageHandler.run(MessageHandler.java:59)
        at java.util.concurrent.ThreadPoolExecutor$Worker.runTask(ThreadPoolExecutor.java:886)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:908)
        at java.lang.Thread.run(Thread.java:619)
 May 13, 2010 2:42:56 PM
 org.neo4j.kernel.impl.transaction.TransactionImpl doBeforeCompletion
 WARNING: Caught exception from tx
 syncronization[org.neo4j.kernel.impl.core.transactioneventssynch...@edf3f6
 ]
 beforeCompletion()
 May 13, 2010 2:42:56 PM
 org.neo4j.kernel.impl.transaction.TransactionImpl doAfterCompletion
 WARNING: Caught exception from tx
 syncronization[org.neo4j.kernel.impl.core.transactioneventssynch...@edf3f6
 ]
 afterCompletion()

 Code details:

 URL: https://svn.neo4j.org/components/kernel/trunk
 Repository Root: https://svn.neo4j.org
 Repository 

[Neo] Efficient way to sparsify a graph?

2010-05-24 Thread Alex Averbuch
Hey,
I have a large (by my standards) graph and I would like to reduce its size
so it all fits in memory.
This is the same Twitter graph I mentioned earlier: 2.5 million Nodes and
250 million Relationships.

The goal is for the graph to still have the same topology and
characteristics after it has been made more sparse.
My plan to do this was to uniformly randomly select Relationships for
deletion, until the graph is small enough.

My first approach is basically this:

until (graph_is_small_enough)
  random_relationship = get_relationship_by_id(random_number)
  random_relationship.delete()

I'm using the transactional GraphDatabaseService at the moment, rather than
the BatchInserter... mostly because I'm not inserting anything and I assumed
the optimizations made to the BatchInserter were only for write operations.

The reason I want to delete Relationships instead of Nodes is
  (1) I don't want to accidentally delete any super nodes, as these are
what give Twitter its unique structure
  (2) The number of Nodes is not the main problem that's keeping me from
being able to store the graph in RAM

The problem with the current approach is that it feels like I'm working
against Neo4j's strengths and it is very, very slow... I waited over an hour
and fewer than 1,000,000 Relationships had been deleted. Given that my aim is
to halve the number of Relationships (about 125 million deletions), it would
take me over 100 hours (about 1 week) to complete this process. In the worst
case this is what I'll resort to, but I'd rather not if there's a better way.

My questions are:
(1) Can you think of an alternative, faster and still meaningful (maintain
graph structure) way to reduce this graph size?
(2) Using the same method I'm using now, are there some magical
optimizations that will greatly improve performance?

Thanks,
Alex
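
One alternative to consider, offered as an assumption rather than something
tested on this dataset: instead of looking up random relationship IDs one at a
time, make a single sequential pass over all relationships and keep each one
independently with probability p (p = 0.5 to roughly halve the graph). Every
relationship then has the same chance of surviving, which is still a uniform
random selection. The sketch below shows the sampling step on an in-memory
edge list; SparsifySketch and its parameters are hypothetical names, and
writing the kept relationships into a fresh store is left out.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Random;

final class SparsifySketch {
    // Keeps each edge independently with probability keepProbability.
    // An edge is an int[] {sourceNodeId, targetNodeId}.
    static List<int[]> sample(List<int[]> edges, double keepProbability, long seed) {
        Random random = new Random(seed); // fixed seed makes a run repeatable
        List<int[]> kept = new ArrayList<>();
        for (int[] edge : edges) {
            // nextDouble() is uniform in [0, 1), so this holds with
            // probability keepProbability.
            if (random.nextDouble() < keepProbability) {
                kept.add(edge);
            }
        }
        return kept;
    }
}
```

The number of kept relationships is only approximately p times the original
count, and nodes are never removed, so high-degree nodes keep roughly the same
fraction of their relationships as everyone else.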