Re: [Neo4j] [Neo] memory mapping

2010-05-27 Thread Lorenzo Livi
Hi


 Is the machine busy with other tasks also? Disk read access should not
 be a problem since you have so much RAM. You can test the read speed
 by doing a sequential read on for example the relationship store file:

 $ dd if=neostore.relationshipstore.db of=/dev/null bs=10M

 With 16GB RAM and the memory mapped config you have, all the store
 files should be cached by the OS and reads will be very fast.

 If you are storing a lot of the traversal results in memory, this could
 be a GC issue. Could you run with -verbose:gc and check if there are
 any long GC pause times?
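
 For example (the heap size and jar name here are just placeholders), GC
 logging can be switched on with something like:

 $ java -verbose:gc -XX:+PrintGCDetails -Xmx1g -jar traversal-app.jar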


The problem is that now I can't use more than 2GB (heap + direct
access) for each run ... It's not enough.
I'll take a look at the GC behaviour.


 Yes, this warning can safely be ignored. The problem is that if the
 assigned memory does not divide into a perfect fit for the store file,
 the last memory mapped window placed at the end of the file will be
 larger than the actual file. When running in read only mode the file
 would then need to be expanded a bit, but that is not possible because
 it is opened read only. To avoid the warning you can set the memory
 mapped configuration of each store file to its exact size in bytes.

 Regards,
 Johan
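
For reference, a rough sketch of what such an exact-fit memory mapped
configuration could look like with the embedded 1.x API, sizing each
mapped memory window to the length of its store file before opening the
store read only. The store path, class name and the use of the
EmbeddedReadOnlyGraphDatabase constructor taking a config map are
assumptions for illustration, not something stated in this thread:

import java.io.File;
import java.util.HashMap;
import java.util.Map;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.kernel.EmbeddedReadOnlyGraphDatabase;

public class ExactFitMappedMemory
{
    public static void main( String[] args )
    {
        String storeDir = "path/to/graph-db"; // assumed store location
        String[] stores = {
            "neostore.nodestore.db",
            "neostore.relationshipstore.db",
            "neostore.propertystore.db",
            "neostore.propertystore.db.strings",
            "neostore.propertystore.db.arrays" };
        Map<String, String> config = new HashMap<String, String>();
        for ( String store : stores )
        {
            // <store file>.mapped_memory = exact file size in bytes
            long bytes = new File( storeDir, store ).length();
            config.put( store + ".mapped_memory", String.valueOf( bytes ) );
        }
        GraphDatabaseService graphDb =
                new EmbeddedReadOnlyGraphDatabase( storeDir, config );
        // ... run the read-only traversals here ...
        graphDb.shutdown();
    }
}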


Ok, thank you very much.

Best regards,
Lorenzo
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user


Re: [Neo4j] Efficient way to sparsify a graph?

2010-05-27 Thread Alex Averbuch
Hi Peter,
Yeah it's under control now.

Cheers,
Alex

On Thu, May 27, 2010 at 5:36 PM, Peter Neubauer 
peter.neuba...@neotechnology.com wrote:

 Alex,
 not an expert on delete performance, but that looks OK to me. Is it
 workable for you now?

 Cheers,

 /peter neubauer

 COO and Sales, Neo Technology

 GTalk:  neubauer.peter
 Skype   peter.neubauer
 Phone   +46 704 106975
 LinkedIn   http://www.linkedin.com/in/neubauer
 Twitter  http://twitter.com/peterneubauer

 http://www.neo4j.org   - Your high performance graph database.
 http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.



 On Tue, May 25, 2010 at 9:51 AM, Alex Averbuch alex.averb...@gmail.com
 wrote:
  OK, seems to be running much faster now.
  The slow performance was my fault, not Neo4j's.
 
  Before, the input parameters to my function allowed the user to specify
  exactly how many Relationships should be removed. This meant that, in
  order to be fair, I had to uniformly randomly generate IDs in the key
  space (also defined by an input parameter). The problem with this
  approach is that as time passes it becomes more likely to select an ID
  that has already been removed. I had checks to avoid errors as a result,
  but it still meant many wasted disk reads.
 
  //BEFORE
  until (graph_is_small_enough)
   random_number = generate_uniform(0, maxId)
   random_relationship = get_relationship_by_id(random_number)
   random_relationship.delete()
 
  Now the input parameters only let the user specify the PERCENTAGE of all
  Relationships that should be kept.
  This means I don't need to keep any state that tells me which IDs have
  already been deleted, and I can iterate through all Relationships and
  never have a missed read (assuming there are no holes in the key space).
 
  //NOW
  for (index = 0 to maxId)
   random_number = generate_uniform(0, 100)
   if (random_number < percent_to_keep)
     continue;
   random_relationship = get_relationship_by_id(index)
   random_relationship.delete()
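
  As a rough Java sketch of the loop above (purely illustrative, assuming
  the embedded API's getRelationshipById and beginTx; the batch size of
  10,000 deletes per transaction and all names here are made up):

import java.util.Random;
import org.neo4j.graphdb.GraphDatabaseService;
import org.neo4j.graphdb.NotFoundException;
import org.neo4j.graphdb.Transaction;

public class SparsifyByPercentage
{
    // Delete relationships so that roughly percentToKeep percent survive.
    public static void sparsify( GraphDatabaseService graphDb,
            long maxRelId, double percentToKeep )
    {
        Random random = new Random();
        Transaction tx = graphDb.beginTx();
        int opsInTx = 0;
        try
        {
            for ( long id = 0; id <= maxRelId; id++ )
            {
                // keep this relationship with probability percentToKeep / 100
                if ( random.nextDouble() * 100.0 < percentToKeep )
                {
                    continue;
                }
                try
                {
                    graphDb.getRelationshipById( id ).delete();
                }
                catch ( NotFoundException e )
                {
                    continue; // hole in the id space, or already deleted
                }
                // commit in batches so the transaction state stays small
                if ( ++opsInTx % 10000 == 0 )
                {
                    tx.success();
                    tx.finish();
                    tx = graphDb.beginTx();
                }
            }
            tx.success();
        }
        finally
        {
            tx.finish();
        }
    }
}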
 
  Performance is much better now. The first 1,000,000 deletions took
  ~4 minutes.
 
  Cheers,
  Alex
 
  On Mon, May 24, 2010 at 11:24 PM, Alex Averbuch alex.averb...@gmail.com
 wrote:
 
  Hey,
  I have a large (by my standards) graph and I would like to reduce its
  size so it all fits in memory.
  This is the same Twitter graph I mentioned earlier: 2.5 million Nodes and
  250 million Relationships.
 
  The goal is for the graph to still have the same topology and
  characteristics after it has been made more sparse.
  My plan to do this was to uniformly randomly select Relationships for
  deletion, until the graph is small enough.
 
  My first approach is basically this:
 
  until (graph_is_small_enough)
   random_number = generate_uniform(0, maxId)
   random_relationship = get_relationship_by_id(random_number)
   random_relationship.delete()
 
  I'm using the transactional GraphDatabaseService at the moment, rather
  than the BatchInserter... mostly because I'm not inserting anything and I
  assumed the optimizations made to the BatchInserter were only for write
  operations.
 
  The reasons I want to delete Relationships instead of Nodes are:
  (1) I don't want to accidentally delete any super nodes, as these are
  what give Twitter its unique structure
  (2) The number of Nodes is not the main problem that's keeping me from
  being able to store the graph in RAM
 
  The problem with the current approach is that it feels like I'm working
  against Neo4j's strengths and it is very, very slow... I waited over an
  hour and less than 1,000,000 Relationships had been deleted. Given that
  my aim is to halve the number of Relationships, it would take me over
  100 hours (1 week) to complete this process. In the worst case this is
  what I'll resort to, but I'd rather not if there's a better way.
 
  My questions are:
  (1) Can you think of an alternative, faster and still meaningful
  (maintain graph structure) way to reduce this graph size?
  (2) Using the same method I'm using now, are there some magical
  optimizations that will greatly improve performance?
 
  Thanks,
  Alex
 

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user