Re: [Neo4j] Memory overflow while creating big graph

2011-08-23 Thread Mattias Persson
Could you take a quick look at where most of the time is spent when it slows
down? Just start VisualVM, attach it to the process and monitor CPU usage.
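If attaching VisualVM to the server process is awkward, a crude stand-in (purely an illustrative sketch, not code from this thread) is to snapshot stacks from inside the JVM a few times; frames that keep appearing at the top of a busy thread's stack are where the time is going:

```java
import java.util.Map;

public class PoorMansSampler {
    // Take a few snapshots of every thread's stack. Methods that recur
    // at the top across snapshots are the likely hot spots.
    public static void main(String[] args) throws InterruptedException {
        for (int i = 0; i < 3; i++) {
            for (Map.Entry<Thread, StackTraceElement[]> e
                    : Thread.getAllStackTraces().entrySet()) {
                StackTraceElement[] stack = e.getValue();
                if (stack.length > 0) {
                    System.out.println(e.getKey().getName() + " -> " + stack[0]);
                }
            }
            Thread.sleep(100);  // sampling interval
        }
    }
}
```

jstack against the running PID gives the same information without touching the code.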

2011/8/16 Jose Vinicius Pimenta Coletto jvcole...@gmail.com

 Hi,

 I made some changes to use the BatchInserter to generate the initial
 database. The strategy is to identify all nodes that must be inserted and
 after
 doing this I create the edges.
 But I am still having problems: after inserting 9M nodes, execution becomes
 very slow and never reaches the edge insertion.
 As already said, the graph has 14M nodes and 11M edges.

 I am running the JVM as follows: 'java -Xmx4g -XX:-UseGCOverheadLimit -jar
 qsa.jar params'.

 Information on the machine I'm using: 'Linux 2.6.38-10-generic #46-Ubuntu
 SMP Tue Jun 28 15:07:17 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux', with 4GB
 of RAM.

 The code I'm using to create the initial database is attached, the method
 that should be looked at is: createDB.

 --
 Thanks,
 Jose Vinicius Pimenta Coletto

 ___
 Neo4j mailing list
 User@lists.neo4j.org
 https://lists.neo4j.org/mailman/listinfo/user




-- 
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com


Re: [Neo4j] Memory overflow while creating big graph

2011-08-16 Thread Anders Nawroth
Hi Jose!

The mailing list removed your attachment; could you just paste the code
into the mail instead?

/anders

2011-08-16 22:55, Jose Vinicius Pimenta Coletto wrote:
 Hi,

 I made some changes to use the BatchInserter to generate the initial
 database. The strategy is to identify all nodes that must be inserted and 
 after
 doing this I create the edges.
 But I am still having problems: after inserting 9M nodes, execution becomes very
 slow and never reaches the edge insertion.
 As already said, the graph has 14M nodes and 11M edges.

 I am running the JVM as follows: 'java -Xmx4g -XX:-UseGCOverheadLimit -jar
 qsa.jar params'.

 Information on the machine I'm using: 'Linux 2.6.38-10-generic #46-Ubuntu
 SMP Tue Jun 28 15:07:17 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux', with 4GB
 of RAM.

 The code I'm using to create the initial database is attached, the method
 that should be looked at is: createDB.






Re: [Neo4j] Memory overflow while creating big graph

2011-08-16 Thread Peter Neubauer
Joe,
Do you have access to a profiler like VisualVM? It could be that the regexp
is not scaling - I have seen this in my SQL importer project. Just a
thought; it would be great if you could measure where the slowdown occurs.
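On the regexp point: one cheap mitigation is to compile the Pattern once and reuse a single Matcher via reset(), instead of allocating a fresh Matcher for every line. An illustrative sketch with a toy pattern, not the importer's real one:

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class MatcherReuseDemo {
    public static void main(String[] args) {
        // Compile once; reset() one Matcher per line. String.matches()
        // or pattern.matcher(line) would allocate a Matcher per call,
        // which adds up over millions of lines.
        Pattern p = Pattern.compile("(\\d+)\\t(\\w+)");
        Matcher m = p.matcher("");
        String[] lines = { "123\tfoo", "456\tbar", "oops" };
        int matched = 0;
        for (String line : lines) {
            m.reset(line);
            if (m.matches()) {
                matched++;
            }
        }
        System.out.println(matched + " lines matched");  // prints "2 lines matched"
    }
}
```

If the slowdown really is in matching, timing this loop against the per-line-Matcher version on a few million input lines would confirm it.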

/peter

Sent from my phone.
On Aug 16, 2011 11:09 PM, Jose Vinicius Pimenta Coletto 
jvcole...@gmail.com wrote:
 Sorry, the source code follows:

 public class InitialDBCreator {
   private static final SimpleDateFormat DATE_PARSER = new SimpleDateFormat("dd/MM/yyyy");
   private static final SimpleDateFormat DATE_FORMATTER = new SimpleDateFormat("yyyyMMdd");

   private static final int GRP_DEST_DOC = 1;
   private static final int GRP_DEST_NAME = 2;
   private static final int GRP_SRC_DOC = 3;
   private static final int GRP_SRC_NAME = 5;
   private static final int GRP_QUAL = 6;
   private static final int GRP_ENTRY_DATE = 7;
   private static final int GRP_PART_INT = 8;
   private static final int GRP_PART_DEC = 9;
   private static final Pattern PTRN_LINE =
       Pattern.compile("(\\d{11,14})\\t([^\\t]+)\\t(\\d{11,14})\\t" +
           "([^\\t]+)\\t([^\\t]+)\\t([^\\t]+)\\t(\\d{2}/\\d{2}/" +
           "\\d{4})\\t(\\d{1,3}),(\\d{2})%\\t(\\d{2}/\\d{2}/\\d{4})");

   private final BatchInserter inserter;
   private final GraphDatabaseService dbService;
   private final BatchInserterIndexProvider indexProvider;
   private final BatchInserterIndex index;

   public InitialDBCreator(final String storeDir, final Map<String, String> config,
       final String indexName) {
     System.out.println("Starting inserter...");
     inserter = new BatchInserterImpl(storeDir, config);
     dbService = inserter.getGraphDbService();
     System.out.println("Starting indexProvider...");
     indexProvider = new LuceneBatchInserterIndexProvider(inserter);
     System.out.println("Starting index...");
     index = indexProvider.nodeIndex(indexName, MapUtil.stringMap("type", "exact"));
     System.out.println("DB started!");
     Runtime.getRuntime().addShutdownHook(
         new Thread() {
           @Override
           public void run() {
             indexProvider.shutdown();
             inserter.shutdown();
           }
         });
   }

   public void shutdown() {
     index.flush();
     indexProvider.shutdown();
     inserter.shutdown();
   }

   private File prepareNodesFile(final File initialFile) {
     File nodesFile = null;
     int count;
     int countErr;

     try {
       System.out.println("Extracting nodes...");
       File tmpFile = File.createTempFile("qsa-tempnodes", ".txt");
       BufferedWriter writer = new BufferedWriter(new FileWriter(tmpFile));
       InputStream in = FUtils.getInputStream(initialFile);
       BufferedReader reader = new BufferedReader(new InputStreamReader(in));
       String line = null;
       count = 0;
       countErr = 0;
       while ((line = reader.readLine()) != null) {
         Matcher matcher = PTRN_LINE.matcher(line);
         if (matcher.matches()) {
           String docOne = matcher.group(GRP_SRC_DOC);
           String nameOne = matcher.group(GRP_SRC_NAME);
           if (!docOne.equals("") && !nameOne.equals("")) {
             writer.write(docOne + "|" + nameOne + "\n");
           }

           String docTwo = matcher.group(GRP_DEST_DOC);
           String nameTwo = matcher.group(GRP_DEST_NAME);
           if (!docTwo.equals("") && !nameTwo.equals("")) {
             writer.write(docTwo + "|" + nameTwo + "\n");
           }
           count++;
         } else {
           System.err.println("ERROR: the line '" + line + "' doesn't match the pattern.");
           System.err.println("---");
           countErr++;
         }

         if (((count > 0) && (count % 5000 == 0)) || ((countErr > 0) && (countErr % 500 == 0))) {
           System.out.print("\r" + count + " rows processed, " + countErr + " erroneous lines.");
         }
       }
       System.out.println("\r" + count + " rows processed, " + countErr + " erroneous lines.");
       in.close();
       reader.close();
       writer.close();

       File sortedFile = FUtils.sortFile(tmpFile);

       System.out.println("Unifying nodes...");
       nodesFile = File.createTempFile("qsa-nodes", ".txt");
       writer = new BufferedWriter(new FileWriter(nodesFile));
       in = FUtils.getInputStream(sortedFile);
       reader = new BufferedReader(new InputStreamReader(in));
       line = null;
       count = 0;
       String lastDoc = "-1";
       String lastLine = "";
       while ((line = reader.readLine()) != null) {
         String doc = line.substring(0, line.indexOf("|"));
         if (!doc.equals(lastDoc) && !lastDoc.equals("-1")) {
           writer.write(lastLine + "\n");
         }
         lastDoc = doc;
         lastLine = line;
         count++;
         if ((count > 0) && (count % 5000 == 0)) {
           System.out.print("\r" + count + " rows processed.");
         }
       }
       writer.write(lastLine + "\n");
       System.out.println("\r" + count + " rows processed.");
       in.close();
       reader.close();
       writer.close();
     } catch (IOException e) {
       e.printStackTrace();
     }

     return nodesFile;
   }

   private void addPerson(final String doc, final String name) {
     PersonType tipo = (doc.length() <= 11) ? PersonType.INDIVIDUAL : PersonType.LEGAL;

     Map<String, Object> pessoaProperties = new HashMap<String, Object>();
     pessoaProperties.put(Person.KEY_DOC, doc);
     pessoaProperties.put(Person.KEY_NAME, name);
     pessoaProperties.put(Person.KEY_TYPE, tipo.toString());

     Map<String, Object> indexInfo = new HashMap<String, Object>();
     indexInfo.put(Person.KEY_DOC, doc);

     index.add(inserter.createNode(pessoaProperties), indexInfo);
     tipo = null;
     pessoaProperties = null;
     indexInfo = null;
   }

   private void addSociety(final String srcDoc, final String destDoc, final long entryDate,
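The "Unifying nodes" pass above (sort the file, then collapse adjacent lines that share the same document number, keeping the last one) can be reduced to this self-contained sketch, minus the file I/O:

```java
import java.util.ArrayList;
import java.util.List;

public class CollapseSorted {
    // Input lines look like "doc|name" and are sorted by doc. For each run
    // of equal docs, keep only the last line -- the same logic as the
    // unification loop in prepareNodesFile.
    static List<String> collapse(List<String> sorted) {
        List<String> out = new ArrayList<String>();
        String lastDoc = null;
        String lastLine = null;
        for (String line : sorted) {
            String doc = line.substring(0, line.indexOf('|'));
            if (lastDoc != null && !doc.equals(lastDoc)) {
                out.add(lastLine);  // doc changed: emit previous run's last line
            }
            lastDoc = doc;
            lastLine = line;
        }
        if (lastLine != null) {
            out.add(lastLine);      // emit the final run
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> in = new ArrayList<String>();
        in.add("111|Alice");
        in.add("111|Alice S.");
        in.add("222|Bob");
        System.out.println(collapse(in));  // prints [111|Alice S., 222|Bob]
    }
}
```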
 

Re: [Neo4j] Memory overflow while creating big graph

2011-08-03 Thread Niels Hoogeveen

Is it possible for you to use the batch inserter, or does the data you are 
loading require a lot of lookups?
Niels

 From: jvcole...@gmail.com
 Date: Wed, 3 Aug 2011 17:57:20 -0300
 To: user@lists.neo4j.org
 Subject: [Neo4j] Memory overflow while creating big graph
 
 Hi,
 
 I'm trying to create a graph with 15M nodes and 12M relationships, but after
 inserting 400K relationships the following exception is thrown: Exception in
 thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded.
 
 I'm using -Xmx3g and the following configuration file for the graph:
 neostore.nodestore.db.mapped_memory = 256M
 neostore.relationshipstore.db.mapped_memory = 1G
 neostore.propertystore.db.mapped_memory = 90M
 neostore.propertystore.db.index.mapped_memory = 1M
 neostore.propertystore.db.index.keys.mapped_memory = 1M
 neostore.propertystore.db.strings.mapped_memory = 768M
 neostore.propertystore.db.arrays.mapped_memory = 130M
 cache_type = weak
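A side note on the numbers themselves: the mapped_memory values above add up to roughly 2.2 GB, and on Linux these buffers normally live outside the Java heap. Together with -Xmx3g the process wants over 5 GB on a 4 GB machine, which by itself invites swapping and GC thrash. A sketch that fits in 4 GB (the values are illustrative assumptions, not tuned figures):

```properties
# ~1 GB mapped in total, leaving room for e.g. a 2 GB heap on a 4 GB box
neostore.nodestore.db.mapped_memory = 200M
neostore.relationshipstore.db.mapped_memory = 500M
neostore.propertystore.db.mapped_memory = 100M
neostore.propertystore.db.index.mapped_memory = 1M
neostore.propertystore.db.index.keys.mapped_memory = 1M
neostore.propertystore.db.strings.mapped_memory = 200M
neostore.propertystore.db.arrays.mapped_memory = 10M
cache_type = weak
```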
 
 Can anyone help me?
 
 -- 
 Jose Vinicius Pimenta Coletto


Re: [Neo4j] Memory overflow while creating big graph

2011-08-03 Thread Michael Hunger
Do you commit your transaction in batches (e.g. every 10k nodes)?
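The commit-every-N pattern Michael is asking about has roughly this shape; Tx here is a stand-in for a real transaction handle (such as Neo4j's Transaction), reduced to a counter so the sketch is self-contained:

```java
public class BatchCommitSketch {
    // Stand-in for a real transaction; only the commit boundary matters here.
    interface Tx { void commit(); }

    // Insert `items` records, committing every `batchSize` of them plus a
    // final commit for any partial batch. Returns the number of commits.
    static int insertAll(int items, int batchSize, Tx tx) {
        int commits = 0;
        int inBatch = 0;
        for (int i = 0; i < items; i++) {
            // ... create node/relationship i here ...
            inBatch++;
            if (inBatch == batchSize) {
                tx.commit();   // close this batch; real code opens a new tx
                commits++;
                inBatch = 0;
            }
        }
        if (inBatch > 0) {     // don't forget the trailing partial batch
            tx.commit();
            commits++;
        }
        return commits;
    }

    public static void main(String[] args) {
        int commits = insertAll(25000, 10000, new Tx() {
            public void commit() { /* no-op stand-in */ }
        });
        System.out.println(commits + " commits");  // prints "3 commits"
    }
}
```

Keeping batches bounded caps how much uncommitted state the JVM must hold at once, which is exactly what an OutOfMemoryError during a huge single transaction points at.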

How much memory does your JVM get? E.g. via:

-Xmx2G

Cheers

Michael

On 03.08.2011 at 22:57, Jose Vinicius Pimenta Coletto wrote:

 Hi,
 
 I'm trying to create a graph with 15M nodes and 12M relationships, but after
 inserting 400K relationships the following exception is thrown: Exception in
 thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded.
 
 I'm using -Xmx3g and the following configuration file for the graph:
 neostore.nodestore.db.mapped_memory = 256M
 neostore.relationshipstore.db.mapped_memory = 1G
 neostore.propertystore.db.mapped_memory = 90M
 neostore.propertystore.db.index.mapped_memory = 1M
 neostore.propertystore.db.index.keys.mapped_memory = 1M
 neostore.propertystore.db.strings.mapped_memory = 768M
 neostore.propertystore.db.arrays.mapped_memory = 130M
 cache_type = weak
 
 Can anyone help me?
 
 -- 
 Jose Vinicius Pimenta Coletto


Re: [Neo4j] Memory overflow while creating big graph

2011-08-03 Thread Michael Hunger
Jose,

Can you provide the full stack trace of the OOM?

And perhaps show some of the source code you use, so we can try to reproduce it.

How much physical RAM does the machine have?

Can you show us the configuration dump at the last startup of 
graph.db/messages.log ?

Cheers

Michael

For the batch inserter there is the BatchInserterIndexProvider.
Please see here: http://docs.neo4j.org/chunked/1.4/indexing-batchinsert.html
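Regarding the existence check: with 14M nodes it can be cheaper to keep a doc-to-nodeId map in the heap and skip index lookups entirely during the import. A self-contained sketch with illustrative names (the HashMap stands in for the Lucene index, and nextId for a real createNode call):

```java
import java.util.HashMap;
import java.util.Map;

public class NodeCache {
    // doc -> node id, populated as nodes are created. 14M String->Long
    // entries fit in a normally sized heap and turn the existence check
    // into a hash lookup instead of an index query per record.
    private final Map<String, Long> byDoc = new HashMap<String, Long>();
    private long nextId = 0;  // stand-in for the inserter's node creation

    long getOrCreate(String doc) {
        Long id = byDoc.get(doc);
        if (id == null) {
            id = nextId++;     // real code would create the node here
            byDoc.put(doc, id);
        }
        return id;
    }

    public static void main(String[] args) {
        NodeCache cache = new NodeCache();
        long a = cache.getOrCreate("111");
        long b = cache.getOrCreate("222");
        long c = cache.getOrCreate("111");  // cache hit: same id as a
        System.out.println(a == c && a != b);  // prints "true"
    }
}
```

The index can then be populated once, after all nodes exist, purely for later query use.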

On 03.08.2011 at 23:51, Jose Vinicius Pimenta Coletto wrote:

 Niels, before creating any node or relationship I check whether it already
 exists in the index. Can I do this with the BatchInserter?
 
 Michael, I'm closing my transactions every 5k inserts, but apparently this
 is not working. I'm running the JVM with -Xmx3g.
 
 -- 
 Thanks,
 Jose Vinicius Pimenta Coletto