Re: [Neo4j] Memory overflow while creating big graph

Could you just quickly look at where most time is spent when it's slowing down? Just start VisualVM, attach to the process and monitor CPU.

2011/8/16 Jose Vinicius Pimenta Coletto <jvcole...@gmail.com>:
> Hi,
> I made some changes to use the BatchInserter to generate the initial
> database. The strategy is to identify all nodes that must be inserted and,
> after doing this, create the edges. But I'm still having problems: after
> inserting 9M nodes the run becomes very slow and never reaches the edge
> insertion. As already said, the graph has 14M nodes and 11M edges.
> I am running the JVM as follows:
> 'java -Xmx4g -XX:-UseGCOverheadLimit -jar qsa.jar params'.
> Information on the machine I'm using: 'Linux 2.6.38-10-generic #46-Ubuntu
> SMP Tue Jun 28 15:07:17 UTC 2011 x86_64 x86_64 x86_64 GNU/Linux' with 4GB
> of RAM.
> The code I'm using to create the initial database is attached; the method
> that should be looked at is createDB.
> --
> Thanks,
> Jose Vinicius Pimenta Coletto

--
Mattias Persson, [matt...@neotechnology.com]
Hacker, Neo Technology
www.neotechnology.com
___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
Re: [Neo4j] Memory overflow while creating big graph
Hi Jose!
The mailing list removed your attachment; could you just paste the code into the mail instead?

/anders

On 2011-08-16 22:55, Jose Vinicius Pimenta Coletto wrote:
> Hi,
> I made some changes to use the BatchInserter to generate the initial
> database. The strategy is to identify all nodes that must be inserted and,
> after doing this, create the edges. But I'm still having problems: after
> inserting 9M nodes the run becomes very slow and never reaches the edge
> insertion. [...]
Re: [Neo4j] Memory overflow while creating big graph
Joe,
Do you have access to a profiler like VisualVM? It could be that the regexp is not scaling; I have seen this in my SQL importer project. Just a thought, but it would be great if you can measure where the slowdown occurs.

/peter

Sent from my phone.

On Aug 16, 2011 11:09 PM, Jose Vinicius Pimenta Coletto <jvcole...@gmail.com> wrote:

Sorry, the source code follows:

public class InitialDBCreator {

    private static final SimpleDateFormat DATE_PARSER = new SimpleDateFormat("dd/MM/yyyy");
    private static final SimpleDateFormat DATE_FORMATTER = new SimpleDateFormat("yyyyMMdd");

    private static final int GRP_DEST_DOC = 1;
    private static final int GRP_DEST_NAME = 2;
    private static final int GRP_SRC_DOC = 3;
    private static final int GRP_SRC_NAME = 5;
    private static final int GRP_QUAL = 6;
    private static final int GRP_ENTRY_DATE = 7;
    private static final int GRP_PART_INT = 8;
    private static final int GRP_PART_DEC = 9;

    private static final Pattern PTRN_LINE = Pattern.compile(
            "(\\d{11,14})\\t([^\\t]+)\\t(\\d{11,14})\\t"
            + "([^\\t]+)\\t([^\\t]+)\\t([^\\t]+)\\t(\\d{2}/\\d{2}/"
            + "\\d{4})\\t(\\d{1,3}),(\\d{2})%\\t(\\d{2}/\\d{2}/\\d{4})");

    private final BatchInserter inserter;
    private final GraphDatabaseService dbService;
    private final BatchInserterIndexProvider indexProvider;
    private final BatchInserterIndex index;

    public InitialDBCreator(final String storeDir, final Map<String, String> config,
            final String indexName) {
        System.out.println("Iniciando inserter...");
        inserter = new BatchInserterImpl(storeDir, config);
        dbService = inserter.getGraphDbService();
        System.out.println("Iniciando indexProvider...");
        indexProvider = new LuceneBatchInserterIndexProvider(inserter);
        System.out.println("Iniciando index...");
        index = indexProvider.nodeIndex(indexName, MapUtil.stringMap("type", "exact"));
        System.out.println("DB iniciado!");
        Runtime.getRuntime().addShutdownHook(new Thread() {
            @Override
            public void run() {
                indexProvider.shutdown();
                inserter.shutdown();
            }
        });
    }

    public void shutdown() {
        index.flush();
        indexProvider.shutdown();
        inserter.shutdown();
    }

    private File prepareNodesFile(final File initialFile) {
        File nodesFile = null;
        int count;
        int countErr;
        try {
            System.out.println("Extracting nodes...");
            File tmpFile = File.createTempFile("qsa-tempnodes", ".txt");
            BufferedWriter writer = new BufferedWriter(new FileWriter(tmpFile));
            InputStream in = FUtils.getInputStream(initialFile);
            BufferedReader reader = new BufferedReader(new InputStreamReader(in));
            String line = null;
            count = 0;
            countErr = 0;
            while ((line = reader.readLine()) != null) {
                Matcher matcher = PTRN_LINE.matcher(line);
                if (matcher.matches()) {
                    String docOne = matcher.group(GRP_SRC_DOC);
                    String nameOne = matcher.group(GRP_SRC_NAME);
                    if (!docOne.equals("") && !nameOne.equals("")) {
                        writer.write(docOne + "|" + nameOne + "\n");
                    }
                    String docTwo = matcher.group(GRP_DEST_DOC);
                    String nameTwo = matcher.group(GRP_DEST_NAME);
                    if (!docTwo.equals("") && !nameTwo.equals("")) {
                        writer.write(docTwo + "|" + nameTwo + "\n");
                    }
                    count++;
                } else {
                    System.err.println("ERRO: the line '" + line + "' doesn't match the pattern.");
                    System.err.println("---");
                    countErr++;
                }
                if (((count > 0) && (count % 5000 == 0))
                        || ((countErr > 0) && (countErr % 500 == 0))) {
                    System.out.print("\r" + count + " rows processed, " + countErr + " erroneous lines.");
                }
            }
            System.out.println("\r" + count + " rows processed, " + countErr + " erroneous lines.");
            in.close();
            reader.close();
            writer.close();

            File sortedFile = FUtils.sortFile(tmpFile);

            System.out.println("Unifying nodes...");
            nodesFile = File.createTempFile("qsa-nodes", ".txt");
            writer = new BufferedWriter(new FileWriter(nodesFile));
            in = FUtils.getInputStream(sortedFile);
            reader = new BufferedReader(new InputStreamReader(in));
            line = null;
            count = 0;
            String lastDoc = "-1";
            String lastLine = "";
            while ((line = reader.readLine()) != null) {
                String doc = line.substring(0, line.indexOf("|"));
                if (!doc.equals(lastDoc) && !lastDoc.equals("-1")) {
                    writer.write(lastLine + "\n");
                }
                lastDoc = doc;
                lastLine = line;
                count++;
                if ((count > 0) && (count % 5000 == 0)) {
                    System.out.print("\r" + count + " rows processed.");
                }
            }
            writer.write(lastLine + "\n");
            System.out.println("\r" + count + " rows processed.");
            in.close();
            reader.close();
            writer.close();
        } catch (IOException e) {
            e.printStackTrace();
        }
        return nodesFile;
    }

    private void addPerson(final String doc, final String name) {
        PersonType tipo = (doc.length() <= 11) ? PersonType.INDIVIDUAL : PersonType.LEGAL;
        Map<String, Object> pessoaProperties = new HashMap<String, Object>();
        pessoaProperties.put(Person.KEY_DOC, doc);
        pessoaProperties.put(Person.KEY_NAME, name);
        pessoaProperties.put(Person.KEY_TYPE, tipo.toString());
        Map<String, Object> indexInfo = new HashMap<String, Object>();
        indexInfo.put(Person.KEY_DOC, doc);
        index.add(inserter.createNode(pessoaProperties), indexInfo);
        tipo = null;
        pessoaProperties = null;
        indexInfo = null;
    }

    private void addSociety(final String srcDoc, final String destDoc, final long entryDate,
Re: [Neo4j] Memory overflow while creating big graph
Is it possible for you to use the batch inserter, or does the data you are loading require a lot of lookups?

Niels

From: jvcole...@gmail.com
Date: Wed, 3 Aug 2011 17:57:20 -0300
To: user@lists.neo4j.org
Subject: [Neo4j] Memory overflow while creating big graph

> Hi,
> I'm trying to create a graph with 15M nodes and 12M relationships, but after
> inserting 400K relationships the following exception is thrown:
> Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded.
> I'm using -Xmx3g and the following configuration file for the graph:
>   neostore.nodestore.db.mapped_memory = 256M
>   neostore.relationshipstore.db.mapped_memory = 1G
>   neostore.propertystore.db.mapped_memory = 90M
>   neostore.propertystore.db.index.mapped_memory = 1M
>   neostore.propertystore.db.index.keys.mapped_memory = 1M
>   neostore.propertystore.db.strings.mapped_memory = 768M
>   neostore.propertystore.db.arrays.mapped_memory = 130M
>   cache_type = weak
> Can anyone help me?
> --
> Jose Vinicius Pimenta Coletto
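[Editorial aside, not part of the original thread.] One sanity check worth running on the configuration quoted above: the mapped_memory regions total roughly 2.2G, and on Linux these buffers typically live outside the Java heap, so a 3G heap plus the mapped stores oversubscribes a 4GB machine. The sketch below is a stand-alone helper (all names are made up for illustration) that totals the mapped-memory entries of such a config:

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class MappedMemoryTotal {

    // Parse values like "256M" or "1G" into megabytes.
    static long toMegabytes(String v) {
        v = v.trim().toUpperCase();
        long n = Long.parseLong(v.substring(0, v.length() - 1));
        switch (v.charAt(v.length() - 1)) {
            case 'G': return n * 1024;
            case 'M': return n;
            default:  throw new IllegalArgumentException("unsupported unit: " + v);
        }
    }

    // Sum every *.mapped_memory entry in the config, ignoring other keys.
    public static long totalMegabytes(Map<String, String> config) {
        long total = 0;
        for (Map.Entry<String, String> e : config.entrySet()) {
            if (e.getKey().endsWith("mapped_memory")) {
                total += toMegabytes(e.getValue());
            }
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, String> config = new LinkedHashMap<String, String>();
        config.put("neostore.nodestore.db.mapped_memory", "256M");
        config.put("neostore.relationshipstore.db.mapped_memory", "1G");
        config.put("neostore.propertystore.db.mapped_memory", "90M");
        config.put("neostore.propertystore.db.index.mapped_memory", "1M");
        config.put("neostore.propertystore.db.index.keys.mapped_memory", "1M");
        config.put("neostore.propertystore.db.strings.mapped_memory", "768M");
        config.put("neostore.propertystore.db.arrays.mapped_memory", "130M");
        config.put("cache_type", "weak"); // not a mapped_memory key, skipped
        // 256 + 1024 + 90 + 1 + 1 + 768 + 130 = 2270M, i.e. about 2.2G
        System.out.println(totalMegabytes(config) + "M mapped in total");
    }
}
```

With 2270M mapped plus a 3G heap, the process wants more than 5GB on a 4GB box, so shrinking either the heap or the mapped buffers (or both) is a plausible first step.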
Re: [Neo4j] Memory overflow while creating big graph
Do you commit your transaction in batches (e.g. every 10k nodes)? How much memory does your JVM get, e.g. via -Xmx2G?

Cheers
Michael

On 03.08.2011 at 22:57, Jose Vinicius Pimenta Coletto wrote:
> Hi,
> I'm trying to create a graph with 15M nodes and 12M relationships, but after
> inserting 400K relationships the following exception is thrown:
> Exception in thread "main" java.lang.OutOfMemoryError: GC overhead limit exceeded. [...]
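[Editorial aside, not part of the original thread.] The batching Michael suggests looks like this in outline. The sketch below uses a stand-in commit() so it runs without Neo4j on the classpath; in real pre-2.0 code the marked spots would be db.beginTx(), tx.success(), and tx.finish() from org.neo4j.graphdb, and BATCH_SIZE is an illustrative name, not a Neo4j setting:

```java
public class BatchCommitSketch {

    static final int BATCH_SIZE = 10000; // tune to available heap

    static int commitCount = 0;

    // Stand-in for closing one transaction and opening the next:
    // tx.success(); tx.finish(); tx = db.beginTx();
    static void commit() {
        commitCount++;
    }

    /** Insert n nodes, committing every BATCH_SIZE inserts and once at the end. */
    static int insertAll(int n) {
        commitCount = 0;
        int inBatch = 0;
        for (int i = 0; i < n; i++) {
            // db.createNode() and property writes would go here
            inBatch++;
            if (inBatch == BATCH_SIZE) {
                commit();     // bound the uncommitted state held in memory
                inBatch = 0;
            }
        }
        if (inBatch > 0) {
            commit();         // flush the final partial batch
        }
        return commitCount;
    }

    public static void main(String[] args) {
        System.out.println(insertAll(15000000) + " commits for 15M nodes");
    }
}
```

The point of the pattern is that uncommitted transaction state is what fills the heap; keeping each transaction to a bounded number of writes keeps memory use flat regardless of total graph size.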
Re: [Neo4j] Memory overflow while creating big graph
Jose,
can you provide the full stack trace of the OOM? And could you show some of the source code you use, so we can try to reproduce it? How much physical RAM does the machine have? Can you show us the configuration dump from the last startup in graph.db/messages.log?

For the batch inserter there is the BatchInserterIndexProvider. Please see here:
http://docs.neo4j.org/chunked/1.4/indexing-batchinsert.html

Cheers
Michael

On 03.08.2011 at 23:51, Jose Vinicius Pimenta Coletto wrote:
> Niels, before creating any node or relationship I check whether it already
> exists in the index; can I use the BatchInserter to do this?
> Michael, I'm closing my transactions every 5k inserts, but apparently this
> is not working. I'm running the JVM with -Xmx3g.
> --
> Thanks,
> Jose Vinicius Pimenta Coletto
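[Editorial aside, not part of the original thread.] Jose's check-before-create pattern is a likely bottleneck with the batch index: a BatchInserterIndex only sees entries after flush(), and every lookup hits Lucene. A common workaround during a one-shot import is to keep the doc-to-nodeId mapping in an in-memory map instead. The sketch below uses a counter as a stand-in for the node ids that BatchInserter.createNode would return; DedupSketch and getOrCreate are illustrative names, not Neo4j API:

```java
import java.util.HashMap;
import java.util.Map;

public class DedupSketch {

    private final Map<String, Long> docToNodeId = new HashMap<String, Long>();
    private long nextId = 0; // stand-in for the id BatchInserter.createNode returns

    /** Return the node id for doc, creating the node only on first sight. */
    public long getOrCreate(String doc) {
        Long id = docToNodeId.get(doc);
        if (id == null) {
            id = nextId++;            // real code: inserter.createNode(properties)
            docToNodeId.put(doc, id); // real code: also index.add(id, ...) so the
                                      // index is usable after the import
        }
        return id;
    }

    public int nodeCount() {
        return docToNodeId.size();
    }
}
```

The trade-off is heap for lookup speed: 14M string keys in a HashMap can easily need a couple of GB, so on a 4GB machine it may be necessary to shrink the mapped-memory buffers, shorten the keys, or process the input in sorted shards as the prepareNodesFile code already does.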