Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node
On Thu, Sep 22, 2011 at 2:15 PM, st3ven wrote:
>
> Hi Johan,
>
> I changed the settings as you described, but that didn't change the speed
> significantly.

The previous configuration would make the machine use swap, and that will
kill performance.

> To store the degree as a property on each node is an option, but I want the
> node degree to be calculated from the graph database as I also want to check

The problem is that you are trying to access an 85GB+ dataset using only
16GB RAM. The recommendation then is to aggregate the information (store the
degree count as a property). Peter also mentioned using HA (cache sharding),
but if you can just get some more RAM into the machine you will see an
improvement. An SSD would also help here, since you are touching all edges in
the graph, while a mechanical disk (in this setup) will have horrible
performance (low throughput with 99% load on the disk). There are SSD
solutions that handle terabytes of data today, and they are dropping in price.

Regards,
Johan

___
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
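[Editor's note] The aggregation Johan recommends can be sketched in plain Java. This is not the Neo4j BatchInserter API, just a stand-alone illustration of the idea: count each node's degree while the edge list is streamed in at insert time, so the cold 85GB relationship store never has to be traversed afterwards. With the real BatchInserter you would then write the count as a node property.

```java
import java.util.HashMap;
import java.util.Map;

// Sketch of degree aggregation at insertion time (plain Java, not Neo4j API).
// Each undirected edge {a, b} contributes 1 to the degree of both endpoints.
public class DegreeAggregator {
    public static Map<Long, Integer> degreesFromEdges(long[][] edges) {
        Map<Long, Integer> degree = new HashMap<>();
        for (long[] edge : edges) {
            degree.merge(edge[0], 1, Integer::sum);
            degree.merge(edge[1], 1, Integer::sum);
        }
        return degree;
    }

    public static void main(String[] args) {
        long[][] edges = { {0, 1}, {0, 2}, {1, 2}, {0, 3} };
        // node 0 touches three edges, node 3 only one
        System.out.println(degreesFromEdges(edges));
    }
}
```

One pass over the relationship file replaces three days of random reads against the store.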
Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node
Hi Johan,

I changed the settings as you described, but that didn't change the speed
significantly.

To store the degree as a property on each node is an option, but I want the
node degree to be calculated from the graph database, as I also want to
check some other metrics on the entire graph.

Cheers,
Stephan

--
View this message in context:
http://neo4j-community-discussions.438527.n3.nabble.com/Creating-a-graph-database-with-BatchInserter-and-getting-the-node-degree-of-every-node-tp3351599p3358544.html
Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node
Hi Linan,

that would only fit that one scenario, and I wouldn't be using the graph
database to get the node degree; in that scenario I could also use my
relationship file to calculate it. But I also want to check some other
metrics on the graph database, like the clustering coefficient and so on,
and then I would run into the same problem again. I need something to walk
through the entire graph database very fast. Maybe you have another tip for
me ;-).

Cheers,
Stephan

--
View this message in context:
http://neo4j-community-discussions.438527.n3.nabble.com/Creating-a-graph-database-with-BatchInserter-and-getting-the-node-degree-of-every-node-tp3351599p3358539.html
Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node
Hi Stephan,

You could try lowering the heap size to -Xmx2G and setting cache_type=weak,
with 10G memory-mapped for relationships. The machine only has 16G RAM and
will not be able to process such a large dataset at in-memory speeds.
Another option is to calculate the degree at insertion time and store it as
a property on each node.

Regards,
Johan

On Wed, Sep 21, 2011 at 12:44 PM, st3ven wrote:
> Hi Linan,
>
> I just tried it with the outgoing relationships, but unfortunately that
> didn't speed things up.
>
> The size of my db is around 140GB, so it is not possible for me to dump
> the full directory into a ramfs.
> My files on the hard disk have the following sizes:
> neostore.nodestore.db = 31MB
> neostore.relationshipstore.db = 85GB
> neostore.propertystore.db = 65GB
> neostore.propertystore.db.strings = 180MB
> Is there maybe a chance of reducing the size of my database?
>
> Cheers,
> Stephan
>
> --
> View this message in context:
> http://neo4j-community-discussions.438527.n3.nabble.com/Creating-a-graph-database-with-BatchInserter-and-getting-the-node-degree-of-every-node-tp3351599p3355074.html
> Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node
Hi Stephan,

I miscalculated the size of relationshipstore.db: I thought it was around 8G
instead of 85G. The only option left, I think, is to build an index,
something like this:

    idx = db.index().forNodes("knows");
    idx.add(thisguy, "knows", thatguy.getId());
    idx.add(thatguy, "known_by", thisguy.getId());

The benefit is that when querying, the result size is pre-calculated, so it
would save some iteration time. The problem is the size of the index files,
which should be around 85G.

On Wed, Sep 21, 2011 at 11:44 AM, st3ven wrote:
> Hi Linan,
>
> I just tried it with the outgoing relationships, but unfortunately that
> didn't speed things up.
>
> The size of my db is around 140GB, so it is not possible for me to dump
> the full directory into a ramfs.
> My files on the hard disk have the following sizes:
> neostore.nodestore.db = 31MB
> neostore.relationshipstore.db = 85GB
> neostore.propertystore.db = 65GB
> neostore.propertystore.db.strings = 180MB
> Is there maybe a chance of reducing the size of my database?
>
> Cheers,
> Stephan

--
Best regards
Linan Wang
Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node
Hi Linan,

I just tried it with the outgoing relationships, but unfortunately that
didn't speed things up.

The size of my db is around 140GB, so it is not possible for me to dump the
full directory into a ramfs.
My files on the hard disk have the following sizes:

neostore.nodestore.db = 31MB
neostore.relationshipstore.db = 85GB
neostore.propertystore.db = 65GB
neostore.propertystore.db.strings = 180MB

Is there maybe a chance of reducing the size of my database?

Cheers,
Stephan

--
View this message in context:
http://neo4j-community-discussions.438527.n3.nabble.com/Creating-a-graph-database-with-BatchInserter-and-getting-the-node-degree-of-every-node-tp3351599p3355074.html
Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node
Peter,

I don't think that this would help me, because I wouldn't be using the graph
to get the node degree; to get the node degree I could also just use my file
with all the relationships, but I want to use the graph database for that.
The problem is that I don't just want the node degree: I also want to check
other metrics on the graph database, like the clustering coefficient, and
then I would have the same problem again, that I can't read up the entire
database. Creating an index for the node degree would fit only that one
scenario, and then I can't go any further. Maybe you have another tip for me
;-).

Thanks for your help!

Cheers,
Stephan

--
View this message in context:
http://neo4j-community-discussions.438527.n3.nabble.com/Creating-a-graph-database-with-BatchInserter-and-getting-the-node-degree-of-every-node-tp3351599p3355067.html
Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node
Unfortunately an SSD is not an option, because I would need an SSD with
around 150GB, as my database is 140GB big.

Yesterday I already tried to configure Neo4j to use more memory for mapping,
but it seems that Neo4j doesn't allocate all the memory I configured. I
noticed that my system uses just 4GB after Neo4j has been running for a
while, but I have 16GB to use.

I downloaded the following configuration file
http://dist.neo4j.org/neo_default.props
and changed the entries like this:

neostore.nodestore.db.mapped_memory=31M
neostore.relationshipstore.db.mapped_memory=8G
neostore.propertystore.db.mapped_memory=90M
neostore.propertystore.db.index.mapped_memory=1M
neostore.propertystore.db.index.keys.mapped_memory=1M
neostore.propertystore.db.strings.mapped_memory=180M
neostore.propertystore.db.arrays.mapped_memory=130M

My files on the hard disk have the following sizes:

neostore.nodestore.db = 31MB
neostore.relationshipstore.db = 85GB
neostore.propertystore.db = 65GB
neostore.propertystore.db.strings = 180MB

Shall I maybe also change something in the cache settings in that file? What
settings would be good for me? As I already said, I am currently using the
following Java parameters: -server -Xmx8G -XX:+UseParallelGC -XX:+UseNUMA
Is there also something there I should change?

Thanks for your help!

Cheers,
Stephan

--
View this message in context:
http://neo4j-community-discussions.438527.n3.nabble.com/Creating-a-graph-database-with-BatchInserter-and-getting-the-node-degree-of-every-node-tp3351599p3355044.html
Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
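[Editor's note] The memory budget implied by these settings is worth adding up. The mmap sizes below are taken from the configuration quoted in this message, and the heap from the -Xmx8G flag; together they already exceed the machine's 16GB before the OS page cache gets anything, which is consistent with Johan's warning that this configuration would push the machine into swap.

```java
// Sums the mapped_memory settings quoted above and compares the total
// (heap + mmap) against the machine's 16GB of RAM.
public class MemoryBudget {
    static final long M = 1L << 20, G = 1L << 30;

    public static long mappedBytes() {
        return 31 * M   // neostore.nodestore.db.mapped_memory
             + 8 * G    // neostore.relationshipstore.db.mapped_memory
             + 90 * M   // neostore.propertystore.db.mapped_memory
             + 1 * M    // ...index.mapped_memory
             + 1 * M    // ...index.keys.mapped_memory
             + 180 * M  // ...strings.mapped_memory
             + 130 * M; // ...arrays.mapped_memory
    }

    public static void main(String[] args) {
        long heap = 8 * G; // -Xmx8G
        System.out.printf("mapped=%.2fG, heap+mapped=%.2fG, ram=16G%n",
                mappedBytes() / (double) G,
                (heap + mappedBytes()) / (double) G);
        // ~8.42G mapped + 8G heap = ~16.42G > 16G RAM: swapping is likely.
    }
}
```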
Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node
Stephan,

what's the size of your db? If it's under 10G, how about just dumping the
full directory onto a ramfs? Leave 1G to the JVM and it'll do the heavy I/O
on the ramfs. I think it's a simple solution and could yield an interesting
result. Please let me know the result if you try it. Thanks.

On Tue, Sep 20, 2011 at 5:41 PM, Peter Neubauer wrote:
> Steven,
> the index is built into the DB, so you can use something like
> http://docs.neo4j.org/chunked/snapshot/tutorials-java-embedded-index.html
> to index all your nodes into Lucene (in one index, the node as key,
> the number of relationships as numeric value when creating them). When
> reading, you would simply request all keys from the index and iterate
> over them. I am not terribly sure how fast it is, but given that
> you are just loading up documents, Lucene should be reasonably fast.
>
> Let us know if that works out!
>
> Cheers,
>
> /peter neubauer
>
> GTalk: neubauer.peter
> Skype peter.neubauer
> Phone +46 704 106975
> LinkedIn http://www.linkedin.com/in/neubauer
> Twitter http://twitter.com/peterneubauer
>
> http://www.neo4j.org - Your high performance graph database.
> http://startupbootcamp.org/ - Öresund - Innovation happens HERE.
> http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.
>
> On Tue, Sep 20, 2011 at 6:01 PM, st3ven wrote:
>> Hello Peter,
>>
>> it's a pity that Neo4j doesn't support full graph scans.
>>
>> Is there maybe a possibility to cache more relationships to speed things
>> up a little bit? I noticed that only the iteration over the relationships
>> takes hours; the time to get all relationships of one node is quite fast.
>>
>> I think I could try your second solution:
>> - Store the relationships as a property in an Index (e.g. Lucene) and
>> ask the index for all entries. Thus, you are using an index for what it
>> is good at - global operations over all documents.
>>
>> But I didn't understand it correctly. Do you mean an index which stores
>> the ID of a relationship, creating such an index for every node?
>> Could you maybe give me a code example for that?
>> That would be very kind of you.
>>
>> The first solution is not really realizable, because I don't know the
>> number of relationships of every node. I would have to count the
>> relationships before the insertion, and that would make my database
>> useless for the node degree query.
>>
>> Thank you very much for your help!
>>
>> Cheers,
>> Stephan

--
Best regards
Linan Wang
Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node
Steven,

the index is built into the DB, so you can use something like
http://docs.neo4j.org/chunked/snapshot/tutorials-java-embedded-index.html
to index all your nodes into Lucene (in one index, the node as key, the
number of relationships as numeric value when creating them). When reading,
you would simply request all keys from the index and iterate over them. I am
not terribly sure how fast it is, but given that you are just loading up
documents, Lucene should be reasonably fast.

Let us know if that works out!

Cheers,

/peter neubauer

GTalk: neubauer.peter
Skype peter.neubauer
Phone +46 704 106975
LinkedIn http://www.linkedin.com/in/neubauer
Twitter http://twitter.com/peterneubauer

http://www.neo4j.org - Your high performance graph database.
http://startupbootcamp.org/ - Öresund - Innovation happens HERE.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.

On Tue, Sep 20, 2011 at 6:01 PM, st3ven wrote:
> Hello Peter,
>
> it's a pity that Neo4j doesn't support full graph scans.
>
> Is there maybe a possibility to cache more relationships to speed things
> up a little bit? I noticed that only the iteration over the relationships
> takes hours; the time to get all relationships of one node is quite fast.
>
> I think I could try your second solution:
> - Store the relationships as a property in an Index (e.g. Lucene) and
> ask the index for all entries. Thus, you are using an index for what it
> is good at - global operations over all documents.
>
> But I didn't understand it correctly. Do you mean an index which stores
> the ID of a relationship, creating such an index for every node?
> Could you maybe give me a code example for that?
> That would be very kind of you.
>
> The first solution is not really realizable, because I don't know the
> number of relationships of every node. I would have to count the
> relationships before the insertion, and that would make my database
> useless for the node degree query.
>
> Thank you very much for your help!
>
> Cheers,
> Stephan
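[Editor's note] Peter's idea can be sketched without Lucene: the point is that "degree of every node" becomes a scan over a precomputed node-to-degree index rather than a traversal of the 85GB relationship store. Here a plain TreeMap stands in for the Lucene index; the real code would go through Neo4j's index API, which is not shown.

```java
import java.util.Map;
import java.util.TreeMap;

// Stand-in for the Lucene index Peter describes: node id -> degree,
// written once at creation time. Global questions are then answered by
// iterating the index entries, never by touching the graph store.
public class DegreeIndex {
    private final Map<Long, Integer> index = new TreeMap<>();

    public void add(long nodeId, int degree) {
        index.put(nodeId, degree);
    }

    // a "global operation over all documents": sequential index scan
    public long sumOfDegrees() {
        long sum = 0;
        for (int d : index.values()) sum += d;
        return sum;
    }

    public static void main(String[] args) {
        DegreeIndex idx = new DegreeIndex();
        idx.add(0, 116407); // degrees taken from the timings in this thread
        idx.add(1, 1804);
        idx.add(2, 22);
        System.out.println(idx.sumOfDegrees());
    }
}
```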
Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node
Hello Peter,

it's a pity that Neo4j doesn't support full graph scans.

Is there maybe a possibility to cache more relationships to speed things up
a little bit? I noticed that only the iteration over the relationships takes
hours; the time to get all relationships of one node is quite fast.

I think I could try your second solution:

- Store the relationships as a property in an Index (e.g. Lucene) and
  ask the index for all entries. Thus, you are using an index for what it
  is good at - global operations over all documents.

But I didn't understand it correctly. Do you mean an index which stores the
ID of a relationship, creating such an index for every node? Could you maybe
give me a code example for that? That would be very kind of you.

The first solution is not really realizable, because I don't know the number
of relationships of every node. I would have to count the relationships
before the insertion, and that would make my database useless for the node
degree query.

Thank you very much for your help!

Cheers,
Stephan

--
View this message in context:
http://neo4j-community-discussions.438527.n3.nabble.com/Creating-a-graph-database-with-BatchInserter-and-getting-the-node-degree-of-every-node-tp3351599p3352509.html
Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node
hi stephan,

my theory is that most of the time is spent on retrieving incoming
relationships. Could you try again, but this time only retrieve outgoing
relationships?

    for (Node node : db.getAllNodes()) {
        if (node.getId() > 0) {
            long test = System.currentTimeMillis();
            Iterable<Relationship> rels = node.getRelationships(knows, Direction.OUTGOING);
            System.out.print("Retrieval:" + (System.currentTimeMillis() - test));
            test = System.currentTimeMillis();
            int count = com.google.common.collect.Iterables.size(rels);
            System.out.print("ms; Counting:" + (System.currentTimeMillis() - test));
            System.out.println("ms; number of edges:" + count);
        }
    }

On Tue, Sep 20, 2011 at 4:37 PM, st3ven wrote:
> Hello again,
>
> the bottleneck is at the iteration.
> I did some tests to check whether the iteration or the relationship
> retrieval is too slow.
>
> My test results look like this:
>
> Retrieval:1ms; Counting:158ms; number of edges:116407
> [...timing results snipped; the full list is in the quoted message...]
> Retrieval:0ms; Counting:9528ms; number of edges:11956
> Retrieval:0ms; Counting:50047ms; number of edges:12645
> Retrieval:1ms; Counting:43687ms; number of edges:15025
>
> The first results came up very fast, because they seem to have been
> cached, since I had run that quite often. As you can see, the last 4
> results weren't cached, and the iteration over the relationships took a
> huge amount of time.
>
> I checked that with the following code:
>
>     for (Node node : db.getAllNodes()) {
>         if (node.getId() > 0) {
>             long test = System.currentTimeMillis();
>             Iterable<Relationship> rels = node.getRelationships(knows);
>             System.out.print("Retrieval:" + (System.currentTimeMillis() - test));
>             test = System.currentTimeMillis();
>             int count = com.google.common.collect.Iterables.size(rels);
>             System.out.print("ms; Counting:" + (System.currentTimeMillis() - test));
>             System.out.println("ms; number of edges:" + count);
>         }
>     }
>
> Is there maybe a possibility to cache more relationships, or do you have
> any idea how to speed up the iteration?
>
> Thanks for your help again!
>
> Cheers,
> Stephan
>
> --
> View this message in context:
> http://n
Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node
The "retrieval" is only virtual, as it is lazy. When I get back to my
machine on Thursday, I am going to run your tests and get back to you. I
have made some modifications to the relationship loading and want to see how
they affect this. There are issues loading lots of relationships with cold
caches in a one-by-one use case, as the larger-segment caching only kicks in
after a certain number of misses in the memory-mapped file loading. Using an
SSD would also speed up your use case, and configuring Neo4j to use more
memory for memory mapping would help as well.

Cheers
Michael

Am 20.09.2011 um 17:37 schrieb st3ven:
> Hello again,
>
> the bottleneck is at the iteration.
> I did some tests to check whether the iteration or the relationship
> retrieval is too slow.
>
> My test results look like this:
>
> Retrieval:1ms; Counting:158ms; number of edges:116407
> [...timing results snipped; the full list is in the quoted message...]
> Retrieval:0ms; Counting:9528ms; number of edges:11956
> Retrieval:0ms; Counting:50047ms; number of edges:12645
> Retrieval:1ms; Counting:43687ms; number of edges:15025
>
> The first results came up very fast, because they seem to have been
> cached, since I had run that quite often. As you can see, the last 4
> results weren't cached, and the iteration over the relationships took a
> huge amount of time.
>
> I checked that with the following code:
>
>     for (Node node : db.getAllNodes()) {
>         if (node.getId() > 0) {
>             long test = System.currentTimeMillis();
>             Iterable<Relationship> rels = node.getRelationships(knows);
>             System.out.print("Retrieval:" + (System.currentTimeMillis() - test));
>             test = System.currentTimeMillis();
>             int count = com.google.common.collect.Iterables.size(rels);
>             System.out.print("ms; Counting:" + (System.currentTimeMillis() - test));
>             System.out.println("ms; number of edges:" + count);
>         }
>     }
>
> Is there maybe a possibility to cache more relationships, or do you have
> any idea how to speed up the iteration?
>
> Thanks for your help again!
>
> Cheers,
> Stephan
>
> --
> View this message in context:
> http://neo4j-community-discussions.438527.n3.nabble.com/Creating-a-graph-database-with-BatchInserter-and-getting-the-node-degree-of-every-node-tp3351599p3352415.html
> Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
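[Editor's note] Michael's point that "retrieval" is only virtual can be demonstrated without Neo4j at all: constructing a lazy Iterable does no work, so a timer around it reads ~0ms, and all the cost lands in whatever loop actually consumes the iterator (here, the counting). A plain-Java sketch:

```java
import java.util.Iterator;

// Shows why the "Retrieval" column in the timings is ~0ms: building a lazy
// Iterable materializes nothing; work happens only when it is iterated.
public class LazyIterableDemo {
    public static int workDone = 0; // counts how many elements were produced

    public static Iterable<Integer> lazyRange(final int n) {
        return () -> new Iterator<Integer>() {
            int next = 0;
            public boolean hasNext() { return next < n; }
            public Integer next() { workDone++; return next++; } // work here
        };
    }

    public static void main(String[] args) {
        Iterable<Integer> rels = lazyRange(1000); // "retrieval": no work yet
        System.out.println("after retrieval, workDone=" + workDone);
        int count = 0;
        for (int ignored : rels) count++;         // "counting": all the work
        System.out.println("after counting, workDone=" + workDone);
    }
}
```

The same applies to `node.getRelationships(...)`: the timer around the call measures almost nothing, and the disk I/O is charged to the `Iterables.size(rels)` loop.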
Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node
Hello again,

the bottleneck is at the iteration. I did some tests to check whether the
iteration or the relationship retrieval is too slow.

My test results look like this:

Retrieval:1ms; Counting:158ms; number of edges:116407
Retrieval:0ms; Counting:2ms; number of edges:1804
Retrieval:0ms; Counting:0ms; number of edges:22
Retrieval:0ms; Counting:0ms; number of edges:31
Retrieval:0ms; Counting:0ms; number of edges:39
Retrieval:0ms; Counting:2ms; number of edges:1213
Retrieval:0ms; Counting:0ms; number of edges:57
Retrieval:0ms; Counting:36ms; number of edges:59420
Retrieval:0ms; Counting:335ms; number of edges:175156
Retrieval:1ms; Counting:168ms; number of edges:146697
Retrieval:0ms; Counting:354ms; number of edges:285051
Retrieval:0ms; Counting:0ms; number of edges:50
Retrieval:0ms; Counting:11ms; number of edges:20960
Retrieval:0ms; Counting:0ms; number of edges:43
Retrieval:0ms; Counting:0ms; number of edges:51
Retrieval:0ms; Counting:1ms; number of edges:647
Retrieval:0ms; Counting:5ms; number of edges:10216
Retrieval:0ms; Counting:2ms; number of edges:3444
Retrieval:0ms; Counting:0ms; number of edges:1128
Retrieval:1ms; Counting:312ms; number of edges:319127
Retrieval:1ms; Counting:0ms; number of edges:5
Retrieval:0ms; Counting:760ms; number of edges:104741
Retrieval:0ms; Counting:11ms; number of edges:9210
Retrieval:0ms; Counting:0ms; number of edges:31
Retrieval:1ms; Counting:3ms; number of edges:3116
Retrieval:0ms; Counting:37ms; number of edges:70835
Retrieval:0ms; Counting:383ms; number of edges:296445
Retrieval:1ms; Counting:0ms; number of edges:120
Retrieval:0ms; Counting:2ms; number of edges:1526
Retrieval:0ms; Counting:0ms; number of edges:71
Retrieval:0ms; Counting:42ms; number of edges:35960
Retrieval:0ms; Counting:90ms; number of edges:9644
Retrieval:0ms; Counting:186ms; number of edges:129981
Retrieval:0ms; Counting:1ms; number of edges:1213
Retrieval:1ms; Counting:143ms; number of edges:124495
Retrieval:0ms; Counting:0ms; number of edges:58
Retrieval:0ms; Counting:75ms; number of edges:56195
Retrieval:0ms; Counting:99ms; number of edges:92574
Retrieval:0ms; Counting:0ms; number of edges:13
Retrieval:0ms; Counting:50ms; number of edges:26350
Retrieval:0ms; Counting:2ms; number of edges:1856
Retrieval:1ms; Counting:376ms; number of edges:114166
Retrieval:0ms; Counting:9528ms; number of edges:11956
Retrieval:0ms; Counting:50047ms; number of edges:12645
Retrieval:1ms; Counting:43687ms; number of edges:15025

The first results came up very fast, because they seem to have been cached,
since I had run that quite often. As you can see, the last 4 results weren't
cached, and the iteration over the relationships took a huge amount of time.

I checked that with the following code:

    for (Node node : db.getAllNodes()) {
        if (node.getId() > 0) {
            long test = System.currentTimeMillis();
            Iterable<Relationship> rels = node.getRelationships(knows);
            System.out.print("Retrieval:" + (System.currentTimeMillis() - test));
            test = System.currentTimeMillis();
            int count = com.google.common.collect.Iterables.size(rels);
            System.out.print("ms; Counting:" + (System.currentTimeMillis() - test));
            System.out.println("ms; number of edges:" + count);
        }
    }

Is there maybe a possibility to cache more relationships, or do you have any
idea how to speed up the iteration?

Thanks for your help again!

Cheers,
Stephan

--
View this message in context:
http://neo4j-community-discussions.438527.n3.nabble.com/Creating-a-graph-database-with-BatchInserter-and-getting-the-node-degree-of-every-node-tp3351599p3352415.html
Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.
Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node
hi stephan,

I'm wondering if it makes any difference if you specify the relationship
type when counting degrees:

    RelationshipType knows = DynamicRelationshipType.withName("KNOWS");
    Iterable<Relationship> rels = node.getRelationships(knows);
    int count = com.google.common.collect.Iterables.size(rels);

Besides, do you know where the bottleneck is: the node iteration or the
relationship retrieval?

On Tue, Sep 20, 2011 at 1:38 PM, st3ven wrote:
> Hi,
>
> I already tried these Java parameters, but that didn't really speed up the
> process, and I have already turned atime off.
> As Java parameters I am currently using -d64 -server -Xms7G -Xmx14G
> -XX:+UseParallelGC -XX:+UseNUMA
> What I've also noticed is that reading from the database is really slow on
> my hard disk. It just reads 1MB/s and sometimes 8MB/s, but that is really
> slow; my hard disk can normally read and copy files much faster. Also very
> strange is that the workload of the hard disk is around 99% while reading
> at 1MB/s.
>
> My OS is Ubuntu Linux x64 and my file system is ext4.
>
> On the Neo4j wiki I found some performance guides, but these didn't really
> help. Do you know what else I can do?
>
> Performance guides:
> http://wiki.neo4j.org/content/Linux_Performance_Guide
> http://wiki.neo4j.org/content/Configuration_Settings
>
> I also added a configuration file, but it seems that my Java program
> doesn't use all of the RAM.
>
> Thanks for your help!
>
> Cheers,
> Stephan
>
> --
> View this message in context:
> http://neo4j-community-discussions.438527.n3.nabble.com/Creating-a-graph-database-with-BatchInserter-and-getting-the-node-degree-of-every-node-tp3351599p3351881.html
> Sent from the Neo4j Community Discussions mailing list archive at Nabble.com.

--
Best regards
Linan Wang
Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node
Stephan,

in this scenario you are reading up the entire db, and basically have it cold. Neo4j is not optimized in itself to do full graph scans. I see a few solutions for you:

- store the number of relationships as a property on nodes and read only that. This works if the updates to your graph are not too frequent.
- store the relationship count as a property in an index (e.g. Lucene) and ask the index for all entries. Thus you are using an index for what it is good at: global operations over all documents.
- use HA or just file copy to replicate the graph on several instances, and send a sharded query to all of them (e.g. count 100K node degrees on each of the instances and combine the results). This query is very easy to do in a map/reduce fashion.

Is that feasible?

Cheers,

/peter neubauer

GTalk: neubauer.peter
Skype: peter.neubauer
Phone: +46 704 106975
LinkedIn: http://www.linkedin.com/in/neubauer
Twitter: http://twitter.com/peterneubauer

http://www.neo4j.org - Your high performance graph database.
http://startupbootcamp.org/ - Öresund - Innovation happens HERE.
http://www.thoughtmade.com - Scandinavia's coolest Bring-a-Thing party.

On Tue, Sep 20, 2011 at 1:00 PM, st3ven wrote:
> [...]
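The third option above ends with a plain reduce step: each instance produces partial per-node degree counts, and the results are summed. A hedged sketch of just that combine step (the shard contents here are invented for illustration):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class MergeDegrees {
    // Reduce step: sum per-node degree counts that were computed
    // independently on each shard/replica.
    static Map<String, Long> merge(List<Map<String, Long>> shards) {
        Map<String, Long> total = new HashMap<>();
        for (Map<String, Long> shard : shards) {
            shard.forEach((node, deg) -> total.merge(node, deg, Long::sum));
        }
        return total;
    }

    public static void main(String[] args) {
        Map<String, Long> shard1 = Map.of("Author1", 3L, "Author2", 1L);
        Map<String, Long> shard2 = Map.of("Author2", 4L, "Author3", 2L);
        Map<String, Long> total = merge(List.of(shard1, shard2));
        System.out.println(total.get("Author2")); // 5
    }
}
```

If each instance counts a disjoint node range, the merge is a plain union; summing as above also covers the case where shards count disjoint edge ranges and a node appears on several shards.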
Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node
Hi,

I already tried these Java parameters, but that didn't really speed up the process, and I already turned atime off.
As Java parameters I am using right now: -d64 -server -Xms7G -Xmx14G -XX:+UseParallelGC -XX:+UseNUMA
What I've also noticed is that reading from the database is really slow on my hard disk.
It just reads 1 MB/s and sometimes 8 MB/s, but that is really slow. My hard disk can normally read and copy files much faster.
Also very strange is that the workload of the hard disk is around 99% while reading at 1 MB/s.

My OS is Ubuntu Linux x64 and my file system is ext4.

On the Neo4j wiki I found some performance guides, but these didn't really help.
Do you know what else I can do?

Performance guides:
http://wiki.neo4j.org/content/Linux_Performance_Guide
http://wiki.neo4j.org/content/Configuration_Settings

I also added a configuration file, but it seems that my Java program doesn't use all of the RAM.

Thanks for your help!

Cheers,
Stephan
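For reference, on Neo4j 1.x the amount of RAM used outside the JVM heap is controlled by the memory-mapped buffer settings in the configuration file (or the config map passed to the embedded database / batch inserter). A sketch of what such a file can look like; the sizes below are illustrative guesses for a 16 GB machine, not tuned recommendations:

```
# neo4j.properties - illustrative sizes only
neostore.nodestore.db.mapped_memory=500M
neostore.relationshipstore.db.mapped_memory=6G
neostore.propertystore.db.mapped_memory=1G
neostore.propertystore.db.strings.mapped_memory=1G
```

Note that memory given to these mapped buffers competes with the JVM heap (-Xms/-Xmx), so the two together should stay well under physical RAM, which also explains why a large -Xmx alone does not make the store files cache-resident.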
Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node
Hi Stephan,

have you set -Xms, -XX:+UseNUMA, and -XX:+UseConcMarkSweepGC? They could speed up the process significantly. Also, if you like, JRockit is fast and free now; give it a try.
Btw, which file system are you using? Have you turned atime off?

On Tue, Sep 20, 2011 at 12:00 PM, st3ven wrote:
> [...]

--
Best regards
Linan Wang
Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node
Peter,

the import of the data into the graph database is not the main problem for me.
The lookup of nodes from the index is fast enough for me.
To create the database it took me nearly half a day.

My main problem is getting the node degree of every node.
As I already said, I am using this code to get the node degree of every node:

    for (Node node : db.getAllNodes()) {
        counter = 0;
        if (node.getId() > 0) {
            for (Relationship rel : node.getRelationships()) {
                counter++;
            }
            System.out.println(node.getProperty("name").toString() + ": " + counter);
        }
    }

After 3 days I only got the node degree of 8 nodes, and I want to optimize my traversal here, because this is very slow.
What can I do to make this faster, or do I have to change my code for getting the node degree?
I only posted my import code because I thought I could maybe optimize something there for this traversal.

Thank you very much for your help!

Cheers,
Stephan
Re: [Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node
Stephan,

the most performant way to insert data with the BatchInserter is to first insert the nodes only, from your node file (that should be fast). After that (or at the same time), find a way to generate the relationship file with Neo4j IDs rather than being forced to look the nodes up in indexes during relationship insertion. That is taking the bulk of the time, so if you could write your node IDs back to a file, then massage the relationship text file to include the FROM and TO node IDs (e.g. using Perl or Bash or Ruby), and import that one referring to these IDs directly, that should be much faster.

HTH

Cheers,

/peter neubauer

On Tue, Sep 20, 2011 at 12:23 PM, st3ven wrote:
> [...]
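The pre-mapping step described above can also be sketched in plain Java: build a name-to-ID map while inserting the nodes, then rewrite the relationship lines to carry numeric IDs so no index lookups happen during edge insertion. This sketch works on in-memory strings (file I/O omitted); it uses the "%;% " separator from Stephan's code, assumes sequential IDs starting at 0 purely for illustration, and in practice you would record the IDs actually returned by inserter.createNode():

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class RewriteRelFile {
    // Assign each distinct author a sequential ID, in node-file order.
    // (Illustrative only: real IDs should come from inserter.createNode().)
    static Map<String, Long> idsFor(List<String> nodeLines) {
        Map<String, Long> ids = new HashMap<>();
        long next = 0;
        for (String name : nodeLines) {
            if (!ids.containsKey(name)) {
                ids.put(name, next++);
            }
        }
        return ids;
    }

    // Rewrite "from%;% to%;% timestamp" lines to "fromId;toId;timestamp".
    static List<String> rewrite(List<String> relLines, Map<String, Long> ids) {
        List<String> out = new ArrayList<>();
        for (String line : relLines) {
            String[] p = line.split("%;% ");
            out.add(ids.get(p[0]) + ";" + ids.get(p[1]) + ";" + p[2].trim());
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, Long> ids = idsFor(List.of("Author1", "Author2"));
        List<String> rels = rewrite(List.of("Author1%;% Author2%;% 123"), ids);
        System.out.println(rels.get(0)); // 0;1;123
    }
}
```

With the relationship file rewritten this way, the insertion loop becomes a straight read of two longs and a timestamp per line, with no Lucene lookups at all.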
[Neo4j] Creating a graph database with BatchInserter and getting the node degree of every node
Hello Neo4j community,

I am creating a graph database for a social network.
To create the graph database I am using the BatchInserter.
The BatchInserter inserts data from 2 files into the graph database.

Files:

1. The first file contains the nodes I want to create (about 3.5M nodes). The file looks like this:
Author 1
Author 2
Author 2 ...

2. The second file contains every relationship between the nodes (about 2.5 billion relationships). This file looks like this:
Author1; Author2; timestamp
Author2; Author3; timestamp
Author1; Author3; timestamp...

The specifications of my computer look like this:

Intel Core i7 3.4 GHz
16 GB RAM
GeForce GT 420 1GB
2 TB hard drive

My code to create the graph database looks like this:

    package wikiOSN;

    import java.io.BufferedReader;
    import java.io.FileReader;
    import java.io.IOException;
    import java.util.Map;

    import org.neo4j.graphdb.DynamicRelationshipType;
    import org.neo4j.graphdb.index.BatchInserterIndex;
    import org.neo4j.graphdb.index.BatchInserterIndexProvider;
    import org.neo4j.helpers.collection.MapUtil;
    import org.neo4j.index.impl.lucene.LuceneBatchInserterIndexProvider;
    import org.neo4j.kernel.impl.batchinsert.BatchInserter;
    import org.neo4j.kernel.impl.batchinsert.BatchInserterImpl;

    public class CreateAndConnectNodes {

        public static void main(String[] args) throws IOException {
            BufferedReader bf = new BufferedReader(new FileReader(
                    "/media/sdg1/Wikipedia/Reduced Files/autoren-der-wikiartikel"));
            BufferedReader bf2 = new BufferedReader(new FileReader(
                    "/media/sdg1/Wikipedia/Reduced Files/wikipedia-output"));
            CreateAndConnectNodes cacn = new CreateAndConnectNodes();
            cacn.createGraphDatabase(bf, bf2);
        }

        private long relationCounter = 0;

        private void createGraphDatabase(BufferedReader bf, BufferedReader bf2)
                throws IOException {
            BatchInserter inserter = new BatchInserterImpl(
                    "target/socialNetwork-batchinsert");
            BatchInserterIndexProvider indexProvider = new LuceneBatchInserterIndexProvider(
                    inserter);
            BatchInserterIndex authors = indexProvider.nodeIndex("author",
                    MapUtil.stringMap("type", "exact"));
            authors.setCacheCapacity("name", 10);

            String zeile;
            String zeile2;

            while ((zeile = bf.readLine()) != null) {
                Map<String, Object> properties = MapUtil.map("name", zeile);
                long node = inserter.createNode(properties);
                authors.add(node, properties);
            }
            bf.close();
            System.out.println("Nodes created!");
            authors.flush();

            String node = "";
            long node1 = 0;
            long node2 = 0;
            while ((zeile2 = bf2.readLine()) != null) {
                // print progress periodically (interval garbled in the
                // original post, which had "% 1")
                if (relationCounter++ % 100000 == 0) {
                    System.out.println("Edges already created: " + relationCounter);
                }
                String[] relation = zeile2.split("%;% ");
                if (node.isEmpty()) {
                    node = relation[0];
                    if (authors.get("name", relation[0]).getSingle() != null) {
                        node1 = authors.get("name", relation[0]).getSingle();
                    } else {
                        System.out.println("Autor 1: " + relation[0]);
                        break;
                    }
                }
                if (!node.equals(relation[0])) {
                    node = relation[0];
                    if (authors.get("name", relation[0]).getSingle() != null) {
                        node1 = authors.get("name", relation[0]).getSingle();
                    } else {
                        System.out.println("Autor 1: " + relation[0]);
                        break;
                    }
                }
                if (authors.get("name", relation[1]).getSingle() != null) {
                    node2 = authors.get("name", relation[1]).getSingle();
                } else {
                    System.out.println("Autor 2: " + relation[1]);
                    break;
                }
                Map<String, Object> properties = MapUtil.map("timestamp",
                        Long.parseLong(relation[2].trim()));