Re: [Neo] Counting nodes
I'm bringing this up on the user list as well.

Let's see if I understand this correctly. You're calling getNodes(), looping through the IndexHits result you get back, and also checking its size() method, right? For each getNodes() call, are there many hits (over 1000, say)?

There's currently a need for optimization there, since the Lucene result is looped through and converted into nodes before getNodes() returns. This is to allow for decent caching; see http://wiki.neo4j.org/content/Indexing_with_IndexService#Caching for more info. I do have a ticket for making getNodes() return immediately and load nodes lazily if the result is over a certain size threshold. That fix will only help performance if you're only using IndexHits.size() and not looping through the entire result; if you loop through the entire result, performance will be the same anyway. Would that help you?

2009/12/15 Caunt, Matthew matthew.ca...@atosorigin.com:

Mattias (re post below to discussion list on counting nodes)

There is a need for optimisation in getNodes() and searchForNodes() beyond the current size(), possibly within neo4j as well as in the use of Lucene. The simple code I've been using loops looking for nodes worthy of interest.
Within the for loop the code examines a node and its properties one at a time:
- calls getNodes()
- then checks getNodes().size()

The code will only examine about 45,000 nodes per hour, which will take days to explore a nodestore of 100,000,000 nodes.

Looking with a profiler after the code has been running for a few hours shows the following as the top CPU hotspots:

    %   Method
   27%  org.neo4j.impl.util.ArrayMap.get()
   24%  org.neo4j.util.index.LuceneIndexService.searchForNodes()
   16%  org.neo4j.util.index.LuceneIndexService.getNodes()
    9%  org.neo4j.impl.util.ArrayMap.synchronousGet()

The profiler shows that the steady-state heap size is around 1.8 GB, but that every few seconds something is creating and then garbage collecting 1.5 GB of objects.

Any thoughts on where the unnecessary elapsed time and CPU could be being used?

Looking at the Java source for getNodes(), could one call Lucene's hits.length() immediately after creation of the first IndexSearcher searcher, without executing all the subsequent neo4j code in getNodes() and searchForNodes()?

Hopefully this is a helpful suggestion, or is there an even better way of speeding up the code in getNodes() and searchForNodes(), both for
- size(), and
- actually finding the nodes associated with just the first n LuceneIndex matches (as opposed to returning all the nodes in very large objects)?

Kind regards
Matthew

[Neo] Counting nodes
Mattias Persson mattias at neotechnology.com
Fri Dec 11 11:40:06 CET 2009

I saw this old thread and can fill in some more information. IndexService.getNodes() now returns an IndexHits result, which is an Iterable with a size() method on it. The size is given back from Lucene, so there's no overhead in calling size() at all. The LuceneIndexService will create a new document for each call to the LuceneIndexService.index() method. This is because it'd be quite slow to try to merge with potential existing matches, and so on.
2009/9/18 Andreas Kollegger akollegger at tembopublic.org:

For my use cases at least, total node counts are needed so often that I'd love the optimized version. What's the relationship between the number of documents and nodes? Would that be all indexed nodes regardless of the key?

On Sep 18, 2009, at 3:46 AM, Mattias Persson wrote:

Looking at the Lucene javadocs I can see that you can ask an index (IndexReader) for the number of documents in it, and it'd be simple to expose a size() method in the IndexService interface or perhaps on the LuceneIndexService class... Would that be something worth implementing?

2009/9/18 Andreas Guenther andreas.guenther at web.de:

Actually, disregard my suggestion, as I didn't read the word "index" in your question before.

-Andreas

Andreas Guenther wrote:

The service API has a getAllNodes() call. Iterate and count through it.

-Andreas

Andreas Kollegger wrote:

Hi all,

For nodes that are kept in the indexed service, is there a more clever (and hopefully efficient) way to get the node count than manually iterating over all the nodes in the index and actually counting? I don't see anything obvious in the IndexService interface, and am not sure if I've overlooked something useful somewhere else. I suppose I could always keep a running tally somewhere, but then I'd have to be careful to keep it in sync with reality. Ideally the count would be closer to the metal.

Thanks,
Andreas

-- Mattias Persson
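
Editor's note: the lazy-loading idea discussed in this message (getNodes() returning immediately, with size() known up front and nodes loaded only on iteration) can be illustrated with a small sketch. This is not Neo4j or Lucene code. The names LazyHitsSketch, Node, LazyHits and loadedCount() are hypothetical stand-ins, and the sketch assumes the matching node ids and their count are available cheaply from the search while turning an id into a node is the expensive step.

```java
import java.util.Iterator;
import java.util.NoSuchElementException;
import java.util.function.LongFunction;

final class LazyHitsSketch {

    /** Hypothetical stand-in for a graph node; not the Neo4j Node interface. */
    static final class Node {
        final long id;
        Node(long id) { this.id = id; }
    }

    /**
     * Hypothetical lazy result: the hit count is known up front,
     * while each node is only loaded when the caller iterates to it.
     */
    static final class LazyHits implements Iterable<Node> {
        private final long[] hitIds;             // ids reported by the search
        private final LongFunction<Node> loader; // the expensive id-to-node step
        private int loaded = 0;                  // how many nodes were actually loaded

        LazyHits(long[] hitIds, LongFunction<Node> loader) {
            this.hitIds = hitIds;
            this.loader = loader;
        }

        /** Cheap: no node is loaded to answer this. */
        int size() { return hitIds.length; }

        int loadedCount() { return loaded; }

        @Override
        public Iterator<Node> iterator() {
            return new Iterator<Node>() {
                private int cursor = 0;

                @Override
                public boolean hasNext() { return cursor < hitIds.length; }

                @Override
                public Node next() {
                    if (!hasNext()) throw new NoSuchElementException();
                    loaded++;
                    return loader.apply(hitIds[cursor++]);
                }
            };
        }
    }

    public static void main(String[] args) {
        LazyHits hits = new LazyHits(new long[] {11L, 12L, 13L, 14L}, Node::new);
        System.out.println("size = " + hits.size() + ", loaded = " + hits.loadedCount());

        // Look at only the first two matches.
        Iterator<Node> it = hits.iterator();
        it.next();
        it.next();
        System.out.println("after two next() calls, loaded = " + hits.loadedCount());
    }
}
```

As the message says, a design like this only pays off when the caller asks for size() or reads a prefix of the result; iterating to the end loads every node either way.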
Re: [Neo] Counting nodes
I saw this old thread and can fill in some more information. IndexService.getNodes() now returns an IndexHits result, which is an Iterable with a size() method on it. The size is given back from Lucene, so there's no overhead in calling size() at all. The LuceneIndexService will create a new document for each call to the LuceneIndexService.index() method. This is because it'd be quite slow to try to merge with potential existing matches, and so on.

2009/9/18 Andreas Kollegger akolleg...@tembopublic.org:

For my use cases at least, total node counts are needed so often that I'd love the optimized version. What's the relationship between the number of documents and nodes? Would that be all indexed nodes regardless of the key?

On Sep 18, 2009, at 3:46 AM, Mattias Persson wrote:

Looking at the Lucene javadocs I can see that you can ask an index (IndexReader) for the number of documents in it, and it'd be simple to expose a size() method in the IndexService interface or perhaps on the LuceneIndexService class... Would that be something worth implementing?

2009/9/18 Andreas Guenther andreas.guent...@web.de:

Actually, disregard my suggestion, as I didn't read the word "index" in your question before.

-Andreas

Andreas Guenther wrote:

The service API has a getAllNodes() call. Iterate and count through it.

-Andreas

Andreas Kollegger wrote:

Hi all,

For nodes that are kept in the indexed service, is there a more clever (and hopefully efficient) way to get the node count than manually iterating over all the nodes in the index and actually counting? I don't see anything obvious in the IndexService interface, and am not sure if I've overlooked something useful somewhere else. I suppose I could always keep a running tally somewhere, but then I'd have to be careful to keep it in sync with reality. Ideally the count would be closer to the metal.
Thanks,
Andreas

-- Mattias Persson, [matt...@neotechnology.com] Neo Technology, www.neotechnology.com
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
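
Editor's note: the question in this message about how documents relate to nodes can be made concrete with a toy model. The sketch below is not Lucene or Neo4j code. DocumentCountSketch, Doc and ToyIndex are hypothetical names, and the only behaviour taken from the thread is that every index() call adds one new document. Under that assumption, a raw document count overstates the number of nodes whenever a node is indexed under more than one key.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

final class DocumentCountSketch {

    /** One entry per index() call, mirroring "a new document for each call". */
    static final class Doc {
        final long nodeId;
        final String key;
        final String value;

        Doc(long nodeId, String key, String value) {
            this.nodeId = nodeId;
            this.key = key;
            this.value = value;
        }
    }

    /** Hypothetical in-memory stand-in for the index; no merging of entries. */
    static final class ToyIndex {
        private final List<Doc> docs = new ArrayList<>();

        void index(long nodeId, String key, String value) {
            docs.add(new Doc(nodeId, key, value));
        }

        /** What a plain "number of documents" query would report in this model. */
        int documentCount() {
            return docs.size();
        }

        /** Number of different nodes that appear in at least one document. */
        int distinctNodeCount() {
            Set<Long> ids = new HashSet<>();
            for (Doc doc : docs) {
                ids.add(doc.nodeId);
            }
            return ids.size();
        }
    }

    public static void main(String[] args) {
        ToyIndex index = new ToyIndex();
        index.index(1L, "name", "Andreas");
        index.index(1L, "email", "a@example.org"); // same node, second key
        index.index(2L, "name", "Mattias");

        System.out.println("documents = " + index.documentCount()
                + ", distinct nodes = " + index.distinctNodeCount());
        // prints: documents = 3, distinct nodes = 2
    }
}
```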
Re: [Neo] Counting nodes
Looking at the Lucene javadocs I can see that you can ask an index (IndexReader) for the number of documents in it, and it'd be simple to expose a size() method in the IndexService interface or perhaps on the LuceneIndexService class... Would that be something worth implementing?

2009/9/18 Andreas Guenther andreas.guent...@web.de:

Actually, disregard my suggestion, as I didn't read the word "index" in your question before.

-Andreas

Andreas Guenther wrote:

The service API has a getAllNodes() call. Iterate and count through it.

-Andreas

Andreas Kollegger wrote:

Hi all,

For nodes that are kept in the indexed service, is there a more clever (and hopefully efficient) way to get the node count than manually iterating over all the nodes in the index and actually counting? I don't see anything obvious in the IndexService interface, and am not sure if I've overlooked something useful somewhere else. I suppose I could always keep a running tally somewhere, but then I'd have to be careful to keep it in sync with reality. Ideally the count would be closer to the metal.

Thanks,
Andreas

-- Mattias Persson, [matt...@neotechnology.com] Neo Technology, www.neotechnology.com
[Neo] Counting nodes
Hi all,

For nodes that are kept in the indexed service, is there a more clever (and hopefully efficient) way to get the node count than manually iterating over all the nodes in the index and actually counting? I don't see anything obvious in the IndexService interface, and am not sure if I've overlooked something useful somewhere else. I suppose I could always keep a running tally somewhere, but then I'd have to be careful to keep it in sync with reality. Ideally the count would be closer to the metal.

Thanks,
Andreas
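
Editor's note: the "running tally" fallback mentioned in this message could look roughly like the sketch below. It is an illustration under stated assumptions, not part of any Neo4j API: RunningTallySketch and CountingIndex are hypothetical names, the wrapper tracks entries per node id in memory, and it assumes callers only remove entries they previously indexed. A real wrapper would also delegate each call to the underlying index service.

```java
import java.util.HashMap;
import java.util.Map;

final class RunningTallySketch {

    /** Hypothetical wrapper that counts indexed nodes as entries come and go. */
    static final class CountingIndex {
        // node id -> number of index entries currently held for that node
        private final Map<Long, Integer> entriesPerNode = new HashMap<>();

        void index(long nodeId, String key, String value) {
            entriesPerNode.merge(nodeId, 1, Integer::sum);
            // A real wrapper would forward the call to the actual index here.
        }

        void remove(long nodeId, String key, String value) {
            // Drop the node from the tally once its last entry is removed.
            entriesPerNode.computeIfPresent(nodeId, (id, n) -> n > 1 ? n - 1 : null);
            // A real wrapper would forward the call to the actual index here.
        }

        /** Number of nodes with at least one index entry; no iteration over nodes. */
        int nodeCount() {
            return entriesPerNode.size();
        }
    }

    public static void main(String[] args) {
        CountingIndex index = new CountingIndex();
        index.index(1L, "name", "Andreas");
        index.index(1L, "email", "a@example.org");
        index.index(2L, "name", "Mattias");
        System.out.println("nodes = " + index.nodeCount());

        index.remove(1L, "name", "Andreas");
        System.out.println("nodes = " + index.nodeCount()); // node 1 still has one entry

        index.remove(1L, "email", "a@example.org");
        System.out.println("nodes = " + index.nodeCount()); // node 1 is gone
    }
}
```

The "keep it in sync with reality" concern from the message still applies: a tally like this drifts if entries are added or removed behind the wrapper's back, or if a change is rolled back after the tally was updated.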
Re: [Neo] Counting nodes
The service API has a getAllNodes() call. Iterate and count through it.

-Andreas

Andreas Kollegger wrote:

Hi all,

For nodes that are kept in the indexed service, is there a more clever (and hopefully efficient) way to get the node count than manually iterating over all the nodes in the index and actually counting? I don't see anything obvious in the IndexService interface, and am not sure if I've overlooked something useful somewhere else. I suppose I could always keep a running tally somewhere, but then I'd have to be careful to keep it in sync with reality. Ideally the count would be closer to the metal.

Thanks,
Andreas
Re: [Neo] Counting nodes
Actually, disregard my suggestion, as I didn't read the word "index" in your question before.

-Andreas

Andreas Guenther wrote:

The service API has a getAllNodes() call. Iterate and count through it.

-Andreas

Andreas Kollegger wrote:

Hi all,

For nodes that are kept in the indexed service, is there a more clever (and hopefully efficient) way to get the node count than manually iterating over all the nodes in the index and actually counting? I don't see anything obvious in the IndexService interface, and am not sure if I've overlooked something useful somewhere else. I suppose I could always keep a running tally somewhere, but then I'd have to be careful to keep it in sync with reality. Ideally the count would be closer to the metal.

Thanks,
Andreas