Re: [Neo] Counting nodes

2009-12-16 Thread Mattias Persson
I'm bringing this up on the user list as well.

Let's see if I understand this correctly. You're calling getNodes(),
looping through the IndexHits result you get back, and also checking
its size() method, right?

For each getNodes() call, are there many hits (say, over 1000)?
There's currently a need for optimization there, since the Lucene
result is looped through and converted into nodes before the
getNodes() method returns. This is to allow for decent
caching, see http://wiki.neo4j.org/content/Indexing_with_IndexService#Caching
for more info. I do have a ticket for making getNodes()
return immediately and load nodes lazily if the result is over a
certain size threshold.

This fix will only help performance if you're only using the
IndexHits.size() method and not looping through the entire
result. If you're looping through the entire result, the performance
will be the same anyway.

Would that help you?
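
The eager-versus-lazy difference described above can be sketched
roughly as follows. This is only an illustration under assumptions, not
the actual getNodes() code: LazyHits, loadedCount() and firstN() are
hypothetical names, and the loader function stands in for whatever
converts a Lucene hit into a node.

```java
import java.util.ArrayList;
import java.util.Iterator;
import java.util.List;
import java.util.NoSuchElementException;
import java.util.function.LongFunction;

/** Hypothetical stand-in for IndexHits: an Iterable that also knows its size. */
final class LazyHits<T> implements Iterable<T> {
    private final long[] ids;              // raw hit ids, as a search would return them
    private final LongFunction<T> loader;  // converts one id into a node on demand
    private int loaded = 0;                // how many conversions have happened so far

    LazyHits(long[] ids, LongFunction<T> loader) {
        this.ids = ids.clone();
        this.loader = loader;
    }

    /** The hit count is known up front; no node is loaded to answer this. */
    int size() {
        return ids.length;
    }

    /** Number of nodes converted so far, exposed only to make the cost visible. */
    int loadedCount() {
        return loaded;
    }

    @Override
    public Iterator<T> iterator() {
        return new Iterator<T>() {
            private int next = 0;

            @Override
            public boolean hasNext() {
                return next < ids.length;
            }

            @Override
            public T next() {
                if (!hasNext()) {
                    throw new NoSuchElementException();
                }
                loaded++;                       // conversion cost is paid per element
                return loader.apply(ids[next++]);
            }
        };
    }

    /** Returns at most n hits, converting only those n. */
    List<T> firstN(int n) {
        List<T> out = new ArrayList<>();
        Iterator<T> it = iterator();
        while (out.size() < n && it.hasNext()) {
            out.add(it.next());
        }
        return out;
    }
}
```

With this shape, a caller that only asks for size() converts nothing,
and a caller that wants the first few hits pays only for those. A
caller that walks the whole result still converts every hit, which is
why looping through everything would not get faster.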


2009/12/15 Caunt, Matthew matthew.ca...@atosorigin.com:
 Mattias (re-post below to discussion list on counting nodes)

 There is a need for optimisation in getNodes() and searchForNodes()
 beyond the current size(), possibly within neo4j as well as in the
 use of lucene.

 The simple code I've been using loops looking for nodes worthy of
 interest. Within the for loop the code examines a node and its
 properties one at a time:
 - calls getNodes()
 - then checks getNodes().size()

 The code will only examine nodes at about 45,000 nodes per hour, which
 will take days to explore a nodestore of 100,000,000 nodes.
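
For scale, the two figures quoted above can be turned into a rough
duration. The helper below is a hypothetical back-of-the-envelope
calculation that assumes the rate stays constant for the whole scan.

```java
/** Back-of-the-envelope estimate for a full scan at a fixed rate. */
final class ScanEstimate {
    private ScanEstimate() {
    }

    /** Hours needed to visit totalNodes at nodesPerHour. */
    static double hours(long totalNodes, long nodesPerHour) {
        return (double) totalNodes / nodesPerHour;
    }

    /** The same estimate expressed in days. */
    static double days(long totalNodes, long nodesPerHour) {
        return hours(totalNodes, nodesPerHour) / 24.0;
    }
}
```

ScanEstimate.days(100_000_000L, 45_000L) comes out at about 92.6, so
at that rate a full pass would take roughly three months.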

 Looking with a profiler after the code has been running for a few hours
 shows the following as the top CPU Method % hotspots

 27% org.neo4j.impl.util.ArrayMap.get()
 24% org.neo4j.util.index.LuceneIndexService.searchForNodes()
 16% org.neo4j.util.index.LuceneIndexService.getNodes()
 9%  org.neo4j.impl.util.ArrayMap.synchronousGet()

 The profiler shows that the steady state heap
 size is around 1.8Gb, but that every few seconds something is creating
 and then garbage collecting 1.5Gb of objects?

 Any thoughts on where the unnecessary elapsed time and CPU could be
 being used?

 Looking at the java source for getNodes(), one could call
 Lucene's hits.length() immediately after creation of the first
 IndexSearcher searcher, without executing all the subsequent
 neo4j code in getNodes() and searchForNodes()?

 Hopefully this is a helpful suggestion, or is there an even better way of
 speeding up the code in getNodes() and searchForNodes(), both for
 - size()
 and
 - actually finding the nodes associated with just the first n
   LuceneIndex matches (as opposed to returning all the nodes in very
   large objects)?

 Kind regards

 Matthew

 

 [Neo] Counting nodes
 Mattias Persson mattias at neotechnology.com
 Fri Dec 11 11:40:06 CET 2009

 
 

 I saw this old thread and could just fill in with more information.

 So, the IndexService.getNodes() now returns an IndexHits result, which
 is an Iterable with a size() method on it. The size is given back from
 Lucene, so there's no overhead in calling size() at all.

 The LuceneIndexService will create a new document for each call to
 the LuceneIndexService.index() method. This is because it'd be quite
 slow to try to merge with potential existing matches, and so on.

 --
 Mattias Persson

Re: [Neo] Counting nodes

2009-12-11 Thread Mattias Persson
I saw this old thread and could just fill in with more information.

So, the IndexService.getNodes() now returns an IndexHits result, which
is an Iterable with a size() method on it. The size is given back from
Lucene, so there's no overhead in calling size() at all.

The LuceneIndexService will create a new document for each call to
the LuceneIndexService.index() method. This is because it'd be quite
slow to try to merge with potential existing matches, and so on.

2009/9/18 Andreas Kollegger akolleg...@tembopublic.org:
 For my use cases at least, total node counts are needed so often that
 I'd love the optimized version. What's the relationship between
 number of documents and nodes? Would that be all indexed nodes
 regardless of the key?

 On Sep 18, 2009, at 3:46 AM, Mattias Persson wrote:

 Looking at the Lucene javadocs I can see that you can ask an index
 (IndexReader) the number of documents there are in it and it'd be
 simple to expose a size() method in the IndexService interface or
 perhaps on the LuceneIndexService class...

 Would that be something worth/good to implement?

-- 
Mattias Persson, [matt...@neotechnology.com]
Neo Technology, www.neotechnology.com
___
Neo mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
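
On the relationship between documents and nodes asked about in this
message, a minimal sketch may help. It assumes only what is stated
above, namely that every index() call appends one new document without
merging. DocumentPerCallIndex and its methods are hypothetical names,
not the LuceneIndexService implementation.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical in-memory index that appends one document per index() call. */
final class DocumentPerCallIndex {

    /** One stored entry: which node was indexed under which key and value. */
    private static final class Doc {
        final long nodeId;
        final String key;
        final Object value;

        Doc(long nodeId, String key, Object value) {
            this.nodeId = nodeId;
            this.key = key;
            this.value = value;
        }
    }

    private final List<Doc> docs = new ArrayList<>();

    /** Appends a new document; earlier entries for the same node are not merged. */
    void index(long nodeId, String key, Object value) {
        docs.add(new Doc(nodeId, key, value));
    }

    /** Cheap count: the number of documents, across all keys. */
    int numDocs() {
        return docs.size();
    }

    /** Distinct indexed nodes; this needs a pass over every document. */
    int distinctNodes() {
        Set<Long> ids = new HashSet<>();
        for (Doc d : docs) {
            ids.add(d.nodeId);
        }
        return ids.size();
    }
}
```

In this sketch a node indexed under two keys contributes two documents,
so the document count spans all keys and is only an upper bound on the
number of distinct indexed nodes. Whether the real index behaves the
same way in every case is an assumption here.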


Re: [Neo] Counting nodes

2009-09-18 Thread Mattias Persson
Looking at the Lucene javadocs I can see that you can ask an index
(IndexReader) the number of documents there are in it and it'd be
simple to expose a size() method in the IndexService interface or
perhaps on the LuceneIndexService class...

Would that be something worth/good to implement?

-- 
Mattias Persson, [matt...@neotechnology.com]
Neo Technology, www.neotechnology.com


[Neo] Counting nodes

2009-09-17 Thread Andreas Kollegger
Hi all,

For nodes that are kept in the indexed service, is there a more clever
(and hopefully efficient) way to get the node count than manually
iterating over all the nodes in the index and actually counting? I
don't see anything obvious in the IndexService interface, and am not
sure if I've overlooked something useful somewhere else.

I suppose I could always keep a running tally somewhere, but then I'd  
have to be careful to keep it in sync with reality. Ideally the count  
would be closer to the metal.

Thanks,
Andreas
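
The "running tally" idea mentioned above could look roughly like the
sketch below. TallyingIndex is a hypothetical wrapper, and the HashSet
merely stands in for the real index so the example is self-contained.
It does not address transactions or rollbacks, which is where keeping
the tally in sync with reality gets hard.

```java
import java.util.HashSet;
import java.util.Set;
import java.util.concurrent.atomic.AtomicLong;

/** Hypothetical wrapper that keeps a running count of distinct indexed nodes. */
final class TallyingIndex {
    private final Set<Long> indexed = new HashSet<>();  // stands in for the real index
    private final AtomicLong count = new AtomicLong();

    /** Adds a node; the tally only moves when the node was not already present. */
    synchronized void index(long nodeId) {
        if (indexed.add(Long.valueOf(nodeId))) {
            count.incrementAndGet();
        }
    }

    /** Removes a node; the tally only moves when the node was actually present. */
    synchronized void remove(long nodeId) {
        if (indexed.remove(Long.valueOf(nodeId))) {
            count.decrementAndGet();
        }
    }

    /** Reads the tally without touching the index. */
    long count() {
        return count.get();
    }
}
```

Every write has to go through the wrapper for the count to stay
correct, which is exactly the bookkeeping burden described above.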


Re: [Neo] Counting nodes

2009-09-17 Thread Andreas Guenther
The service API has a getAllNodes() call. Iterate and count through it.

-Andreas


Re: [Neo] Counting nodes

2009-09-17 Thread Andreas Guenther
Actually, disregard my suggestion, as I didn't read the word "index" in your
question before.

-Andreas