Could you provide me with some sample code that triggers this behaviour with the latest index-util-0.9-SNAPSHOT, Núria?
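A reproduction like the one Mattias asks for could follow the pattern Núria describes: index every node, optimize, then look each node up again and count how many lookups return -1. The sketch below is a hypothetical harness, not index-util code. The NodeLookupIndex interface only mirrors the method names mentioned in this thread, and InMemoryIndex is an in-memory stand-in so the harness runs on its own. To reproduce the real problem, you would adapt LuceneIndexBatchInserterImpl to that interface and run it with millions of nodes.

```java
import java.util.HashMap;
import java.util.Map;

public class LookupAfterIndexRepro {

    // Hypothetical interface: the method names mirror the calls mentioned in
    // the thread, but this is NOT the index-util API. Adapt the real
    // LuceneIndexBatchInserterImpl to it to reproduce the problem.
    interface NodeLookupIndex {
        void index(long nodeId, String key, Object value);
        void optimize();
        long getSingleNode(String key, Object value); // -1 when no node is found
    }

    // In-memory stand-in so the harness runs without Neo4j or Lucene.
    static class InMemoryIndex implements NodeLookupIndex {
        private final Map<String, Map<Object, Long>> byKey = new HashMap<>();

        public void index(long nodeId, String key, Object value) {
            byKey.computeIfAbsent(key, k -> new HashMap<>()).put(value, nodeId);
        }

        public void optimize() {
            // Nothing to flush for an in-memory map.
        }

        public long getSingleNode(String key, Object value) {
            Long id = byKey.getOrDefault(key, Map.of()).get(value);
            return id == null ? -1 : id;
        }
    }

    // Indexes nodeCount nodes under "name" (optimizing every optimizeEvery
    // nodes, if > 0), optimizes once more, then looks every node up again
    // and returns how many lookups came back as -1.
    static int countMissingLookups(NodeLookupIndex index, int nodeCount, int optimizeEvery) {
        for (int i = 0; i < nodeCount; i++) {
            long nodeId = i; // hypothetical: real code would use the id returned by createNode
            index.index(nodeId, "name", "node-" + i);
            if (optimizeEvery > 0 && (i + 1) % optimizeEvery == 0) {
                index.optimize();
            }
        }
        index.optimize();

        int missing = 0;
        for (int i = 0; i < nodeCount; i++) {
            if (index.getSingleNode("name", "node-" + i) == -1) {
                missing++;
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        int missing = countMissingLookups(new InMemoryIndex(), 100_000, 10_000);
        System.out.println("missing lookups: " + missing);
    }
}
```

With the in-memory stand-in the miss count is 0; a non-zero count with the real index would be the behaviour reported in this thread.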
2009/12/9 Núria Trench:
> Todd,
>
> I don't have the same problem. In my case, after indexing all the
> attributes/properties of each node, the application creates all the
> edges by looking up the tail node and the head node. So it calls the
> method "org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode",
> which returns -1 (node not found) on many occasions.
>
> Does anyone have an alternative way to get a node by its indexed
> attributes/properties?
>
> Thank you,
>
> Núria.

2009/12/7 Mattias Persson:
> Todd, are you sure you have the latest index-util 0.9-SNAPSHOT? This
> is a bug that we fixed yesterday... (assuming it's the same bug).

2009/12/7 Todd Stavish:
> Hi Mattias, Núria.
>
> I am also running into scalability problems with the Lucene batch
> inserter at much smaller numbers: 30,000 indexed nodes. I tried
> calling optimize more often. Increasing the ulimit didn't help.
>
> [INFO] Exception in thread "main" java.lang.RuntimeException: java.io.FileNotFoundException: /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx (Too many open files)
> [INFO] at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getNodes(LuceneIndexBatchInserterImpl.java:186)
> [INFO] at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode(LuceneIndexBatchInserterImpl.java:238)
> [INFO] at com.collectiveintelligence.QueryNeo.loadDataToGraph(QueryNeo.java:277)
> [INFO] at com.collectiveintelligence.QueryNeo.main(QueryNeo.java:57)
> [INFO] Caused by: java.io.FileNotFoundException: /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx (Too many open files)
>
> I tried breaking it up into separate batch inserter instances, and now
> it hangs. Can I create more than one batch inserter per process if they
> run sequentially and non-threaded?
>
> Thanks,
> Todd

On Mon, Dec 7, 2009 at 7:28 AM, Núria Trench wrote:
> Hi again Mattias,
>
> I have tried running my application with the latest version available
> in the Maven repository and I still have the same problem. After
> creating and indexing all the nodes, the application calls the
> "optimize" method and then creates all the edges, calling the
> "getNodes" method to select the tail and head node of each edge. This
> doesn't work because many nodes are not found.
>
> I have tried creating only 30 nodes and 15 edges and it works properly,
> but if I try to create a big graph (180 million edges + 20 million
> nodes) it doesn't.
>
> I have also tried calling the "optimize" method every time the
> application has created 1 million nodes, but it doesn't work.
>
> Have you tried creating as many nodes as I described with the newer
> index-util version?
>
> Thank you,
>
> Núria.

2009/12/4 Núria Trench:
> Hi Mattias,
>
> Thank you very much for fixing the problem so fast. I will try it as
> soon as the new changes are available in the Maven repository.
>
> Núria.

2009/12/4 Mattias Persson:
> I fixed the problem and also added a cache per key for faster
> getNodes/getSingleNode lookup during the insert process. However, to
> speed things up even further, the cache assumes that there's nothing in
> the index when the process starts (which will almost always be true).
>
> You can control the cache size, and whether the cache is used at all,
> by overriding these methods in your LuceneIndexBatchInserterImpl
> instance (this is also documented in the Javadoc):
>
>     boolean useCache()
>     int getMaxCacheSizePerKey()
>
> The new changes should be available in the Maven repository within an
> hour.

2009/12/4 Mattias Persson:
> I think I found the problem... it's indexing as it should, but it isn't
> reflected in getNodes/getSingleNode properly until you
> flush/optimize/shutdown the index. I'll try to fix it today!

2009/12/3 Núria Trench:
> Thank you very much for your response.
> If you need more information, just send an e-mail and I will try to
> explain it better.
>
> Núria.

2009/12/3 Mattias Persson:
> This is something I'd like to reproduce; I'll do some testing on it
> tomorrow.

2009/12/3 Núria Trench:
> Hello,
>
> Last week, I decided to download your graph database core in order to
> use it. First, I created a new project to parse my CSV files and create
> a new graph database with Neo4j. These CSV files contain 150 million
> edges and 20 million nodes.
>
> When I finished writing the code that creates the graph database, I
> executed it and, after six hours of execution, the program crashed
> because of a Lucene exception.
> The exception is related to index merging and has the following message:
> "mergeFields produced an invalid result: docCount is 385282378 but fdx file size is 3082259028; now aborting this merge to prevent index corruption"
>
> I searched the net and found that it is a Lucene bug. The libraries
> used for running my project were:
> neo-1.0-b10
> index-util-0.7
> lucene-core-2.4.0
>
> So I decided to use a newer Lucene version. I found that you have a
> newer index-util version, so I updated the libraries:
> neo-1.0-b10
> index-util-0.9
> lucene-core-2.9.1
>
> After updating those libraries, I tried to run my project again and
> found that, on many occasions, it was not indexing properly. So I tried
> optimizing the index every time I indexed something. That made it index
> properly, but the execution time increased a lot.
>
> I am not using transactions; instead, I am using the Batch Inserter
> with the LuceneIndexBatchInserter.
>
> So my question is: what can I do to solve this problem? If I use
> index-util-0.7, I cannot finish creating the graph database, and if I
> use index-util-0.9, I have to optimize the index on every insertion and
> the execution never ends.
>
> Thank you very much in advance,
>
> Núria.
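The two numbers in Núria's Lucene error are consistent with each other: 4 + 8 × 385,282,378 = 3,082,259,028, i.e. a 4-byte header plus 8 bytes per document. That size exceeds Integer.MAX_VALUE (2,147,483,647), so a size check computed in 32-bit int arithmetic would wrap around and flag a healthy file as invalid. The sketch below only reproduces that arithmetic. The fdx layout is inferred from the numbers, whether this is exactly the Lucene bug Núria found is an assumption, and the class and method names are hypothetical.

```java
public class FdxSizeCheck {

    // Assumption: the .fdx file is a 4-byte header followed by one
    // 8-byte pointer per document, which matches the numbers in the error.
    static long expectedFdxSizeLong(int docCount) {
        return 4L + 8L * docCount; // 64-bit arithmetic: no overflow for any int docCount
    }

    // The same formula in 32-bit int arithmetic wraps around once
    // 8 * docCount exceeds Integer.MAX_VALUE (docCount >= 268,435,456).
    static int expectedFdxSizeInt(int docCount) {
        return 4 + 8 * docCount;
    }

    public static void main(String[] args) {
        int docCount = 385_282_378;        // from the error message
        long fdxFileSize = 3_082_259_028L; // from the error message
        System.out.println(expectedFdxSizeLong(docCount) == fdxFileSize); // true
        System.out.println(expectedFdxSizeInt(docCount));                 // -1212708268
        System.out.println(expectedFdxSizeInt(docCount) == fdxFileSize);  // false
    }
}
```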
--
Mattias Persson
Neo Technology, www.neotechnology.com
_______________________________________________
Neo mailing list
https://lists.neo4j.org/mailman/listinfo/user
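Núria asks for an alternative way to find the tail and head nodes when creating edges. One common approach for a two-pass CSV import, sketched here as an assumption rather than anything proposed in the thread, is to remember the node id returned when each node is created, keyed by its CSV identifier, and to resolve edge endpoints from that map instead of querying the index during the insert. This is similar in spirit to the per-key cache Mattias added, but kept in your own code. FakeInserter, addNode and addEdge are hypothetical stand-ins; in real code the ids would come from the batch inserter.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class CsvIdMapping {

    // Hypothetical stand-in for the batch inserter: hands out node ids and
    // records relationships. Real code would call the Neo4j batch inserter.
    static class FakeInserter {
        private long nextNodeId = 0;
        final List<long[]> relationships = new ArrayList<>();

        long createNode() {
            return nextNodeId++;
        }

        void createRelationship(long tail, long head) {
            relationships.add(new long[] {tail, head});
        }
    }

    private final Map<String, Long> nodeIdByCsvId = new HashMap<>();
    private final FakeInserter inserter;

    CsvIdMapping(FakeInserter inserter) {
        this.inserter = inserter;
    }

    // Pass 1: create the node and remember its id, keyed by the CSV id.
    void addNode(String csvId) {
        nodeIdByCsvId.put(csvId, inserter.createNode());
    }

    // Pass 2: resolve both endpoints from the map instead of the index.
    // Returns false (and creates nothing) if either endpoint is unknown.
    boolean addEdge(String tailCsvId, String headCsvId) {
        Long tail = nodeIdByCsvId.get(tailCsvId);
        Long head = nodeIdByCsvId.get(headCsvId);
        if (tail == null || head == null) {
            return false;
        }
        inserter.createRelationship(tail, head);
        return true;
    }
}
```

For example, after addNode("a") and addNode("b"), addEdge("a", "b") creates one relationship and returns true, while addEdge("a", "missing") returns false instead of passing a -1 id along. The trade-off is memory: for 20 million nodes, a HashMap<String, Long> can need gigabytes of heap. If the CSV ids are dense integers, a long[] indexed by CSV id is far more compact.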

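Mattias suspects Todd's "Too many open files" error is the bug already fixed in the latest snapshot. While checking that, it can help to log the process's file descriptor usage every so often during the import: a count that keeps climbing suggests files are not being closed, while a steady count near the limit suggests the limit itself is too low. This sketch uses the JDK's com.sun.management.UnixOperatingSystemMXBean, which HotSpot/OpenJDK provide on Unix-like systems; the helper name is hypothetical, and on platforms without that bean it returns null.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class OpenFileCheck {

    // Returns {open, max} file descriptor counts for this JVM process, or
    // null when the platform MXBean does not expose them (e.g. on Windows).
    static long[] fileDescriptorUsage() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {unix.getOpenFileDescriptorCount(), unix.getMaxFileDescriptorCount()};
        }
        return null;
    }

    public static void main(String[] args) {
        // In an import loop, calling this every N inserts shows whether
        // the open count keeps growing toward the limit.
        long[] usage = fileDescriptorUsage();
        if (usage == null) {
            System.out.println("file descriptor counts not available on this JVM/OS");
        } else {
            System.out.println("open file descriptors: " + usage[0] + " of " + usage[1]);
        }
    }
}
```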
