Hi Mattias, Núria. I am also running into scalability problems with the Lucene batch inserter, at a much smaller scale: 30,000 indexed nodes. I tried calling optimize more often, and increasing ulimit didn't help either.

[INFO] Exception in thread "main" java.lang.RuntimeException: java.io.FileNotFoundException: /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx (Too many open files)
[INFO]     at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getNodes(LuceneIndexBatchInserterImpl.java:186)
[INFO]     at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode(LuceneIndexBatchInserterImpl.java:238)
[INFO]     at com.collectiveintelligence.QueryNeo.loadDataToGraph(QueryNeo.java:277)
[INFO]     at com.collectiveintelligence.QueryNeo.main(QueryNeo.java:57)
[INFO] Caused by: java.io.FileNotFoundException: /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx (Too many open files)

I tried splitting the load across separate batch inserter instances, and now it hangs. Can I create more than one batch inserter per process if they run sequentially and non-threaded?

Thanks,
Todd

On Mon, Dec 7, 2009 at 7:28 AM, Núria Trench <[email protected]> wrote:
> Hi again Mattias,
>
> I have tried to execute my application with the latest version available in
> the maven repository and I still have the same problem. After creating and
> indexing all the nodes, the application calls the "optimize" method and
> then creates all the edges by calling the method "getNodes" to select the
> tail and head node of each edge, but it doesn't work because many nodes
> are not found.
>
> I have tried to create only 30 nodes and 15 edges and it works properly, but
> if I try to create a big graph (180 million edges + 20 million nodes) it
> doesn't.
>
> I have also tried to call the "optimize" method every time the application
> has created 1 million nodes, but it doesn't work.
>
> Have you tried to create as many nodes as I have said with the newer
> index-util version?
>
> Thank you,
>
> Núria.
>
> 2009/12/4 Núria Trench <[email protected]>
>
>> Hi Mattias,
>>
>> Thank you very much for fixing the problem so fast.
>> I will try it as soon as the new changes are available in the maven
>> repository.
>>
>> Núria.
>>
>>
>> 2009/12/4 Mattias Persson <[email protected]>
>>
>>> I fixed the problem and also added a cache per key for faster
>>> getNodes/getSingleNode lookup during the insert process. However, the
>>> cache assumes that there's nothing in the index when the process
>>> starts (which almost always will be true) to speed things up even
>>> further.
>>>
>>> You can control the cache size, and whether the cache is used at all,
>>> by overriding the following methods in your
>>> LuceneIndexBatchInserterImpl instance (this is also documented in the
>>> Javadoc):
>>>
>>>    boolean useCache()
>>>    int getMaxCacheSizePerKey()
>>>
>>> The new changes should be available in the maven repository within an
>>> hour.
>>>
>>> 2009/12/4 Mattias Persson <[email protected]>:
>>> > I think I found the problem... it's indexing as it should, but it
>>> > isn't reflected in getNodes/getSingleNode properly until you
>>> > flush/optimize/shutdown the index. I'll try to fix it today!
>>> >
>>> > 2009/12/3 Núria Trench <[email protected]>:
>>> >> Thank you very much for your response.
>>> >> If you need more information, you only have to send an e-mail and I
>>> >> will try to explain it better.
>>> >>
>>> >> Núria.
>>> >>
>>> >> 2009/12/3 Mattias Persson <[email protected]>
>>> >>
>>> >>> This is something I'd like to reproduce and I'll do some testing on
>>> >>> this tomorrow
>>> >>>
>>> >>> 2009/12/3 Núria Trench <[email protected]>:
>>> >>> > Hello,
>>> >>> >
>>> >>> > Last week, I decided to download your graph database core in order
>>> >>> > to use it. First, I created a new project to parse my CSV files and
>>> >>> > create a new graph database with Neo4j. These CSV files contain
>>> >>> > 150 million edges and 20 million nodes.
>>> >>> >
>>> >>> > When I finished writing the code that creates the graph database, I
>>> >>> > executed it and, after six hours of execution, the program crashed
>>> >>> > because of a Lucene exception. The exception is related to index
>>> >>> > merging and has the following message:
>>> >>> > "mergeFields produced an invalid result: docCount is 385282378 but
>>> >>> > fdx file size is 3082259028; now aborting this merge to prevent
>>> >>> > index corruption"
>>> >>> >
>>> >>> > I have searched on the net and found that it is a Lucene bug. The
>>> >>> > libraries used for executing my project were:
>>> >>> > neo-1.0-b10
>>> >>> > index-util-0.7
>>> >>> > lucene-core-2.4.0
>>> >>> >
>>> >>> > So, I decided to use a newer Lucene version. I found that you have a
>>> >>> > newer index-util version, so I updated the libraries:
>>> >>> > neo-1.0-b10
>>> >>> > index-util-0.9
>>> >>> > lucene-core-2.9.1
>>> >>> >
>>> >>> > When I had updated those libraries, I tried to execute my project
>>> >>> > again and I found that, on many occasions, it was not indexing
>>> >>> > properly. So, I tried to optimize the index every time I indexed
>>> >>> > something. This was a solution because, after that, it was indexing
>>> >>> > properly, but the execution time increased a lot.
>>> >>> >
>>> >>> > I am not using transactions; instead, I am using the Batch Inserter
>>> >>> > with the LuceneIndexBatchInserter.
>>> >>> >
>>> >>> > So, my question is: What can I do to solve this problem? If I use
>>> >>> > index-util-0.7 I cannot finish creating the graph database, and if
>>> >>> > I use index-util-0.9 I have to optimize the index on every
>>> >>> > insertion and the execution never ends.
>>> >>> >
>>> >>> > Thank you very much in advance,
>>> >>> >
>>> >>> > Núria.
>>> >>> > _______________________________________________
>>> >>> > Neo mailing list
>>> >>> > [email protected]
>>> >>> > https://lists.neo4j.org/mailman/listinfo/user
>>> >>>
>>> >>> --
>>> >>> Mattias Persson, [[email protected]]
>>> >>> Neo Technology, www.neotechnology.com
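
For readers following the thread, here is one way to picture the "cache per key" with a maximum size per key that Mattias describes in his 2009/12/4 message. This is only a minimal sketch, not the index-util implementation: the class `PerKeyLookupCache` and its methods are hypothetical names invented for illustration, and least-recently-used eviction is an assumption (the thread does not say how the real cache evicts entries). It keeps one bounded map per index key, so a lookup that hits the cache never has to touch the Lucene index.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; NOT the index-util implementation.
class PerKeyLookupCache {
    private final int maxSizePerKey;
    // One bounded value -> node id map per index key (e.g. "name").
    private final Map<String, Map<Object, Long>> caches = new HashMap<>();

    PerKeyLookupCache(int maxSizePerKey) {
        this.maxSizePerKey = maxSizePerKey;
    }

    // Remember that `value` was indexed under `key` for node `nodeId`.
    void put(String key, Object value, long nodeId) {
        cacheFor(key).put(value, nodeId);
    }

    // Returns the cached node id, or null on a miss
    // (a caller would then fall back to querying the index).
    Long get(String key, Object value) {
        Map<Object, Long> cache = caches.get(key);
        return cache == null ? null : cache.get(value);
    }

    private Map<Object, Long> cacheFor(String key) {
        // Access-ordered LinkedHashMap: the least recently used entry is
        // dropped once the per-key limit is exceeded (assumed policy).
        return caches.computeIfAbsent(key, k ->
            new LinkedHashMap<Object, Long>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Object, Long> eldest) {
                    return size() > maxSizePerKey;
                }
            });
    }
}
```

A cache like this is only safe under the assumption Mattias mentions, that the index is empty when the insert process starts. A miss still has to be answered by the index itself, which in this thread is where both the "nodes not found" and the "Too many open files" problems show up.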

