I fixed the problem and also added a per-key cache for faster getNodes/getSingleNode lookups during the insert process. To speed things up even further, the cache assumes that the index is empty when the process starts (which will almost always be true).
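To illustrate the idea only, here is a minimal sketch of a bounded per-key lookup cache. It is not the actual index-util code: the class name PerKeyLookupCache and its add/get methods are made up for this example, and the "stop caching new values once a key is full" policy is an assumption, not necessarily what the real cache does. It relies on the same premise as above, namely that the index is empty when the insert starts, so every value that is in the cache has its complete list of node ids.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical stand-in for illustration; NOT the index-util implementation.
class PerKeyLookupCache {
    private final int maxSizePerKey;
    // One bounded map per index key: value -> ids of the nodes indexed with it.
    private final Map<String, Map<Object, List<Long>>> caches = new HashMap<>();

    PerKeyLookupCache(int maxSizePerKey) {
        this.maxSizePerKey = maxSizePerKey;
    }

    // Call this for every (node, key, value) that is indexed during the insert.
    void add(long nodeId, String key, Object value) {
        Map<Object, List<Long>> perKey =
                caches.computeIfAbsent(key, k -> new HashMap<>());
        List<Long> ids = perKey.get(value);
        if (ids == null) {
            if (perKey.size() >= maxSizePerKey) {
                // Cache for this key is full: leave the value uncached.
                // Nothing is ever evicted, so cached entries stay complete.
                return;
            }
            ids = new ArrayList<>();
            perKey.put(value, ids);
        }
        ids.add(nodeId);
    }

    // Returns the cached node ids, or null on a miss. On a miss the caller
    // has to fall back to the (slower) lookup in the index itself.
    List<Long> get(String key, Object value) {
        Map<Object, List<Long>> perKey = caches.get(key);
        return perKey == null ? null : perKey.get(value);
    }
}
```

A lookup would first ask the cache and only query the index when get returns null. Because entries are never evicted in this sketch, a value that is cached was cached from its first insertion, so its id list is never partial.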
You can control whether the cache is used, and its maximum size, by overriding the following methods in your LuceneIndexBatchInserterImpl instance (this is also documented in the Javadoc):

    boolean useCache()
    int getMaxCacheSizePerKey()

The new changes should be available in the maven repository within an hour.

2009/12/4 Mattias Persson <[email protected]>:
> I think I found the problem... it's indexing as it should, but it
> isn't reflected in getNodes/getSingleNode properly until you
> flush/optimize/shutdown the index. I'll try to fix it today!
>
> 2009/12/3 Núria Trench <[email protected]>:
>> Thank you very much for your response.
>> If you need more information, you only have to send an e-mail and I will try
>> to explain it better.
>>
>> Núria.
>>
>> 2009/12/3 Mattias Persson <[email protected]>
>>
>>> This is something I'd like to reproduce and I'll do some testing on
>>> this tomorrow
>>>
>>> 2009/12/3 Núria Trench <[email protected]>:
>>> > Hello,
>>> >
>>> > Last week, I decided to download your graph database core in order to use
>>> > it. First, I created a new project to parse my CSV files and create a new
>>> > graph database with Neo4j. These CSV files contain 150 million edges and 20
>>> > million nodes.
>>> >
>>> > When I finished writing the code which creates the graph database, I
>>> > executed it and, after six hours of execution, the program crashed because
>>> > of a Lucene exception. The exception is related to the index merging and it
>>> > has the following message:
>>> > "mergeFields produced an invalid result: docCount is 385282378 but fdx file
>>> > size is 3082259028; now aborting this merge to prevent index corruption"
>>> >
>>> > I have searched on the net and I found that it is a Lucene bug. The
>>> > libraries used for executing my project were:
>>> > neo-1.0-b10
>>> > index-util-0.7
>>> > lucene-core-2.4.0
>>> >
>>> > So, I decided to use a newer Lucene version. I found that you have a newer
>>> > index-util version so I updated the libraries:
>>> > neo-1.0-b10
>>> > index-util-0.9
>>> > lucene-core-2.9.1
>>> >
>>> > When I had updated those libraries, I tried to execute my project again and
>>> > I found that, on many occasions, it was not indexing properly. So, I tried
>>> > to optimize the index every time I indexed something. This was a
>>> > solution because, after that, it was indexing properly, but the execution
>>> > time increased a lot.
>>> >
>>> > I am not using transactions; instead, I am using the Batch Inserter
>>> > with the LuceneIndexBatchInserter.
>>> >
>>> > So, my question is: What can I do to solve this problem? If I use
>>> > index-util-0.7 I cannot finish creating the graph database, and if
>>> > I use index-util-0.9 I have to optimize the index on every insertion and
>>> > the execution never ends.
>>> >
>>> > Thank you very much in advance,
>>> >
>>> > Núria.

-- 
Mattias Persson, [[email protected]]
Neo Technology, www.neotechnology.com
_______________________________________________
Neo mailing list
[email protected]
https://lists.neo4j.org/mailman/listinfo/user

