Hi Mattias, Núria.

I am also running into scalability problems with the Lucene batch
inserter at much smaller numbers (30,000 indexed nodes). I tried
calling optimize more often, and increasing the ulimit didn't help.

[INFO] Exception in thread "main" java.lang.RuntimeException: java.io.FileNotFoundException: /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx (Too many open files)
[INFO]  at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getNodes(LuceneIndexBatchInserterImpl.java:186)
[INFO]  at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode(LuceneIndexBatchInserterImpl.java:238)
[INFO]  at com.collectiveintelligence.QueryNeo.loadDataToGraph(QueryNeo.java:277)
[INFO]  at com.collectiveintelligence.QueryNeo.main(QueryNeo.java:57)
[INFO] Caused by: java.io.FileNotFoundException: /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx (Too many open files)
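One way to check whether file handles are leaking as the load progresses is to log the JVM's open and maximum file descriptor counts between batches. This is a diagnostic sketch, not something from the thread: it relies on `com.sun.management.UnixOperatingSystemMXBean`, a HotSpot/OpenJDK extension that is absent on Windows and may be missing on other JVMs, and the class name `FdMonitor` is hypothetical.

```java
import java.lang.management.ManagementFactory;
import java.lang.management.OperatingSystemMXBean;

public class FdMonitor {
    /** Returns {open, max} file descriptor counts, or null if this JVM does not expose them. */
    static long[] fileDescriptorCounts() {
        OperatingSystemMXBean os = ManagementFactory.getOperatingSystemMXBean();
        if (os instanceof com.sun.management.UnixOperatingSystemMXBean) {
            com.sun.management.UnixOperatingSystemMXBean unix =
                    (com.sun.management.UnixOperatingSystemMXBean) os;
            return new long[] {unix.getOpenFileDescriptorCount(), unix.getMaxFileDescriptorCount()};
        }
        return null; // e.g. Windows, or a JVM without the com.sun.management extension
    }

    public static void main(String[] args) {
        long[] counts = fileDescriptorCounts();
        if (counts == null) {
            System.out.println("File descriptor counts are not available on this JVM");
        } else {
            // Call this periodically during the load; a count that keeps climbing
            // towards the max suggests files are being opened and never closed.
            System.out.println("open=" + counts[0] + " max=" + counts[1]);
        }
    }
}
```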

I tried breaking the load up across separate batch inserter instances,
and now it hangs. Can I create more than one batch inserter per process
if they run sequentially and non-threaded?

Thanks,
Todd

On Mon, Dec 7, 2009 at 7:28 AM, Núria Trench <[email protected]> wrote:
> Hi again Mattias,
>
> I have tried running my application with the latest version available in
> the Maven repository and I still have the same problem. After creating and
> indexing all the nodes, the application calls the "optimize" method and
> then creates all the edges, calling "getNodes" to look up the tail and
> head node of each edge. This fails because many nodes are not found.
>
> I have tried creating only 30 nodes and 15 edges and it works properly, but
> if I try to create a big graph (180 million edges + 20 million nodes) it
> doesn't.
>
> I have also tried calling the "optimize" method every time the application
> has created 1 million nodes, but it doesn't help.
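The two-phase flow described above (create and index every node, optimize periodically, then resolve each edge's tail and head through the index) can be sketched as follows. This is not Núria's code: a `HashMap` stands in for the Lucene index, the optimize step is only counted, and every name (`TwoPhaseLoad`, `Edge`, `optimizeEvery`) is hypothetical. With an index whose lookups see every earlier write, phase 2 should only report an edge as unresolved when the CSV refers to an id that was never created.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TwoPhaseLoad {
    /** One CSV edge row: external ids of the tail and head nodes. */
    static final class Edge {
        final String tail;
        final String head;

        Edge(String tail, String head) {
            this.tail = tail;
            this.head = head;
        }
    }

    int optimizeCalls = 0;
    final List<long[]> edges = new ArrayList<>();  // resolved {tailNodeId, headNodeId} pairs
    final List<Edge> unresolved = new ArrayList<>();
    // Hypothetical stand-in for the Lucene "name" index: external id -> node id.
    private final Map<String, Long> index = new HashMap<>();

    void load(List<String> nodeIds, List<Edge> edgeRows, int optimizeEvery) {
        long created = 0;
        // Phase 1: create and index every node.
        for (String id : nodeIds) {
            index.put(id, created++);
            if (created % optimizeEvery == 0) {
                optimizeCalls++; // where a real loader would call optimize()
            }
        }
        // Phase 2: look up the tail and head of every edge.
        for (Edge e : edgeRows) {
            Long tail = index.get(e.tail);
            Long head = index.get(e.head);
            if (tail == null || head == null) {
                unresolved.add(e); // the "node not found" symptom from the thread
            } else {
                edges.add(new long[] {tail, head});
            }
        }
    }

    public static void main(String[] args) {
        TwoPhaseLoad loader = new TwoPhaseLoad();
        loader.load(List.of("n1", "n2", "n3"),
                List.of(new Edge("n1", "n2"), new Edge("n2", "n9")), 2);
        // prints "edges=1 unresolved=1 optimizeCalls=1"
        System.out.println("edges=" + loader.edges.size()
                + " unresolved=" + loader.unresolved.size()
                + " optimizeCalls=" + loader.optimizeCalls);
    }
}
```

Keeping the id-to-node mapping in memory like this avoids index lookups during edge creation entirely, but whether 20 million entries fit in the heap depends on key sizes and available memory.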
>
> Have you tried creating as many nodes as I mentioned with the newer
> index-util version?
>
> Thank you,
>
> Núria.
>
> 2009/12/4 Núria Trench <[email protected]>
>
>> Hi Mattias,
>>
>> Thank you very much for fixing the problem so fast. I will try it as soon
>> as the new changes are available in the Maven repository.
>>
>> Núria.
>>
>>
>> 2009/12/4 Mattias Persson <[email protected]>
>>
>>> I fixed the problem and also added a per-key cache for faster
>>> getNodes/getSingleNode lookups during the insert process. To speed
>>> things up even further, however, the cache assumes that the index is
>>> empty when the process starts (which will almost always be true).
>>>
>>> You can control the cache size, and whether the cache is used at all,
>>> by overriding these methods of LuceneIndexBatchInserterImpl (this is
>>> also documented in the Javadoc):
>>>
>>> boolean useCache()
>>> int getMaxCacheSizePerKey()
>>>
>>> The new changes should be available in the Maven repository within an
>>> hour.
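The per-key cache described above, switched on or off by `useCache()` and bounded by `getMaxCacheSizePerKey()`, can be illustrated with a toy class. This is a sketch under assumptions, not the index-util implementation: only the two method names come from the message, while the class `PerKeyCache`, LRU eviction via `LinkedHashMap`, write-through on `index()` and the fallback to a backing map are hypothetical choices.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

public class PerKeyCache {
    // Backing store standing in for the on-disk index: key -> (value -> node id).
    private final Map<String, Map<Object, Long>> backing = new HashMap<>();
    // One bounded LRU cache per index key.
    private final Map<String, Map<Object, Long>> caches = new HashMap<>();
    int backingReads = 0; // lookups that missed the cache

    protected boolean useCache() { return true; }
    protected int getMaxCacheSizePerKey() { return 1000; }

    private Map<Object, Long> cacheFor(String key) {
        return caches.computeIfAbsent(key, k -> {
            final int max = getMaxCacheSizePerKey();
            // accessOrder=true makes iteration order least-recently-used first
            return new LinkedHashMap<Object, Long>(16, 0.75f, true) {
                @Override
                protected boolean removeEldestEntry(Map.Entry<Object, Long> eldest) {
                    return size() > max; // evict the LRU entry once over the limit
                }
            };
        });
    }

    void index(long nodeId, String key, Object value) {
        backing.computeIfAbsent(key, k -> new HashMap<>()).put(value, nodeId);
        if (useCache()) {
            cacheFor(key).put(value, nodeId); // write-through: visible to lookups at once
        }
    }

    Long getSingleNode(String key, Object value) {
        if (useCache()) {
            Long cached = cacheFor(key).get(value);
            if (cached != null) {
                return cached;
            }
        }
        backingReads++;
        Map<Object, Long> values = backing.get(key);
        Long found = values == null ? null : values.get(value);
        if (found != null && useCache()) {
            cacheFor(key).put(value, found); // repopulate the cache after a miss
        }
        return found;
    }

    public static void main(String[] args) {
        // Override the size limit the way getMaxCacheSizePerKey() would be overridden.
        PerKeyCache small = new PerKeyCache() {
            @Override
            protected int getMaxCacheSizePerKey() { return 2; }
        };
        small.index(1, "name", "a");
        small.index(2, "name", "b");
        small.index(3, "name", "c");      // evicts "a": the "name" cache holds 2 entries
        small.getSingleNode("name", "c"); // served from the cache
        small.getSingleNode("name", "a"); // cache miss, read from the backing store
        System.out.println(small.backingReads); // prints 1
    }
}
```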
>>>
>>> 2009/12/4 Mattias Persson <[email protected]>:
>>> > I think I found the problem... it's indexing as it should, but it
>>> > isn't reflected in getNodes/getSingleNode properly until you
>>> > flush/optimize/shutdown the index. I'll try to fix it today!
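A toy model of the behaviour described above: new entries sit in a write buffer and only become searchable after a flush, so lookups made between indexing and flush/optimize/shutdown come back empty. Everything here (`BufferedIndex`, the `readPending` switch) is hypothetical and far simpler than Lucene; it only shows why optimizing after every insert, as in the original report, hides the problem at a high cost, and why letting lookups also consult the pending writes removes it.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class BufferedIndex {
    private final Map<String, List<Long>> flushed = new HashMap<>(); // searchable entries
    private final Map<String, List<Long>> pending = new HashMap<>(); // buffered, not yet flushed
    private final boolean readPending; // false reproduces the reported symptom

    BufferedIndex(boolean readPending) {
        this.readPending = readPending;
    }

    void index(long nodeId, String value) {
        pending.computeIfAbsent(value, v -> new ArrayList<>()).add(nodeId);
    }

    /** Moves buffered entries to the searchable side, as flush/optimize/shutdown would. */
    void flush() {
        pending.forEach((v, ids) -> flushed.computeIfAbsent(v, k -> new ArrayList<>()).addAll(ids));
        pending.clear();
    }

    List<Long> getNodes(String value) {
        List<Long> result = new ArrayList<>(flushed.getOrDefault(value, List.of()));
        if (readPending) {
            result.addAll(pending.getOrDefault(value, List.of())); // read-your-writes
        }
        return result;
    }

    public static void main(String[] args) {
        for (boolean readPending : new boolean[] {false, true}) {
            BufferedIndex idx = new BufferedIndex(readPending);
            idx.index(42, "alice");
            System.out.println("readPending=" + readPending + " before flush: " + idx.getNodes("alice"));
            idx.flush();
            System.out.println("readPending=" + readPending + " after flush: " + idx.getNodes("alice"));
        }
    }
}
```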
>>> >
>>> > 2009/12/3 Núria Trench <[email protected]>:
>>> >> Thank you very much for your response.
>>> >> If you need more information, you only have to send an e-mail and I
>>> >> will try to explain it better.
>>> >>
>>> >> Núria.
>>> >>
>>> >> 2009/12/3 Mattias Persson <[email protected]>
>>> >>
>>> >>> This is something I'd like to reproduce, and I'll do some testing on
>>> >>> this tomorrow.
>>> >>>
>>> >>> 2009/12/3 Núria Trench <[email protected]>:
>>> >>> > Hello,
>>> >>> >
>>> >>> > Last week, I decided to download your graph database core in order to
>>> >>> > use it. First, I created a new project to parse my CSV files and create
>>> >>> > a new graph database with Neo4j. These CSV files contain 150 million
>>> >>> > edges and 20 million nodes.
>>> >>> >
>>> >>> > When I finished writing the code that creates the graph database, I
>>> >>> > executed it and, after six hours of execution, the program crashed
>>> >>> > because of a Lucene exception. The exception is related to index
>>> >>> > merging and has the following message:
>>> >>> > "mergeFields produced an invalid result: docCount is 385282378 but fdx
>>> >>> > file size is 3082259028; now aborting this merge to prevent index
>>> >>> > corruption"
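The two numbers in that message are consistent with an integer overflow in the check rather than real corruption, assuming the .fdx file holds a 4-byte header plus one 8-byte pointer per document. In 64-bit arithmetic, 4 + 8 × 385,282,378 is exactly 3,082,259,028, the reported size; evaluated in 32-bit `int` arithmetic, the same expression wraps to a negative number once docCount reaches 2^28 = 268,435,456, so a check written that way would report a mismatch. Whether this is the specific Lucene bug found online is an assumption; the class and method names are hypothetical.

```java
public class FdxSizeCheck {
    static final long HEADER_BYTES = 4;  // assumed: one int format header
    static final long BYTES_PER_DOC = 8; // assumed: one long pointer per document

    /** Expected .fdx length computed in 64-bit arithmetic. */
    static long expectedLength(int docCount) {
        return HEADER_BYTES + BYTES_PER_DOC * docCount;
    }

    /** The same formula in 32-bit int arithmetic, which wraps past Integer.MAX_VALUE. */
    static int expectedLengthInt(int docCount) {
        return 4 + 8 * docCount;
    }

    public static void main(String[] args) {
        int docCount = 385282378;   // from the error message
        long fdxSize = 3082259028L; // from the error message
        System.out.println("64-bit: " + expectedLength(docCount)
                + " matches=" + (expectedLength(docCount) == fdxSize));
        System.out.println("32-bit: " + expectedLengthInt(docCount)
                + " matches=" + (expectedLengthInt(docCount) == fdxSize));
    }
}
```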
>>> >>> >
>>> >>> > I searched online and found that it is a Lucene bug. The
>>> >>> > libraries used to run my project were:
>>> >>> > neo-1.0-b10
>>> >>> > index-util-0.7
>>> >>> > lucene-core-2.4.0
>>> >>> >
>>> >>> > So, I decided to use a newer Lucene version. I found that you have a
>>> >>> > newer index-util version, so I updated the libraries:
>>> >>> > neo-1.0-b10
>>> >>> > index-util-0.9
>>> >>> > lucene-core-2.9.1
>>> >>> >
>>> >>> > After updating those libraries, I ran my project again and found that,
>>> >>> > on many occasions, it was not indexing properly. So I tried optimizing
>>> >>> > the index after every indexing call. That worked, in that it then
>>> >>> > indexed properly, but the execution time increased a lot.
>>> >>> >
>>> >>> > I am not using transactions; instead, I am using the Batch Inserter
>>> >>> > with the LuceneIndexBatchInserter.
>>> >>> >
>>> >>> > So, my question is: what can I do to solve this problem? If I use
>>> >>> > index-util-0.7, I cannot finish creating the graph database, and if I
>>> >>> > use index-util-0.9, I have to optimize the index after every insertion
>>> >>> > and the execution never ends.
>>> >>> >
>>> >>> > Thank you very much in advance,
>>> >>> >
>>> >>> > Núria.
>>> >>> > _______________________________________________
>>> >>> > Neo mailing list
>>> >>> > [email protected]
>>> >>> > https://lists.neo4j.org/mailman/listinfo/user
>>> >>> >
>>> >>>
>>> >>>
>>> >>>
>>> >>> --
>>> >>> Mattias Persson, [[email protected]]
>>> >>> Neo Technology, www.neotechnology.com
>>> >>>
>>> >>
>>> >
>>> >
>>> >
>>> > --
>>> > Mattias Persson, [[email protected]]
>>> > Neo Technology, www.neotechnology.com
>>> >
>>>
>>>
>>>
>>> --
>>> Mattias Persson, [[email protected]]
>>> Neo Technology, www.neotechnology.com
>>>
>>
>>
>