Could you provide me with some sample code which can trigger this
behaviour with the latest index-util-0.9-SNAPSHOT, Núria?
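
Something shaped like the sketch below would be enough: index a batch of
nodes, then look each one up again and count the lookups that come back as
-1. Note that this is only an illustration of the shape of the test, not
real index-util code. InMemoryIndex is a hypothetical stand-in (a plain
map), and its index/getSingleNode methods are assumptions made for this
example; only the "-1 means no node found" convention is taken from the
thread.

```java
import java.util.HashMap;
import java.util.Map;

class LookupReproSketch {

    /** Hypothetical stand-in for the batch index; NOT the index-util API. */
    static class InMemoryIndex {
        private final Map<String, Map<Object, Long>> byKey = new HashMap<>();

        /** Associates a node id with a key/value pair. */
        void index(long nodeId, String key, Object value) {
            byKey.computeIfAbsent(key, k -> new HashMap<>()).put(value, nodeId);
        }

        /** Returns the node id, or -1 if no node is found. */
        long getSingleNode(String key, Object value) {
            Map<Object, Long> values = byKey.get(key);
            if (values == null) {
                return -1L;
            }
            Long id = values.get(value);
            return id == null ? -1L : id;
        }
    }

    /** Indexes nodeCount nodes, then looks each one up and counts the misses. */
    static int countMissingLookups(int nodeCount) {
        InMemoryIndex index = new InMemoryIndex();
        for (long id = 0; id < nodeCount; id++) {
            index.index(id, "name", "node-" + id);
        }
        int missing = 0;
        for (long id = 0; id < nodeCount; id++) {
            if (index.getSingleNode("name", "node-" + id) == -1L) {
                missing++;
            }
        }
        return missing;
    }

    public static void main(String[] args) {
        System.out.println("missing lookups: " + countMissingLookups(30_000));
    }
}
```

With the real batch inserter in place of the stand-in, a non-zero count
at a given node count would be the behaviour to report.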

2009/12/9 Núria Trench <[email protected]>:
> Todd,
>
> I don't have the same problem. In my case, after indexing all the
> attributes/properties of each node, the application creates all the edges by
> looking up the tail node and the head node. To do so, it calls the method
> "org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode", which
> returns -1 (no node found) on many occasions.
>
> Does anyone have an alternative way to get a node by its indexed
> attributes/properties?
>
> Thank you,
>
> Núria.
>
>
> 2009/12/7 Mattias Persson <[email protected]>
>
>> Todd, are you sure you have the latest index-util 0.9-SNAPSHOT? This
>> is a bug that we fixed yesterday... (assuming it's the same bug).
>>
>> 2009/12/7 Todd Stavish <[email protected]>:
>> > Hi Mattias, Núria.
>> >
>> > I am also running into scalability problems with the Lucene batch
>> > inserter at much smaller numbers: 30,000 indexed nodes. I tried
>> > calling optimize more often. Increasing ulimit didn't help either.
>> >
>> > [INFO] Exception in thread "main" java.lang.RuntimeException:
>> > java.io.FileNotFoundException:
>> > /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
>> > (Too many open files)
>> > [INFO]  at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getNodes(LuceneIndexBatchInserterImpl.java:186)
>> > [INFO]  at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode(LuceneIndexBatchInserterImpl.java:238)
>> > [INFO]  at com.collectiveintelligence.QueryNeo.loadDataToGraph(QueryNeo.java:277)
>> > [INFO]  at com.collectiveintelligence.QueryNeo.main(QueryNeo.java:57)
>> > [INFO] Caused by: java.io.FileNotFoundException:
>> > /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx
>> > (Too many open files)
>> >
>> > I tried splitting the work across separate batch inserter instances, and
>> > now it hangs. Can I create more than one batch inserter per process if they
>> > run sequentially and non-threaded?
>> >
>> > Thanks,
>> > Todd
>> >
>> >
>> >
>> >
>> >
>> > On Mon, Dec 7, 2009 at 7:28 AM, Núria Trench <[email protected]> wrote:
>> >> Hi again Mattias,
>> >>
>> >> I have tried to execute my application with the latest version available
>> >> in the maven repository and I still have the same problem. After creating
>> >> and indexing all the nodes, the application calls the "optimize" method
>> >> and then creates all the edges by calling the method "getNodes" to select
>> >> the tail and head node of each edge, but it doesn't work because many
>> >> nodes are not found.
>> >>
>> >> I have tried to create only 30 nodes and 15 edges and it works properly,
>> >> but if I try to create a big graph (180 million edges + 20 million nodes)
>> >> it doesn't.
>> >>
>> >> I have also tried to call the "optimize" method every time the
>> >> application has created 1 million nodes, but it doesn't work.
>> >>
>> >> Have you tried to create as many nodes as I have said with the newer
>> >> index-util version?
>> >>
>> >> Thank you,
>> >>
>> >> Núria.
>> >>
>> >> 2009/12/4 Núria Trench <[email protected]>
>> >>
>> >>> Hi Mattias,
>> >>>
>> >>> Thank you very much for fixing the problem so fast. I will try it as
>> >>> soon as the new changes are available in the maven repository.
>> >>>
>> >>> Núria.
>> >>>
>> >>>
>> >>> 2009/12/4 Mattias Persson <[email protected]>
>> >>>
>> >>>> I fixed the problem and also added a cache per key for faster
>> >>>> getNodes/getSingleNode lookup during the insert process. However the
>> >>>> cache assumes that there's nothing in the index when the process
>> >>>> starts (which almost always will be true) to speed things up even
>> >>>> further.
>> >>>>
>> >>>> You can control the cache size, and whether the cache is used at all,
>> >>>> by overriding the following methods in your
>> >>>> LuceneIndexBatchInserterImpl instance (this is also documented in the
>> >>>> Javadoc):
>> >>>>
>> >>>> boolean useCache()
>> >>>> int getMaxCacheSizePerKey()
>> >>>>
>> >>>> The new changes should be available in the maven repository within an
>> >>>> hour.
>> >>>>
>> >>>> 2009/12/4 Mattias Persson <[email protected]>:
>> >>>> > I think I found the problem... it's indexing as it should, but it
>> >>>> > isn't reflected in getNodes/getSingleNode properly until you
>> >>>> > flush/optimize/shutdown the index. I'll try to fix it today!
>> >>>> >
>> >>>> > 2009/12/3 Núria Trench <[email protected]>:
>> >>>> >> Thank you very much for your response.
>> >>>> >> If you need more information, you only have to send an e-mail and I
>> >>>> >> will try to explain it better.
>> >>>> >>
>> >>>> >> Núria.
>> >>>> >>
>> >>>> >> 2009/12/3 Mattias Persson <[email protected]>
>> >>>> >>
>> >>>> >>> This is something I'd like to reproduce, and I'll do some testing
>> >>>> >>> on this tomorrow.
>> >>>> >>>
>> >>>> >>> 2009/12/3 Núria Trench <[email protected]>:
>> >>>> >>> > Hello,
>> >>>> >>> >
>> >>>> >>> > Last week, I decided to download your graph database core in
>> >>>> >>> > order to use it. First, I created a new project to parse my CSV
>> >>>> >>> > files and create a new graph database with Neo4j. These CSV files
>> >>>> >>> > contain 150 million edges and 20 million nodes.
>> >>>> >>> >
>> >>>> >>> > When I finished writing the code that creates the graph
>> >>>> >>> > database, I executed it and, after six hours of execution, the
>> >>>> >>> > program crashed because of a Lucene exception. The exception is
>> >>>> >>> > related to index merging and has the following message:
>> >>>> >>> > "mergeFields produced an invalid result: docCount is 385282378 but
>> >>>> >>> > fdx file size is 3082259028; now aborting this merge to prevent
>> >>>> >>> > index corruption"
>> >>>> >>> >
>> >>>> >>> > I have searched on the net and found that it is a Lucene bug.
>> >>>> >>> > The libraries used for executing my project were:
>> >>>> >>> > neo-1.0-b10
>> >>>> >>> > index-util-0.7
>> >>>> >>> > lucene-core-2.4.0
>> >>>> >>> >
>> >>>> >>> > So, I decided to use a newer Lucene version. I found that you
>> >>>> >>> > have a newer index-util version, so I updated the libraries:
>> >>>> >>> > neo-1.0-b10
>> >>>> >>> > index-util-0.9
>> >>>> >>> > lucene-core-2.9.1
>> >>>> >>> >
>> >>>> >>> > After updating those libraries, I executed my project again and
>> >>>> >>> > found that, on many occasions, it was not indexing properly. So I
>> >>>> >>> > tried optimizing the index every time I indexed something. This
>> >>>> >>> > worked, in that it then indexed properly, but the execution time
>> >>>> >>> > increased a lot.
>> >>>> >>> >
>> >>>> >>> > I am not using transactions; instead, I am using the Batch
>> >>>> >>> > Inserter with the LuceneIndexBatchInserter.
>> >>>> >>> >
>> >>>> >>> > So, my question is: what can I do to solve this problem? If I use
>> >>>> >>> > index-util-0.7, I cannot finish creating the graph database, and
>> >>>> >>> > if I use index-util-0.9, I have to optimize the index on every
>> >>>> >>> > insertion and the execution never ends.
>> >>>> >>> >
>> >>>> >>> > Thank you very much in advance,
>> >>>> >>> >
>> >>>> >>> > Núria.
>> >>>> >>> > _______________________________________________
>> >>>> >>> > Neo mailing list
>> >>>> >>> > [email protected]
>> >>>> >>> > https://lists.neo4j.org/mailman/listinfo/user
>> >>>> >>> >
>> >>>> >>>
>> >>>> >>>
>> >>>> >>>
>> >>>> >>> --
>> >>>> >>> Mattias Persson, [[email protected]]
>> >>>> >>> Neo Technology, www.neotechnology.com
>> >>>> >>
>> >>>> >
>> >>>> >
>> >>>> >
>> >>>> > --
>> >>>> > Mattias Persson, [[email protected]]
>> >>>> > Neo Technology, www.neotechnology.com
>> >>>> >
>> >>>>
>> >>>>
>> >>>>
>> >>>> --
>> >>>> Mattias Persson, [[email protected]]
>> >>>> Neo Technology, www.neotechnology.com
>> >>>>
>> >>>
>> >>>
>> >
>>
>>
>>
>> --
>> Mattias Persson, [[email protected]]
>> Neo Technology, www.neotechnology.com
>



-- 
Mattias Persson, [[email protected]]
Neo Technology, www.neotechnology.com
