Hi Todd,

The sample code creates nodes and relationships by parsing four CSV files. Thank you for trying to trigger this behaviour with this sample.

Núria

2009/12/9 Mattias Persson <[email protected]>

> Could you provide me with some sample code which can trigger this
> behaviour with the latest index-util-0.9-SNAPSHOT Núria?
>
> 2009/12/9 Núria Trench <[email protected]>:
> > Todd,
> >
> > I haven't the same problem. In my case, after indexing all the
> > attributes/properties of each node, the application creates all the edges by
> > looking up the tail node and the head node. So, it calls the method
> > "org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode"
> > which returns -1 (no node found) on many occasions.
> >
> > Does anyone have an alternative way to get a node by its indexed
> > attributes/properties?
> >
> > Thank you,
> >
> > Núria.
> >
> > 2009/12/7 Mattias Persson <[email protected]>
> >
> >> Todd, are you sure you have the latest index-util 0.9-SNAPSHOT? This
> >> is a bug that we fixed yesterday... (assuming it's the same bug).
> >>
> >> 2009/12/7 Todd Stavish <[email protected]>:
> >> > Hi Mattias, Núria.
> >> >
> >> > I am also running into scalability problems with the Lucene batch
> >> > inserter at much smaller numbers, 30,000 indexed nodes. I tried
> >> > calling optimize more. Increasing ulimit didn't help.
> >> >
> >> > [INFO] Exception in thread "main" java.lang.RuntimeException: java.io.FileNotFoundException:
> >> > /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx (Too many open files)
> >> > [INFO] at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getNodes(LuceneIndexBatchInserterImpl.java:186)
> >> > [INFO] at org.neo4j.util.index.LuceneIndexBatchInserterImpl.getSingleNode(LuceneIndexBatchInserterImpl.java:238)
> >> > [INFO] at com.collectiveintelligence.QueryNeo.loadDataToGraph(QueryNeo.java:277)
> >> > [INFO] at com.collectiveintelligence.QueryNeo.main(QueryNeo.java:57)
> >> > [INFO] Caused by: java.io.FileNotFoundException:
> >> > /Users/todd/Code/neo4Jprototype/target/classes/data/graph/lucene/name/_0.cfx (Too many open files)
> >> >
> >> > I tried breaking up to separate batchinserter instances, and it hangs
> >> > now. Can I create more than one batch inserter per process if they run
> >> > sequentially and non-threaded?
> >> >
> >> > Thanks,
> >> > Todd
> >> >
> >> > On Mon, Dec 7, 2009 at 7:28 AM, Núria Trench <[email protected]> wrote:
> >> >> Hi again Mattias,
> >> >>
> >> >> I have tried to execute my application with the latest version available in
> >> >> the maven repository and I still have the same problem. After creating and
> >> >> indexing all the nodes, the application calls the "optimize" method and,
> >> >> then, it creates all the edges by calling the method "getNodes" in order to
> >> >> select the tail and head node of the edge, but it doesn't work because many
> >> >> nodes are not found.
> >> >>
> >> >> I have tried to create only 30 nodes and 15 edges and it works properly, but
> >> >> if I try to create a big graph (180 million edges + 20 million nodes) it
> >> >> doesn't.
> >> >>
> >> >> I have also tried to call the "optimize" method every time the application
> >> >> has created 1 million nodes but it doesn't work.
> >> >>
> >> >> Have you tried to create as many nodes as I have said with the newer
> >> >> index-util version?
> >> >>
> >> >> Thank you,
> >> >>
> >> >> Núria.
> >> >>
> >> >> 2009/12/4 Núria Trench <[email protected]>
> >> >>
> >> >>> Hi Mattias,
> >> >>>
> >> >>> Thank you very much for fixing the problem so fast. I will try it as soon
> >> >>> as the new changes are available in the maven repository.
> >> >>>
> >> >>> Núria.
> >> >>>
> >> >>> 2009/12/4 Mattias Persson <[email protected]>
> >> >>>
> >> >>>> I fixed the problem and also added a cache per key for faster
> >> >>>> getNodes/getSingleNode lookup during the insert process. However the
> >> >>>> cache assumes that there's nothing in the index when the process
> >> >>>> starts (which almost always will be true) to speed things up even
> >> >>>> further.
> >> >>>>
> >> >>>> You can control the cache size and if it should be used by overriding
> >> >>>> the (this is also documented in the Javadoc):
> >> >>>>
> >> >>>> boolean useCache()
> >> >>>> int getMaxCacheSizePerKey()
> >> >>>>
> >> >>>> methods in your LuceneIndexBatchInserterImpl instance. The new changes
> >> >>>> should be available in the maven repository within an hour.
> >> >>>>
> >> >>>> 2009/12/4 Mattias Persson <[email protected]>:
> >> >>>> > I think I found the problem... it's indexing as it should, but it
> >> >>>> > isn't reflected in getNodes/getSingleNode properly until you
> >> >>>> > flush/optimize/shutdown the index. I'll try to fix it today!
> >> >>>> >
> >> >>>> > 2009/12/3 Núria Trench <[email protected]>:
> >> >>>> >> Thank you very much for your response.
> >> >>>> >> If you need more information, you only have to send an e-mail and I will try
> >> >>>> >> to explain it better.
> >> >>>> >>
> >> >>>> >> Núria.
> >> >>>> >>
> >> >>>> >> 2009/12/3 Mattias Persson <[email protected]>
> >> >>>> >>
> >> >>>> >>> This is something I'd like to reproduce and I'll do some testing on
> >> >>>> >>> this tomorrow
> >> >>>> >>>
> >> >>>> >>> 2009/12/3 Núria Trench <[email protected]>:
> >> >>>> >>> > Hello,
> >> >>>> >>> >
> >> >>>> >>> > Last week, I decided to download your graph database core in order to use
> >> >>>> >>> > it. First, I created a new project to parse my CSV files and create a new
> >> >>>> >>> > graph database with Neo4j. These CSV files contain 150 million edges and 20
> >> >>>> >>> > million nodes.
> >> >>>> >>> >
> >> >>>> >>> > When I finished writing the code which creates the graph database, I
> >> >>>> >>> > executed it and, after six hours of execution, the program crashed because
> >> >>>> >>> > of a Lucene exception. The exception is related to the index merging and it
> >> >>>> >>> > has the following message:
> >> >>>> >>> > "mergeFields produced an invalid result: docCount is 385282378 but fdx file
> >> >>>> >>> > size is 3082259028; now aborting this merge to prevent index corruption"
> >> >>>> >>> >
> >> >>>> >>> > I have searched on the net and I found that it is a Lucene bug. The
> >> >>>> >>> > libraries used for executing my project were:
> >> >>>> >>> > neo-1.0-b10
> >> >>>> >>> > index-util-0.7
> >> >>>> >>> > lucene-core-2.4.0
> >> >>>> >>> >
> >> >>>> >>> > So, I decided to use a newer Lucene version. I found that you have a newer
> >> >>>> >>> > index-util version so I updated the libraries:
> >> >>>> >>> > neo-1.0-b10
> >> >>>> >>> > index-util-0.9
> >> >>>> >>> > lucene-core-2.9.1
> >> >>>> >>> >
> >> >>>> >>> > When I had updated those libraries, I tried to execute my project again and
> >> >>>> >>> > I found that, on many occasions, it was not indexing properly. So, I tried
> >> >>>> >>> > to optimize the index every time I indexed something. This was a
> >> >>>> >>> > solution because, after that, it was indexing properly but the
> >> >>>> >>> > execution time increased a lot.
> >> >>>> >>> >
> >> >>>> >>> > I am not using transactions; instead, I am using the Batch Inserter
> >> >>>> >>> > with the LuceneIndexBatchInserter.
> >> >>>> >>> >
> >> >>>> >>> > So, my question is: What can I do to solve this problem? If I use
> >> >>>> >>> > index-util-0.7 I cannot finish the execution of creating the graph database
> >> >>>> >>> > and if I use index-util-0.9 I have to optimize the index on every insertion and
> >> >>>> >>> > the execution never ends.
> >> >>>> >>> >
> >> >>>> >>> > Thank you very much in advance,
> >> >>>> >>> >
> >> >>>> >>> > Núria.
> >> >>>> >>> > _______________________________________________
> >> >>>> >>> > Neo mailing list
> >> >>>> >>> > [email protected]
> >> >>>> >>> > https://lists.neo4j.org/mailman/listinfo/user
> >> >>>> >>>
> >> >>>> >>> --
> >> >>>> >>> Mattias Persson, [[email protected]]
> >> >>>> >>> Neo Technology, www.neotechnology.com

_______________________________________________ Neo mailing list [email protected] https://lists.neo4j.org/mailman/listinfo/user

