Massimo, could you please look into the Lucene Document instance that you add all the fields to,
and check whether it also contains this ultra-large ArrayList with all the Fields?

And which version of Lucene did you use for your standalone testing?

Cheers

Michael

On 27.06.2011 at 23:04, Massimo Lusetti wrote:

> On Sat, Jun 25, 2011 at 2:59 AM, Michael Hunger
> <michael.hun...@neotechnology.com> wrote:
>
>> Massimo,
>>
>> when profiling this it quickly becomes apparent that the issue is within the
>> Lucene document
>> (org.apache.lucene.document.Document).
>>
>> It holds an ArrayList of all its fields, which accounts for all the memory.
>>
>> It also contains several methods that walk over that list (filtering it)
>> and/or return copies of it.
>>
>> Another issue that came up: the addition takes longer and longer (because of
>> Lucene doing a quick-sort on the fields at each flush()).
>>
>> So my suggestion would be to shard the indexing over several arguments and
>> hide that behind a domain-level API; each document should have around 50k
>> entries to allow Lucene to handle it gracefully. After you have introduced
>> this API you should perhaps consider replacing this large index with a more
>> appropriate key-value store (like redis, jdbm, custom-impl - depending on
>> your real use case, which you haven't revealed :) ).
>>
>> Cheers
>
> My use case is this one: I have a big series of log rows which I have to
> read and understand, but I need to be sure to parse each log row one and
> only one time. So I calculate a SHA1 hash of the log row and put it
> in the index; if that hash is already in the index I skip the log
> row, because it means it has already been processed.
>
> I've made a test with jdbm and it is by far a lot worse than plain
> Lucene. BTW, if I do the same test with a plain Lucene implementation it
> works flawlessly without any pain, so I guess something is going weird in
> the way Lucene is being used by neo4j, but I'll try to follow your
> suggestion.
> BTW I've also tested MongoDB, which is slower but seems more stable...
> but my test isn't finished yet...
>
> Cheers
> --
> Massimo
_______________________________________________
Neo4j mailing list
User@lists.neo4j.org
https://lists.neo4j.org/mailman/listinfo/user
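
For illustration only, here is a minimal sketch of the "parse each log row only once" check from the thread, hidden behind a small domain-level API and spread over several shards. Everything named here is an assumption and not code from the thread: the class ProcessedRowIndex, its methods, and the shard count are hypothetical, and each shard is a plain in-memory HashSet standing in for whatever backend is really used (a Lucene or Neo4j index, or a key-value store).

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical domain-level API: callers only ask "is this log row new?".
final class ProcessedRowIndex {
    // Assumption: each Set is an in-memory stand-in for one index shard.
    // A real version would delegate to the actual index or key-value store.
    private final List<Set<String>> shards = new ArrayList<>();

    ProcessedRowIndex(int shardCount) {
        if (shardCount < 1) {
            throw new IllegalArgumentException("shardCount must be >= 1");
        }
        for (int i = 0; i < shardCount; i++) {
            shards.add(new HashSet<>());
        }
    }

    // SHA-1 of the log row, encoded as 40 lowercase hex characters.
    static String sha1Hex(String logRow) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-1");
            byte[] digest = md.digest(logRow.getBytes(StandardCharsets.UTF_8));
            StringBuilder sb = new StringBuilder(digest.length * 2);
            for (byte b : digest) {
                sb.append(String.format("%02x", b & 0xff));
            }
            return sb.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException("SHA-1 not available", e);
        }
    }

    // Picks a shard from the first 32 bits of the hash, so that entries
    // spread roughly evenly and no single shard has to hold everything.
    int shardFor(String hashHex) {
        long prefix = Long.parseLong(hashHex.substring(0, 8), 16);
        return (int) (prefix % shards.size());
    }

    // Returns true if the row was not seen before (and records it),
    // false if its hash is already present and the row can be skipped.
    boolean markIfNew(String logRow) {
        String hash = sha1Hex(logRow);
        return shards.get(shardFor(hash)).add(hash);
    }
}
```

A caller would run something like `if (index.markIfNew(row)) { parse(row); }`, where parse is again a hypothetical method. Because the storage sits behind markIfNew, the backend can be swapped later without touching the log-processing code.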